Multiresolution Recurrent Neural Networks: An Application to Dialogue Response Generation
Iulian V. Serban
Tim Klinger
Gerald Tesauro
Kartik Talamadupula
Bowen Zhou
We introduce a new class of models called multiresolution recurrent neural networks, which explicitly model natural language generation at multiple levels of abstraction. The models extend the sequence-to-sequence framework to generate two parallel stochastic processes: a sequence of high-level coarse tokens, and a sequence of natural language words (e.g. sentences). The coarse sequences follow a latent stochastic process with a factorial representation, which helps the models generalize to new examples. The coarse sequences can also incorporate task-specific knowledge, when available. In our experiments, the coarse sequences are extracted using automatic procedures, which are designed to capture compositional structure and semantics. These procedures enable training the multiresolution recurrent neural networks by maximizing the exact joint log-likelihood over both sequences. We apply the models to dialogue response generation in the technical support domain and compare them with several competing models. The multiresolution recurrent neural networks outperform competing models by a substantial margin, achieving state-of-the-art results according to both a human evaluation study and automatic evaluation metrics. Furthermore, experiments show the proposed models generate more fluent, relevant and goal-oriented responses.
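As a rough illustration of the joint log-likelihood over both sequences, the sketch below sums per-token log-probabilities for a coarse sequence and a word sequence. It assumes the joint factorizes by the chain rule into a coarse-sequence term and a word-sequence term conditioned on the coarse sequence; the abstract does not spell out this factorization, and the function name and the probabilities are made up for illustration.

```python
import math

def sequence_log_prob(token_probs):
    """Sum of log-probabilities of the tokens in one sequence."""
    return sum(math.log(p) for p in token_probs)

# Hypothetical per-token probabilities assigned by a model.
coarse_probs = [0.5, 0.25]      # P(z_t | z_<t) for each coarse token
word_probs = [0.5, 0.5, 0.25]   # P(w_t | w_<t, z) for each word (assumed conditioning)

# Chain rule: log P(z, w) = log P(z) + log P(w | z)
joint_log_likelihood = sequence_log_prob(coarse_probs) + sequence_log_prob(word_probs)
print(joint_log_likelihood)
```

Training would maximize this quantity summed over the dataset; because both sequences are observed (the coarse one via the automatic extraction procedure), no marginalization over latent variables is needed in this sketch.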
Movie Description
Anna Rohrbach
Atousa Torabi
Marcus Rohrbach
Niket Tandon
Bernt Schiele
Training End-to-End Dialogue Systems with the Ubuntu Dialogue Corpus
Ryan Thomas Lowe
Nissan Pow
Iulian V. Serban
Chia-Wei Liu
In this paper, we construct and train end-to-end neural network-based dialogue systems using an updated version of the recent Ubuntu Dialogue Corpus, a dataset containing almost 1 million multi-turn dialogues, with a total of over 7 million utterances and 100 million words. This dataset is interesting because of its size, long context lengths, and technical nature; thus, it can be used to train large models directly from data with minimal feature engineering, which can be both time-consuming and expensive. We provide baselines in two different environments: one where models are trained to maximize the log-likelihood of a generated utterance conditioned on the context of the conversation, and one where models are trained to select the correct next response from a list of candidate responses. These are both evaluated on a recall task that we call Next Utterance Classification (NUC), as well as other generation-specific metrics. Finally, we provide a qualitative error analysis to help determine the most promising directions for future research on the Ubuntu Dialogue Corpus, and for end-to-end dialogue systems in general.
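One plausible way to score a response-selection model on a recall task like the one described is Recall@k: the fraction of test contexts for which the correct response is among the k highest-scoring candidates. The sketch below is only an illustration under that assumption; the function names and the toy scores are hypothetical and not taken from the paper.

```python
def recall_at_k(scores, correct_index, k):
    """Return 1.0 if the correct candidate is among the k highest-scoring ones, else 0.0."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return 1.0 if correct_index in ranked[:k] else 0.0

def mean_recall_at_k(examples, k):
    """Average Recall@k over (scores, correct_index) pairs."""
    return sum(recall_at_k(s, c, k) for s, c in examples) / len(examples)

# Toy data: each example holds model scores for 3 candidate responses
# and the index of the true next utterance.
examples = [
    ([0.1, 0.9, 0.3], 1),  # correct response ranked first
    ([0.8, 0.1, 0.5], 2),  # correct response ranked second
]
print(mean_recall_at_k(examples, 1))
print(mean_recall_at_k(examples, 2))
```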
Adversarially Learned Inference
Vincent Dumoulin
Ishmael Belghazi
Ben Poole
Alex Lamb
Martin Arjovsky
Olivier Mastropietro
We introduce the adversarially learned inference (ALI) model, which jointly learns a generation network and an inference network using an adversarial process. The generation network maps samples from stochastic latent variables to the data space while the inference network maps training examples in data space to the space of latent variables. An adversarial game is cast between these two networks and a discriminative network is trained to distinguish between joint latent/data-space samples from the generative network and joint samples from the inference network. We illustrate the ability of the model to learn mutually coherent inference and generation networks through inspection of model samples and reconstructions, and confirm the usefulness of the learned representations by obtaining performance competitive with the state of the art on the semi-supervised SVHN and CIFAR10 tasks.
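The adversarial game over joint pairs can be sketched with toy NumPy arrays. This is a minimal illustration, not the authors' implementation: the "networks" are hypothetical fixed linear maps, the inference and generation mappings are taken to be deterministic, and the label-swapped loss for the generator and inference networks is an assumed (common) choice that the abstract does not specify.

```python
import numpy as np

rng = np.random.default_rng(0)
x_dim, z_dim, batch = 4, 2, 8

# Hypothetical linear stand-ins for the three networks.
W_gen = rng.normal(size=(z_dim, x_dim))     # generation network: z -> x
W_inf = rng.normal(size=(x_dim, z_dim))     # inference network: x -> z
w_disc = rng.normal(size=(x_dim + z_dim,))  # discriminator over joint (x, z)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def discriminator(x, z):
    """Probability that a joint (x, z) pair came from the inference network."""
    return sigmoid(np.concatenate([x, z], axis=1) @ w_disc)

# Joint samples from the inference side: (data x, inferred z).
x_data = rng.normal(size=(batch, x_dim))
z_inferred = x_data @ W_inf

# Joint samples from the generation side: (generated x, prior z).
z_prior = rng.normal(size=(batch, z_dim))
x_generated = z_prior @ W_gen

eps = 1e-12  # guards against log(0)
d_inf = discriminator(x_data, z_inferred)
d_gen = discriminator(x_generated, z_prior)

# Discriminator tries to tell the two kinds of joint pairs apart.
disc_loss = -np.mean(np.log(d_inf + eps) + np.log(1.0 - d_gen + eps))
# Generator and inference networks try to fool it (assumed label-swapped loss).
gen_inf_loss = -np.mean(np.log(d_gen + eps) + np.log(1.0 - d_inf + eps))
print(disc_loss, gen_inf_loss)
```

The point of the sketch is that the discriminator always sees a data-space sample together with a latent sample, so matching the two joint distributions ties generation and inference together.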
BOUNDS LEAD TO IMPROVED CLASSIFIERS
The standard approach to supervised classification involves the minimization of a log-loss as an upper bound to the classification error. While this is a tight bound early on in the optimization, it overemphasizes the influence of incorrectly classified examples far from the decision boundary. Updating the upper bound during the optimization leads to improved classification rates while transforming the learning into a sequence of minimization problems. In addition, in the context where the classifier is part of a larger system, this modification makes it possible to link the performance of the classifier to that of the whole system, allowing the seamless introduction of external constraints.
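The upper-bound relationship can be checked numerically. The sketch below assumes binary classification with a signed margin m (positive when the example is correctly classified) and a log-loss measured in base 2, so that the loss equals 1 exactly at the decision boundary; these conventions are illustrative choices, not details taken from the abstract.

```python
import numpy as np

def zero_one_loss(margin):
    """Classification error: 1 when the margin is non-positive, else 0."""
    return (margin <= 0).astype(float)

def log_loss_base2(margin):
    """log2(1 + exp(-margin)), computed stably via logaddexp."""
    return np.logaddexp(0.0, -margin) / np.log(2.0)

margins = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
for m, err, ll in zip(margins, zero_one_loss(margins), log_loss_base2(margins)):
    print(f"margin={m:6.1f}  error={err:.0f}  log-loss={ll:.4f}")
```

For large negative margins the log-loss grows roughly linearly while the classification error stays at 1, which is the sense in which badly misclassified examples far from the boundary dominate the objective.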
Calibrating Energy-based Generative Adversarial Networks
Zihang Dai
Amjad Almahairi
Philip Bachman
Eduard Hovy
In this paper, we propose to equip Generative Adversarial Networks with the ability to produce direct energy estimates for samples. Specifically, we propose a flexible adversarial training framework, and prove this framework not only ensures the generator converges to the true data distribution, but also enables the discriminator to retain the density information at the global optimum. We derive the analytic form of the induced solution, and analyze its properties. In order to make the proposed framework trainable in practice, we introduce two effective approximation techniques. Empirically, the experimental results closely match our theoretical analysis, verifying that the discriminator is able to recover the energy of the data distribution.
Computer Assisted and Robotic Endoscopy and Clinical Image-Based Procedures
M. Jorge Cardoso
Xiongbiao Luo
Stefan Wesarg
Tobias Reichl
M. Ballester
Jonathan Mcleod
Klaus Drechsler
T. Peters
Marius Erdt
Kensaku Mori
M. Linguraru
Andreas Uhl
Cristina Oyarzun Laura
R. Shekhar
Computer-Assisted Conceptual Analysis of Textual Data as Applied to Philosophical Corpuses
Jean Guy Meunier
L. Chartrand
Mathieu Valette
Marie-Noëlle Bayle
Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support
M. Jorge Cardoso
G. Carneiro
T. Syeda-Mahmood
J. Tavares
Mehdi Moradi
Andrew P. Bradley
Hayit Greenspan
J. Papa
Anant Madabhushi
Jacinto C Nascimento
Jaime S. Cardoso
Vasileios Belagiannis
Zhi Lu
Faculdade Engenharia
Diet Networks: Thin Parameters for Fat Genomics
Pierre Luc Carrier
Akram Erraqabi
Tristan Sylvain
Alex Auvolat
Etienne Dejoie
Marie-Pierre Dubé
Learning tasks such as those involving genomic data often pose a serious challenge: the number of input features can be orders of magnitude larger than the number of training examples, making it difficult to avoid overfitting, even when using the known regularization techniques. We focus here on tasks in which the input is a description of the genetic variation specific to a patient, the single nucleotide polymorphisms (SNPs), yielding millions of ternary inputs. Improving the ability of deep learning to handle such datasets could have an important impact in medical research, more specifically in precision medicine, where high-dimensional data regarding a particular patient is used to make predictions of interest. Even though the amount of data for such tasks is increasing, this mismatch between the number of examples and the number of inputs remains a concern. Naive implementations of classifier neural networks involve a huge number of free parameters in their first layer (number of input features times number of hidden units): each input feature is associated with as many parameters as there are hidden units. We propose a novel neural network parametrization which considerably reduces the number of free parameters. It is based on the idea that we can first learn or provide a distributed representation for each input feature (e.g. for each position in the genome where variations are observed in data), and then learn (with another neural network called the parameter prediction network) how to map a feature's distributed representation (based on the feature's identity, not its value) to the vector of parameters specific to that feature in the classifier neural network (the weights which link the value of the feature to each of the hidden units). This approach views the problem of producing the parameters associated with each feature as a multi-task learning problem.
We show experimentally on a population stratification task of interest to medical studies that the proposed approach can significantly reduce both the number of parameters and the error rate of the classifier.
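The parametrization described above can be sketched in NumPy. This is a toy illustration under several assumptions that are not from the abstract: the feature embeddings are provided (random here) rather than learned, the parameter prediction network is a single tanh layer, and all sizes and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_hidden, emb_dim, batch = 1000, 16, 8, 4

# One distributed representation per input feature (tied to the feature's
# identity, not its value). Assumed to be provided rather than learned.
feature_embeddings = rng.normal(size=(n_features, emb_dim))

# Parameter prediction network: here a single layer (an assumption).
W_pred = rng.normal(size=(emb_dim, n_hidden)) * 0.1

def predict_first_layer(embeddings):
    """Map each feature embedding to that feature's row of first-layer weights."""
    return np.tanh(embeddings @ W_pred)  # shape: (n_features, n_hidden)

# Ternary inputs in {0, 1, 2}, as for SNP data.
x = rng.integers(0, 3, size=(batch, n_features)).astype(float)

W_first = predict_first_layer(feature_embeddings)
hidden = np.maximum(0.0, x @ W_first)  # classifier's first hidden layer (ReLU)

naive_params = n_features * n_hidden  # free parameters in a naive first layer
predicted_params = W_pred.size        # free parameters when weights are predicted
print(naive_params, predicted_params)
```

In this sketch the number of free first-layer parameters depends on the embedding size rather than on the number of input features, which is the reduction the abstract refers to.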