Publications

A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues
Sequential data often possesses hierarchical structure with complex dependencies between subsequences, such as those found between the utterances in a dialogue. To model these dependencies in a generative framework, we propose a neural network-based generative architecture with stochastic latent variables that span a variable number of time steps. We apply the proposed model to the task of dialogue response generation and compare it with other recent neural network architectures. We evaluate the model's performance through a human evaluation study. The experiments demonstrate that our model improves upon recently proposed models and that the latent variables facilitate both the generation of meaningful, long and diverse responses and the maintenance of dialogue state.
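A minimal sketch of the hierarchical latent-variable encoder-decoder idea described above, in PyTorch. The module sizes, the single Gaussian latent variable per response, and all names are illustrative assumptions, not the paper's exact architecture:

```python
# Hedged sketch: utterance-level encoder -> dialogue-level encoder ->
# stochastic latent variable -> response decoder. Dimensions are toy values.
import torch
import torch.nn as nn

class HierarchicalLatentED(nn.Module):
    def __init__(self, vocab=1000, emb=64, hid=128, z=32):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.utt_enc = nn.GRU(emb, hid, batch_first=True)   # encodes each utterance
        self.ctx_enc = nn.GRU(hid, hid, batch_first=True)   # encodes the dialogue so far
        self.to_mu = nn.Linear(hid, z)
        self.to_logvar = nn.Linear(hid, z)
        self.init_h = nn.Linear(hid + z, hid)
        self.decoder = nn.GRU(emb, hid, batch_first=True)
        self.out = nn.Linear(hid, vocab)

    def forward(self, context, response):
        # context: (batch, n_utts, utt_len) token ids; response: (batch, resp_len)
        b, n, t = context.shape
        _, h = self.utt_enc(self.embed(context.view(b * n, t)))
        utt_states = h.squeeze(0).view(b, n, -1)
        _, ctx = self.ctx_enc(utt_states)
        ctx = ctx.squeeze(0)
        mu, logvar = self.to_mu(ctx), self.to_logvar(ctx)
        zvar = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # latent spanning the response
        h0 = torch.tanh(self.init_h(torch.cat([ctx, zvar], -1))).unsqueeze(0)
        dec, _ = self.decoder(self.embed(response), h0)
        return self.out(dec), mu, logvar  # logits, plus terms for a KL penalty

model = HierarchicalLatentED()
ctx = torch.randint(0, 1000, (2, 3, 10))   # 2 dialogues, 3 utterances of 10 tokens
resp = torch.randint(0, 1000, (2, 8))
logits, mu, logvar = model(ctx, resp)
print(logits.shape)  # torch.Size([2, 8, 1000])
```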
Training End-to-End Dialogue Systems with the Ubuntu Dialogue Corpus
Ryan Thomas Lowe
Nissan Pow
Iulian V. Serban
Chia-Wei Liu
In this paper, we construct and train end-to-end neural network-based dialogue systems using an updated version of the recent Ubuntu Dialogue Corpus, a dataset containing almost 1 million multi-turn dialogues, with a total of over 7 million utterances and 100 million words. This dataset is interesting because of its size, long context lengths, and technical nature; thus, it can be used to train large models directly from data with minimal feature engineering, which can be both time-consuming and expensive. We provide baselines in two different environments: one where models are trained to maximize the log-likelihood of a generated utterance conditioned on the context of the conversation, and one where models are trained to select the correct next response from a list of candidate responses. These are both evaluated on a recall task that we call Next Utterance Classification (NUC), as well as other generation-specific metrics. Finally, we provide a qualitative error analysis to help determine the most promising directions for future research on the Ubuntu Dialogue Corpus, and for end-to-end dialogue systems in general.
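The Next Utterance Classification setup can be made concrete with a small Recall@k sketch in plain Python; the word-overlap scorer below is a stand-in assumption, not one of the paper's models:

```python
# Recall@k for NUC: score each candidate response given the context,
# and check whether the true response lands in the top k.

def recall_at_k(score, context, candidates, true_idx, k=1):
    """score(context, candidate) -> float; higher means more likely."""
    ranked = sorted(range(len(candidates)),
                    key=lambda i: score(context, candidates[i]),
                    reverse=True)
    return 1.0 if true_idx in ranked[:k] else 0.0

# Toy scorer (an assumption for illustration): count shared words.
def overlap_score(context, candidate):
    return len(set(context.split()) & set(candidate.split()))

ctx = "how do I mount a usb drive on ubuntu"
cands = ["try sudo mount /dev/sdb1 /mnt on ubuntu", "I like turtles", "reboot and pray"]
print(recall_at_k(overlap_score, ctx, cands, true_idx=0, k=1))  # 1.0
```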
Tighter Bounds Lead to Improved Classifiers
The standard approach to supervised classification involves the minimization of a log-loss as an upper bound to the classification error. Wh… (voir plus)ile this is a tight bound early on in the optimization, it overemphasizes the influence of incorrectly classified examples far from the decision boundary. Updating the upper bound during the optimization leads to improved classification rates while transforming the learning into a sequence of minimization problems. In addition, in the context where the classifier is part of a larger system, this modification makes it possible to link the performance of the classifier to that of the whole system, allowing the seamless introduction of external constraints.
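A minimal rendering of the bound in question, with assumed notation (the paper's exact formulation may differ): for a probabilistic classifier p_θ(y|x) that predicts the highest-probability class, a misclassified example must satisfy p_θ(y|x) ≤ 1/2, since any probability above 1/2 is necessarily the maximum. This yields a scaled log-loss as an upper bound on the 0-1 error:

```latex
% If the true class y is not the argmax of p_theta(. | x),
% then p_theta(y | x) <= 1/2, hence -log2 p_theta(y | x) >= 1:
\[
  \mathbb{1}\!\left[\hat{y}(x) \neq y\right]
  \;\le\; \frac{-\log p_\theta(y \mid x)}{\log 2}.
\]
```

Minimizing the right-hand side recovers standard log-loss training; re-tightening the bound around the current parameters and minimizing again gives the sequence of minimization problems the abstract describes.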
Computer Assisted and Robotic Endoscopy and Clinical Image-Based Procedures
M. Jorge Cardoso
Xiongbiao Luo
Stefan Wesarg
Tobias Reichl
M. Ballester
Jonathan McLeod
Klaus Drechsler
T. Peters
Marius Erdt
Kensaku Mori
M. Linguraru
Andreas Uhl
Cristina Oyarzun Laura
R. Shekhar
Computer-Assisted Conceptual Analysis of Textual Data as Applied to Philosophical Corpuses
Jean Guy Meunier
L. Chartrand
Mathieu Valette
Marie-Noëlle Bayle
Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support
M. Jorge Cardoso
G. Carneiro
T. Syeda-Mahmood
J. Tavares
Mehdi Moradi
Andrew P. Bradley
Hayit Greenspan
J. Papa
Anant Madabhushi
Jacinto C Nascimento
Jaime S. Cardoso
Vasileios Belagiannis
Zhi Lu
Diet Networks: Thin Parameters for Fat Genomics
Pierre Luc Carrier
Akram Erraqabi
Tristan Sylvain
Alex Auvolat
Etienne Dejoie
Marc-André Legault
Marie-Pierre Dubé
Learning tasks such as those involving genomic data often pose a serious challenge: the number of input features can be orders of magnitude larger than the number of training examples, making it difficult to avoid overfitting, even with known regularization techniques. We focus here on tasks in which the input is a description of the genetic variation specific to a patient, the single nucleotide polymorphisms (SNPs), yielding millions of ternary inputs. Improving the ability of deep learning to handle such datasets could have an important impact in medical research, more specifically in precision medicine, where high-dimensional data regarding a particular patient is used to make predictions of interest. Even though the amount of data for such tasks is increasing, this mismatch between the number of examples and the number of inputs remains a concern. Naive implementations of classifier neural networks involve a huge number of free parameters in their first layer (number of input features times number of hidden units): each input feature is associated with as many parameters as there are hidden units. We propose a novel neural network parametrization which considerably reduces the number of free parameters. It is based on the idea that we can first learn or provide a distributed representation for each input feature (e.g. for each position in the genome where variations are observed in data), and then learn (with another neural network called the parameter prediction network) how to map a feature's distributed representation (based on the feature's identity, not its value) to the vector of parameters specific to that feature in the classifier neural network (the weights which link the value of the feature to each of the hidden units). This approach views the problem of producing the parameters associated with each feature as a multi-task learning problem. We show experimentally, on a population stratification task of interest to medical studies, that the proposed approach can significantly reduce both the number of parameters and the error rate of the classifier.
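A hedged sketch of the parameter-prediction idea described above, in PyTorch: instead of a free (n_features × n_hidden) first-layer matrix, a small network maps each feature's embedding to that feature's row of first-layer weights. All dimensions and names are illustrative assumptions:

```python
import torch
import torch.nn as nn

n_features, emb_dim, n_hidden, n_classes = 100_000, 50, 100, 26

# One distributed representation per input feature, e.g. per SNP position.
feature_emb = torch.randn(n_features, emb_dim)

# Parameter prediction network: feature embedding -> that feature's weight vector.
param_net = nn.Sequential(nn.Linear(emb_dim, 64), nn.Tanh(), nn.Linear(64, n_hidden))
classifier_top = nn.Linear(n_hidden, n_classes)

def forward(x):
    W = param_net(feature_emb)   # (n_features, n_hidden): predicted, not free
    h = torch.tanh(x @ W)        # first layer of the classifier
    return classifier_top(h)

x = torch.randint(0, 3, (4, n_features)).float()  # 4 patients, ternary SNP inputs
print(forward(x).shape)          # torch.Size([4, 26])
```

The free parameters are those of param_net plus the top layer (roughly 10^4 here), versus the 10^7 a free first-layer matrix would require, which is the reduction the abstract argues for.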
Fetal, Infant and Ophthalmic Medical Image Analysis
M. Jorge Cardoso
Andrew Melbourne
Hrvoje Bogunovic
Pim Moeskops
Xinjian Chen
Ernst Schwartz
M. Garvin
E. Robinson
E. Trucco
Michael Ebner
Yanwu Xu
Antonios Makropoulos
Adrien Desjardins
Tom Kamiel Magda Vercauteren
Graphs in Biomedical Image Analysis, Computational Anatomy and Imaging Genetics
M. Jorge Cardoso
Enzo Ferrante
Xavier Pennec
Adrian Dalca
Sarah Parisot
S. Joshi
Nematollah Batmanghelich
Aristeidis Sotiras
Mads Lenstrup Nielsen
Mert R. Sabuncu
Tom Fletcher
Li Shen
Stanley Durrleman
Stefan H. Sommer