Publications

Predicting Future Disease Activity and Treatment Responders for Multiple Sclerosis Patients Using a Bag-of-Lesions Brain Representation
Andrew Doyle
Douglas Arnold
Horizontal and Vertical Self-Adaptive Cloud Controller with Reward Optimization for Resource Allocation
Jesús Alejandro Cárdenes Cabré
Ricardo Sanz
Over-booking or under-booking of computing resources leads to higher cost and performance degradation of web applications. To optimize the performance of web applications, access to resources has to be dynamically controlled, ensuring the maximum cost-performance ratio of the application while fulfilling its requirements. To simplify the design of dynamic cloud controllers, we propose a self-aware agent for horizontal and vertical scalability, defined by self-adaptive fuzzy logic with an oriented random optimizer based on reward and memory. The algorithm dynamically adjusts the membership functions and their relationships, maximizing the reward of the system while considering the cost of deploying new resources. The evaluation of the controller under a real cloud workload reveals the ability of the algorithm to maximize the performance of the web application based on target parameters given by an operator.
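A minimal sketch of the mechanism described above, not the authors' code: a fuzzy controller maps observed utilization to a scaling action, and a reward-guided random search nudges the membership-function parameters. The function names, the three fuzzy sets, and the reward interface are illustrative assumptions.

```python
import random

def tri(x, a, b, c):
    """Triangular membership function peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def scaling_action(cpu, params):
    """Defuzzify three fuzzy sets (low/ok/high) into a replica delta."""
    low  = tri(cpu, *params["low"])   # under-utilized -> scale in
    ok   = tri(cpu, *params["ok"])    # target band    -> hold
    high = tri(cpu, *params["high"])  # overloaded     -> scale out
    num = -1.0 * low + 0.0 * ok + 1.0 * high
    den = (low + ok + high) or 1.0
    return round(num / den)           # -1, 0 or +1 replicas

def adapt(params, reward_fn, sigma=0.05):
    """Oriented random step: keep a perturbation only if reward improves."""
    trial = {k: tuple(v + random.gauss(0, sigma) for v in abc)
             for k, abc in params.items()}
    return trial if reward_fn(trial) > reward_fn(params) else params
```

In this reading, `adapt` plays the role of the reward-and-memory optimizer: the controller keeps the best membership-function vertices seen so far and discards perturbations that lower the operator-defined reward.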
On integrating a language model into neural machine translation
Multi-way, multilingual neural machine translation
Baskaran Sankaran
F. Yarman-Vural
World Knowledge for Reading Comprehension: Rare Entity Prediction with Hierarchical LSTMs Using External Descriptions
Teng Long
Jackie CK Cheung
Humans interpret texts with respect to some background information, or world knowledge, and we would like to develop automatic reading comprehension systems that can do the same. In this paper, we introduce a task and several models to drive progress towards this goal. In particular, we propose the task of rare entity prediction: given a web document with several entities removed, models are tasked with predicting the correct missing entities conditioned on the document context and external lexical resources. This task is challenging due to the diversity of language styles and the extremely large number of rare entities. We propose two recurrent neural network architectures that make use of external knowledge in the form of entity descriptions. Our experiments show that our hierarchical LSTM model performs significantly better at the rare entity prediction task than models that do not make use of external resources.
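A minimal sketch of the scoring idea, under stated assumptions: the blank's document context and each candidate entity's external description are each encoded to a vector, and candidates are ranked by similarity. The encoders are stubbed here as mean pooling over word vectors; in the paper they are LSTMs.

```python
import numpy as np

def encode(word_vecs):
    """Stand-in for an LSTM encoder: mean-pool a (T, d) matrix to (d,)."""
    return word_vecs.mean(axis=0)

def rank_candidates(context_vecs, desc_vecs_per_cand):
    """Score each candidate's description against the blank's context and
    return the index of the highest-scoring entity."""
    c = encode(context_vecs)
    scores = [float(encode(d) @ c) for d in desc_vecs_per_cand]
    return int(np.argmax(scores))
```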
Twin Networks: Using the Future as a Regularizer
Nan Rosemary Ke
Christopher Pal
Being able to model long-term dependencies in sequential data, such as text, has been among the long-standing challenges of recurrent neural networks (RNNs). This issue is closely related to the absence of explicit planning in current RNN architectures: RNNs are trained to predict only the next token given the previous ones. In this paper, we introduce a simple way of encouraging RNNs to plan for the future. To accomplish this, we introduce an additional neural network which is trained to generate the sequence in reverse order, and we require closeness between the states of the forward RNN and the backward RNN that predict the same token. At each step, the states of the forward RNN are required to match the future information contained in the backward states. We hypothesize that this approach eases the modeling of long-term dependencies, thus helping to generate more globally consistent samples. The model trained with conditional generation for a speech recognition task achieved a 12% relative improvement (character error rate of 6.7 compared to a baseline of 7.6).
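A minimal numpy sketch of the twin regularizer, an assumption-laden reading rather than the paper's code: the forward state at step t is pushed, through a learned affine map, toward the backward state that predicts the same token.

```python
import numpy as np

def twin_loss(h_fwd, h_bwd, W, b):
    """h_fwd, h_bwd: (T, d) hidden states of the forward/backward RNNs,
    aligned so row t of each corresponds to the same target token.
    W (d, d), b (d,): affine map applied to the forward states only;
    gradients would not be propagated into the backward network."""
    proj = h_fwd @ W + b                  # project forward states
    diff = proj - h_bwd                   # mismatch at every step
    return 0.5 * np.mean(np.sum(diff**2, axis=1))

T, d = 10, 8
rng = np.random.default_rng(0)
loss = twin_loss(rng.normal(size=(T, d)), rng.normal(size=(T, d)),
                 np.eye(d), np.zeros(d))
```

This penalty is added to the usual next-token losses of both networks during training; at test time the backward network and the affine map are discarded.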
Dynamic Layer Normalization for Adaptive Neural Acoustic Modeling in Speech Recognition
Layer normalization is a recently introduced technique for normalizing the activities of neurons in deep neural networks to improve the training speed and stability. In this paper, we introduce a new layer normalization technique called Dynamic Layer Normalization (DLN) for adaptive neural acoustic modeling in speech recognition. By dynamically generating the scaling and shifting parameters in layer normalization, DLN adapts neural acoustic models to the acoustic variability arising from various factors such as speakers, channel noises, and environments. Unlike other adaptive acoustic models, our proposed approach does not require additional adaptation data or speaker information such as i-vectors. Moreover, the model size is fixed as it dynamically generates adaptation parameters. We apply our proposed DLN to deep bidirectional LSTM acoustic models and evaluate them on two benchmark datasets for large vocabulary ASR experiments: WSJ and TED-LIUM release 2. The experimental results show that our DLN improves neural acoustic models in terms of transcription accuracy by dynamically adapting to various speakers and environments.
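A minimal numpy sketch of the mechanism, under the assumption that the scaling and shifting parameters are predicted from a mean-pooled summary of the utterance features; the weight names and pooling choice are illustrative.

```python
import numpy as np

def dynamic_layer_norm(x, W_g, b_g, W_b, b_b, eps=1e-5):
    """x: (T, d) features for one utterance. Instead of fixed gain/bias,
    a per-utterance summary generates them, letting the model adapt to
    speaker and channel variability without i-vectors."""
    summary = x.mean(axis=0)              # (d,) utterance summary
    gamma = W_g @ summary + b_g           # dynamic scaling (d,)
    beta  = W_b @ summary + b_b           # dynamic shifting (d,)
    mu = x.mean(axis=1, keepdims=True)    # per-step statistics over features
    var = x.var(axis=1, keepdims=True)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

T, d = 50, 16
x = np.random.default_rng(1).normal(size=(T, d))
out = dynamic_layer_norm(x, np.eye(d), np.ones(d), np.eye(d), np.zeros(d))
```

Because gamma and beta are functions of the input rather than stored per speaker, the parameter count stays fixed, which matches the abstract's claim about model size.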
Improving Speech Recognition by Revising Gated Recurrent Units
Speech recognition is largely taking advantage of deep learning, showing that substantial benefits can be obtained by modern Recurrent Neural Networks (RNNs). The most popular RNNs are Long Short-Term Memory (LSTM) networks, which typically reach state-of-the-art performance in many tasks thanks to their ability to learn long-term dependencies and their robustness to vanishing gradients. Nevertheless, LSTMs have a rather complex design with three multiplicative gates, which might impair their efficient implementation. An attempt to simplify LSTMs has recently led to Gated Recurrent Units (GRUs), which are based on just two multiplicative gates. This paper builds on these efforts by further revising GRUs and proposing a simplified architecture that is potentially more suitable for speech recognition. The contribution of this work is two-fold. First, we propose removing the reset gate from the GRU design, resulting in a more efficient single-gate architecture. Second, we propose replacing tanh with ReLU activations in the state update equations. Results show that, in our implementation, the revised architecture reduces per-epoch training time by more than 30% and consistently improves recognition performance across different tasks, input features, and noisy conditions when compared to a standard GRU.
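A minimal numpy sketch of the revised cell as the abstract describes it: the reset gate is dropped (one gate instead of two) and tanh is replaced by ReLU in the candidate state. Any normalization the full paper may apply is omitted here for brevity.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def revised_gru_step(x, h, Wz, Uz, bz, Wh, Uh, bh):
    """One step of the single-gate, ReLU-based recurrent unit.
    x: (d_in,) input; h: (d_h,) previous hidden state."""
    z = sigmoid(Wz @ x + Uz @ h + bz)               # update gate only
    h_cand = np.maximum(0.0, Wh @ x + Uh @ h + bh)  # ReLU candidate state
    return z * h + (1.0 - z) * h_cand               # interpolate old/new
```

Removing the reset gate deletes one matrix-vector product per step and per layer, which is where the reported training-time reduction plausibly comes from.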
End-to-end optimization of goal-driven and visually grounded dialogue systems
Jérémie Mary
Bilal Piot
Olivier Pietquin
End-to-end design of dialogue systems has recently become a popular research topic thanks to powerful tools such as encoder-decoder architectures for sequence-to-sequence learning. Yet, most current approaches cast human-machine dialogue management as a supervised learning problem, aiming at predicting the next utterance of a participant given the full history of the dialogue. This vision is too simplistic to capture the planning problem intrinsic to dialogue as well as its grounded nature, which makes the context of a dialogue larger than the sole history. This is why only chit-chat and question-answering tasks have been addressed so far using end-to-end architectures. In this paper, we introduce a Deep Reinforcement Learning method, based on the policy gradient algorithm, to optimize visually grounded task-oriented dialogues. This approach is tested on a dataset of 120k dialogues collected through Mechanical Turk and provides encouraging results at solving both the problem of generating natural dialogues and the task of discovering a specific object in a complex picture.
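A minimal numpy sketch of the REINFORCE-style update underlying a policy gradient approach like this: the gradient of the expected reward is estimated as the reward of a sampled dialogue times the gradient of its log-likelihood. The softmax policy and single-parameter-matrix setup are simplifying assumptions, not the paper's architecture.

```python
import numpy as np

def reinforce_grad(theta, states, actions, reward, baseline=0.0):
    """theta: (n_actions, d) softmax-policy weights; states: (T, d) dialogue
    states; actions: (T,) sampled token/action indices; reward: scalar
    task success signal. Returns the ascent direction for one dialogue."""
    grad = np.zeros_like(theta)
    for s, a in zip(states, actions):
        logits = theta @ s
        p = np.exp(logits - logits.max())
        p /= p.sum()
        grad[a] += s                 # d log pi(a|s)/d theta, positive part
        grad -= np.outer(p, s)       # minus the expected feature under pi
    return (reward - baseline) * grad
```

The baseline term reduces the variance of the estimate without biasing it, which is standard practice when the only learning signal is end-of-dialogue task success.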
Adversarial Generation of Natural Language
Sai Rajeswar
Christopher Pal
Generative Adversarial Networks (GANs) have gathered a lot of attention from the computer vision community, yielding impressive results for image generation. Advances in the adversarial generation of natural language from noise, however, are not commensurate with the progress made in generating images, and still lag far behind likelihood-based methods. In this paper, we take a step towards generating natural language with a GAN objective alone. We introduce a simple baseline that addresses the discrete output space problem without relying on gradient estimators and show that it is able to achieve state-of-the-art results on a Chinese poem generation dataset. We present quantitative results on generating sentences from context-free and probabilistic context-free grammars, and qualitative language modeling results. A conditional version is also described that can generate sequences conditioned on sentence characteristics.
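A minimal numpy sketch of one way such a baseline can avoid gradient estimators, offered as an assumption rather than the paper's exact recipe: instead of sampling discrete tokens (which blocks gradients), the generator emits a softmax distribution per position and the discriminator consumes these soft one-hot vectors directly, so the whole pipeline stays differentiable.

```python
import numpy as np

def generator_output(z, W):
    """z: (T, d) noise-driven states; W: (d, V) output projection.
    Returns (T, V) rows that are probability distributions over the
    vocabulary, i.e. 'soft' tokens the discriminator can score."""
    logits = z @ W
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def real_as_soft(tokens, V):
    """Real sentences become exact one-hot rows of the same (T, V) shape,
    so the discriminator sees real and generated text in a common space."""
    return np.eye(V)[tokens]
```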
Approximate Value Iteration with Temporally Extended Actions (Extended Abstract)
Timothy A. Mann
Shie Mannor
The options framework provides a concrete way to implement and reason about temporally extended actions. Existing literature has demonstrated the value of planning with options empirically, but there is a lack of theoretical analysis formalizing when planning with options is more efficient than planning with primitive actions. We provide a general analysis of the convergence rate of a popular Approximate Value Iteration (AVI) algorithm, Fitted Value Iteration with options (OFVI). Our analysis reveals that longer-duration options and a pessimistic estimate of the value function both lead to faster convergence. Furthermore, options can improve convergence even when they are suboptimal and sparsely distributed throughout the state space. Next, we consider generating useful options for planning based on a subset of landmark states. This suggests a new algorithm, Landmark-based AVI (LAVI), that represents the value function only at landmark states. We analyze OFVI and LAVI using the proposed landmark-based options and compare the two algorithms. Our theoretical and experimental results demonstrate that options can play an important role in AVI by decreasing approximation error and inducing fast convergence.
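A minimal tabular sketch, illustrative rather than the paper's FVI setup, of a value-iteration backup with options: each option o has an expected cumulative reward R[o, s], a terminal-state model P[o, s, s'], and a duration d[o] that compounds the discount.

```python
import numpy as np

def option_backup(V, R, P, d, gamma):
    """V: (S,) current values; R: (O, S) expected option rewards;
    P: (O, S, S) terminal-state distributions; d: (O,) option durations.
    Longer options compound gamma**d, which is one intuition for why
    they can speed up convergence."""
    Q = R + (gamma ** d)[:, None] * np.einsum('ost,t->os', P, V)
    return Q.max(axis=0)              # greedy over options at each state

S, O = 5, 3
rng = np.random.default_rng(2)
P = rng.random((O, S, S)); P /= P.sum(axis=2, keepdims=True)
V = option_backup(np.zeros(S), rng.random((O, S)),
                  P, np.array([1.0, 3.0, 5.0]), gamma=0.9)
```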
Plan, Attend, Generate: Character-Level Neural Machine Translation with Planning
We investigate the integration of a planning mechanism into an encoder-decoder architecture with attention. We develop a model that can plan ahead when it computes alignments between the source and target sequences, not only for a single time-step but for the next k time-steps as well, by constructing a matrix of proposed future alignments and a commitment vector that governs whether to follow or recompute the plan. This mechanism is inspired by the strategic attentive reader and writer (STRAW) model, a recent neural architecture for planning with hierarchical reinforcement learning that can also learn higher-level temporal abstractions. Our proposed model is end-to-end trainable with differentiable operations. We show that our model outperforms strong baselines on a character-level translation task from WMT’15 with fewer parameters, and that it computes alignments that are qualitatively intuitive.
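A minimal numpy sketch of the planning mechanism as described, with assumed shapes and a hard commit/recompute switch standing in for the differentiable commitment vector: an alignment-plan matrix holds proposed attention for the next k steps, and the commitment decision chooses between following the plan (shifting it by one row) or recomputing it from the current decoder state.

```python
import numpy as np

def step_plan(plan, commit_prob, recompute_fn, state):
    """plan: (k, src_len) proposed alignments; row 0 is used at this step.
    recompute_fn(state) -> fresh (k, src_len) plan from the decoder state."""
    alignment = plan[0]                   # attention weights for this step
    if np.random.random() < commit_prob:  # follow the existing plan:
        plan = np.roll(plan, -1, axis=0)  # shift rows up one time-step
        plan[-1] = plan[-2]               # naive fill of the freed last row
    else:
        plan = recompute_fn(state)        # revise the whole plan
    return alignment, plan
```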