Publications

Learnable Explicit Density for Continuous Latent Space and Variational Inference

Chin-Wei Huang

Ahmed Touati

Laurent Dinh

Michal Drozdzal

Mohammad Havaei

In this paper, we study two aspects of the variational autoencoder (VAE): the prior distribution over the latent variables and its corresponding posterior. First, we decompose the learning of VAEs into layerwise density estimation, and argue that having a flexible prior is beneficial to both sample generation and inference. Second, we analyze the family of inverse autoregressive flows (inverse AF) and show that with further improvement, inverse AF could be used as a universal approximator of any complicated posterior. Our analysis results in a unified approach to parameterizing a VAE, without the need to restrict ourselves to factorial Gaussians in the latent real space.

2017-10-06

ArXiv (preprint)

arxiv.org

Neural Network Based Nonlinear Weighted Finite Automata

Tianyu Li

Guillaume Rabusseau

Doina Precup

Weighted finite automata (WFA) can expressively model functions defined over strings but are inherently linear models. Given the recent successes of nonlinear models in machine learning, it is natural to wonder whether extending WFA to the nonlinear setting would be beneficial. In this paper, we propose a novel neural network based nonlinear WFA model (NL-WFA) along with a learning algorithm. Our learning algorithm is inspired by the spectral learning algorithm for WFA and relies on a nonlinear decomposition of the so-called Hankel matrix, by means of an auto-encoder network. The expressive power of NL-WFA and the proposed learning algorithm are assessed on both synthetic and real-world data, showing that NL-WFA can lead to smaller model sizes and infer complex grammatical structures from data.

2017-09-13

ArXiv (preprint)

arxiv.org

Predicting Future Disease Activity and Treatment Responders for Multiple Sclerosis Patients Using a Bag-of-Lesions Brain Representation

Andrew Doyle

Doina Precup

Douglas Arnold

Tal Arbel

2017-09-04

Medical Image Computing and Computer Assisted Intervention − MICCAI 2017 (published)

doi.org

Graphs in Biomedical Image Analysis, Computational Anatomy and Imaging Genetics

M. Cardoso

Tal Arbel

Enzo Ferrante

Xavier Pennec

Adrian Dalca

Sarah Parisot

S. Joshi

Nematollah Batmanghelich

Aristeidis Sotiras

Mads Lenstrup Nielsen

M. Sabuncu

Tom Fletcher

Li Shen

Stanley Durrleman

Stefan H. Sommer

2017-09-01

Lecture Notes in Computer Science (published)

doi.org

World Knowledge for Reading Comprehension: Rare Entity Prediction with Hierarchical LSTMs Using External Descriptions

Teng Long

Emmanuel Bengio

Ryan Lowe

Jackie Cheung

Doina Precup

Humans interpret texts with respect to some background information, or world knowledge, and we would like to develop automatic reading comprehension systems that can do the same. In this paper, we introduce a task and several models to drive progress towards this goal. In particular, we propose the task of rare entity prediction: given a web document with several entities removed, models are tasked with predicting the correct missing entities conditioned on the document context and the lexical resources. This task is challenging due to the diversity of language styles and the extremely large number of rare entities. We propose two recurrent neural network architectures which make use of external knowledge in the form of entity descriptions. Our experiments show that our hierarchical LSTM model performs significantly better at the rare entity prediction task than those that do not make use of external resources.

2017-09-01

Conference on Empirical Methods in Natural Language Processing (published)

doi.org

Predicting Success in Goal-Driven Human-Human Dialogues

Michael Noseworthy

Jackie Cheung

Joelle Pineau

In goal-driven dialogue systems, success is often defined based on a structured definition of the goal. This requires that the dialogue system be constrained to handle a specific class of goals and that there be a mechanism to measure success with respect to that goal. However, in many human-human dialogues the diversity of goals makes it infeasible to define success in such a way. To address this scenario, we consider the task of automatically predicting success in goal-driven human-human dialogues using only the information communicated between participants in the form of text. We build a dataset from stackoverflow.com which consists of exchanges between two users in the technical domain where ground-truth success labels are available. We then propose a turn-based hierarchical neural network model that can be used to predict success without requiring a structured goal definition. We show this model outperforms rule-based heuristics and other baselines as it is able to detect patterns over the course of a dialogue and capture notions such as gratitude.

2017-08-01

SIGDIAL Conference (published)

doi.org

Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support

M. Cardoso

Tal Arbel

G. Carneiro

T. Syeda-Mahmood

J. Tavares

Mehdi Moradi

Andrew P. Bradley

Hayit Greenspan

J. Papa

Anant Madabhushi

Jacinto C Nascimento

Jaime S. Cardoso

Vasileios Belagiannis

Zhi Lu

Faculdade Engenharia

2017-07-19

ArXiv (preprint)

doi.org

arxiv.org

Accelerated Stochastic Power Iteration

Peng Xu

Bryan Dawei He

Christopher De Sa

Ioannis Mitliagkas

Christopher Re

Principal component analysis (PCA) is one of the most powerful tools in machine learning. The simplest method for PCA, the power iteration, requires O(1/Δ) full-data passes to recover the principal component of a matrix with eigen-gap Δ. Lanczos, a significantly more complex method, achieves an accelerated rate of O(1/√Δ) passes. Modern applications, however, motivate methods that only ingest a subset of available data, known as the stochastic setting. In the online stochastic setting, simple algorithms like Oja's iteration achieve the optimal sample complexity O(σ²/Δ²). Unfortunately, they are fully sequential, and also require O(σ²/Δ²) iterations, far from the O(1/√Δ) rate of Lanczos. We propose a simple variant of the power iteration with an added momentum term that achieves both the optimal sample and iteration complexity. In the full-pass setting, standard analysis shows that momentum achieves the accelerated rate, O(1/√Δ). We demonstrate empirically that naively applying momentum to a stochastic method does not result in acceleration. We perform a novel, tight variance analysis that reveals the "breaking-point variance" beyond which this acceleration does not occur. By combining this insight with modern variance reduction techniques, we construct stochastic PCA algorithms, for the online and offline setting, that achieve an accelerated iteration complexity O(1/√Δ). Due to the embarrassingly parallel nature of our methods, this acceleration translates directly to wall-clock time if deployed in a parallel environment. Our approach is very general, and applies to many non-convex optimization problems that can now be accelerated using the same technique.

2017-07-10

ArXiv (preprint)

arxiv.org

Detecting Large Concept Extensions for Conceptual Analysis

L. Chartrand

Jackie Cheung

Mohamed Bouguessa

2017-07-02

Machine Learning and Data Mining in Pattern Recognition (published)

doi.org

arxiv.org

Time-Varying Mixtures of Markov Chains: An Application to Road Traffic Modeling

Sean Lawlor

Michael Rabbat

Time-varying mixture models are useful for representing complex, dynamic distributions. Components in the mixture model can appear and disappear, and persisting components can evolve. This allows great flexibility in streaming data applications where the model can be adjusted as new data arrives. Fitting a mixture model with computational guarantees which can meet real-time requirements is challenging with existing algorithms, especially when the model order can vary with time. Existing approximate inference methods may require multiple restarts to search for a good local solution. Monte-Carlo methods can be used to jointly estimate the model order and model parameters, but when the distribution of each mixand has a high-dimensional parameter space, they suffer from the curse of dimensionality and from slow convergence. This paper proposes a generative model for time-varying mixture models, tailored for mixtures of discrete-time Markov chains. A novel, deterministic inference procedure is introduced and is shown to be suitable for applications requiring real-time estimation, and the method is guaranteed to converge at each time step. As a motivating application, we model and predict traffic patterns in a transportation network. Experiments illustrate the performance of the scheme and offer insights regarding tuning of the algorithm parameters. The experiments also investigate the predictive power of the proposed model compared to less complex models and demonstrate the superiority of the mixture model approach for prediction of traffic routes in real data.

2017-06-15

IEEE Transactions on Signal Processing (published)

doi.org

Implementation of Sparse Superposition Codes

Carlo Condo

Warren Gross

Sparse superposition codes (SSCs) are capacity-achieving codes whose decoding process is a linear sensing problem. Decoding approaches thus exploit the approximate message passing algorithm, which has been proven to be effective in compressed sensing. Previous work from the authors has evaluated the error correction performance of SSCs under finite precision and finite code length. This paper proposes the first SSC encoder and decoder architectures in the literature. The architectures are parametrized and applicable to all SSCs. A set of wide-ranging case studies is then considered, and code-specific approximations, along with implementation results in 65 nm CMOS technology, are then provided. The encoding process can be carried out with low power consumption (

2017-05-01

IEEE Transactions on Signal Processing (published)

doi.org
