Accelerated Stochastic Power Iteration
Peng Xu
Bryan Dawei He
Christopher De Sa
Christopher Ré
Principal component analysis (PCA) is one of the most powerful tools in machine learning. The simplest method for PCA, the power iteration, requires O(1/Δ) full-data passes to recover the principal component of a matrix with eigen-gap Δ. Lanczos, a significantly more complex method, achieves an accelerated rate of O(1/√Δ) passes. Modern applications, however, motivate methods that only ingest a subset of available data, known as the stochastic setting. In the online stochastic setting, simple algorithms like Oja's iteration achieve the optimal sample complexity O(σ²/Δ²). Unfortunately, they are fully sequential, and also require O(σ²/Δ²) iterations, far from the O(1/√Δ) rate of Lanczos. We propose a simple variant of the power iteration with an added momentum term that achieves both the optimal sample and iteration complexity. In the full-pass setting, standard analysis shows that momentum achieves the accelerated rate O(1/√Δ). We demonstrate empirically that naively applying momentum to a stochastic method does not result in acceleration. We perform a novel, tight variance analysis that reveals the "breaking-point variance" beyond which this acceleration does not occur. By combining this insight with modern variance reduction techniques, we construct stochastic PCA algorithms, for the online and offline setting, that achieve an accelerated iteration complexity O(1/√Δ). Due to the embarrassingly parallel nature of our methods, this acceleration translates directly to wall-clock time if deployed in a parallel environment. Our approach is very general, and applies to many non-convex optimization problems that can now be accelerated using the same technique.
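For intuition, here is a minimal NumPy sketch of the update the abstract describes: power iteration with a heavy-ball momentum term, x_{t+1} = A x_t - beta * x_{t-1}. The test matrix, the choice of beta, and the normalization scheme below are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def power_iteration_momentum(A, beta, num_iters=500, seed=0):
    """Top eigenvector of a symmetric matrix A via power iteration
    with a heavy-ball momentum term:
        x_{t+1} = A x_t - beta * x_{t-1}
    The paper's analysis suggests beta near lambda_2^2 / 4 is
    near-optimal in the deterministic case; here beta is a
    user-supplied assumption."""
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    x_prev = np.zeros(n)
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)
    for _ in range(num_iters):
        x_next = A @ x - beta * x_prev
        scale = np.linalg.norm(x_next)
        # Rescale both iterates by the same factor so the linear
        # momentum recursion is unchanged in direction.
        x_prev = x / scale
        x = x_next / scale
    return x

# Toy usage on a random PSD matrix (beta=0 recovers plain power iteration).
rng = np.random.default_rng(1)
M = rng.standard_normal((50, 50))
A = M @ M.T / 50
v = power_iteration_momentum(A, beta=0.5)
top = np.linalg.eigh(A)[1][:, -1]
print(abs(v @ top))  # close to 1 when converged
```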
Learning Visual Reasoning Without Strong Priors
Ethan Perez
Harm de Vries
Florian Strub
Vincent Dumoulin
Achieving artificial visual reasoning - the ability to answer image-related questions which require a multi-step, high-level process - is an important step towards artificial general intelligence. This multi-modal task requires learning a question-dependent, structured reasoning process over images from language. Standard deep learning approaches tend to exploit biases in the data rather than learn this underlying structure, while leading methods learn to visually reason successfully but are hand-crafted for reasoning. We show that a general-purpose, Conditional Batch Normalization approach achieves state-of-the-art results on the CLEVR Visual Reasoning benchmark with a 2.4% error rate. We outperform the next best end-to-end method (4.5%) and even methods that use extra supervision (3.1%). We probe our model to shed light on how it reasons, showing it has learned a question-dependent, multi-step process. Previous work has operated under the assumption that visual reasoning calls for a specialized architecture, but we show that a general architecture with proper conditioning can learn to visually reason effectively.
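To make the conditioning mechanism concrete, here is a minimal PyTorch sketch of Conditional Batch Normalization: the per-channel scale and shift of a batch-norm layer are predicted from a question embedding instead of being free parameters. The module structure and dimensions are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ConditionalBatchNorm2d(nn.Module):
    """Batch norm whose affine parameters (gamma, beta) are predicted
    from a conditioning vector, e.g. a question embedding."""
    def __init__(self, num_channels, cond_dim):
        super().__init__()
        # Normalize without learned affine parameters...
        self.bn = nn.BatchNorm2d(num_channels, affine=False)
        # ...and predict them from the conditioning input instead.
        self.gamma = nn.Linear(cond_dim, num_channels)
        self.beta = nn.Linear(cond_dim, num_channels)

    def forward(self, x, cond):
        # x: (B, C, H, W) image features; cond: (B, cond_dim) question embedding
        h = self.bn(x)
        g = self.gamma(cond).unsqueeze(-1).unsqueeze(-1)  # (B, C, 1, 1)
        b = self.beta(cond).unsqueeze(-1).unsqueeze(-1)
        return (1 + g) * h + b  # residual parameterization around identity

# Toy usage: condition 64-channel conv features on a 128-d question vector.
cbn = ConditionalBatchNorm2d(64, 128)
feats = torch.randn(8, 64, 14, 14)
question = torch.randn(8, 128)
print(cbn(feats, question).shape)  # torch.Size([8, 64, 14, 14])
```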
Detecting Large Concept Extensions for Conceptual Analysis
L. Chartrand
Mohamed Bouguessa
Time-Varying Mixtures of Markov Chains: An Application to Road Traffic Modeling
Sean F. Lawlor
Time-varying mixture models are useful for representing complex, dynamic distributions. Components in the mixture model can appear and disappear, and persisting components can evolve. This allows great flexibility in streaming data applications where the model can be adjusted as new data arrives. Fitting a mixture model with computational guarantees which can meet real-time requirements is challenging with existing algorithms, especially when the model order can vary with time. Existing approximate inference methods may require multiple restarts to search for a good local solution. Monte-Carlo methods can be used to jointly estimate the model order and model parameters, but when the distribution of each mixand has a high-dimensional parameter space, they suffer from the curse of dimensionality and from slow convergence. This paper proposes a generative model for time-varying mixture models, tailored for mixtures of discrete-time Markov chains. A novel, deterministic inference procedure is introduced and is shown to be suitable for applications requiring real-time estimation, and the method is guaranteed to converge at each time step. As a motivating application, we model and predict traffic patterns in a transportation network. Experiments illustrate the performance of the scheme and offer insights regarding tuning of the algorithm parameters. The experiments also investigate the predictive power of the proposed model compared to less complex models and demonstrate the superiority of the mixture model approach for prediction of traffic routes in real data.
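To make the model concrete, the NumPy sketch below shows how a (static) mixture of discrete-time Markov chains scores an observed state sequence: component k has weight pi[k] and transition matrix P[k], and the sequence likelihood is the weighted sum of per-chain path probabilities. The paper's time-varying machinery and deterministic inference procedure are not reproduced; names and numbers here are illustrative assumptions.

```python
import numpy as np

def mixture_sequence_likelihood(seq, pi, P):
    """Likelihood of a state sequence under a mixture of Markov chains.

    seq: list of integer states [s_0, s_1, ..., s_T]
    pi:  (K,) mixture weights, summing to 1
    P:   (K, S, S) row-stochastic transition matrices, one per component
    Returns sum_k pi[k] * prod_t P[k, s_t, s_{t+1}].
    """
    path = np.ones(len(pi))
    for s, s_next in zip(seq[:-1], seq[1:]):
        path *= P[:, s, s_next]  # per-component transition probability
    return float(pi @ path)

# Toy usage: two 3-state chains, e.g., road segments in a small network.
pi = np.array([0.6, 0.4])
P = np.array([
    [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]],
    [[0.1, 0.8, 0.1], [0.1, 0.1, 0.8], [0.8, 0.1, 0.1]],
])
print(mixture_sequence_likelihood([0, 1, 2, 0], pi, P))
```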
Implementation of Sparse Superposition Codes
Carlo Condo
Sparse superposition codes (SSCs) are capacity-achieving codes whose decoding process is a linear sensing problem. Decoding approaches thus exploit the approximate message passing algorithm, which has been proven to be effective in compressed sensing. Previous work from the authors has evaluated the error-correction performance of SSCs under finite precision and finite code length. This paper proposes the first SSC encoder and decoder architectures in the literature. The architectures are parametrized and applicable to all SSCs: a set of wide-ranging case studies is considered, and code-specific approximations, along with implementation results in 65 nm CMOS technology, are provided. The encoding process can be carried out with low power consumption (≤2.103 mW), while the semi-parallel decoder architecture can reach a throughput of 1.3 Gb/s with a 768 × 6-bit SSC codeword and an area occupation of 2.43 mm².
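As background, a sparse superposition codeword is formed by splitting a dictionary matrix A into L sections of M columns each and summing one selected column per section, which is why decoding reduces to a sparse linear sensing problem. The NumPy sketch below shows this encoding; the Gaussian dictionary, unit coefficients, and section sizes are illustrative assumptions, not the paper's hardware architecture.

```python
import numpy as np

def ssc_encode(A, message, M):
    """Encode a message as a sparse superposition codeword.

    A:       (n, L*M) dictionary matrix, viewed as L sections of M columns
    message: length-L list, message[l] in {0, ..., M-1} selects one
             column in section l
    The codeword is A @ x where x has exactly one non-zero per section.
    """
    x = np.zeros(A.shape[1])
    for l, m in enumerate(message):
        x[l * M + m] = 1.0  # unit coefficient; power allocation omitted
    return A @ x

# Toy usage: L=4 sections of M=8 columns, codeword length n=16.
rng = np.random.default_rng(0)
L, M, n = 4, 8, 16
A = rng.standard_normal((n, L * M)) / np.sqrt(n)
codeword = ssc_encode(A, message=[3, 0, 7, 5], M=M)
print(codeword.shape)  # (16,)
```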
Multi-modal Variational Encoder-Decoders
Iulian V. Serban
Alexander G. Ororbia II
Recent advances in neural variational inference have facilitated efficient training of powerful directed graphical models with continuous latent variables, such as variational autoencoders. However, these models usually assume simple, uni-modal priors — such as the multivariate Gaussian distribution — yet many real-world data distributions are highly complex and multi-modal. Examples of complex and multi-modal distributions range from topics in newswire text to conversational dialogue responses. When such latent variable models are applied to these domains, the restriction of the simple, uni-modal prior hinders the overall expressivity of the learned model as it cannot possibly capture more complex aspects of the data distribution. To overcome this critical restriction, we propose a flexible, simple prior distribution which can be learned efficiently and potentially capture an exponential number of modes of a target distribution. We develop the multi-modal variational encoder-decoder framework and investigate the effectiveness of the proposed prior in several natural language processing modeling tasks, including document modeling and dialogue modeling.
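The paper's specific prior is not reproduced here; as a generic illustration of replacing the uni-modal Gaussian with a multi-modal prior, the PyTorch sketch below evaluates the log-density of a learnable mixture-of-Gaussians prior. The mixture form, module name, and dimensions are all assumptions made for illustration only.

```python
import torch
import torch.nn as nn

class MixturePrior(nn.Module):
    """Learnable multi-modal prior p(z) = sum_k w_k N(z; mu_k, sigma_k^2),
    a stand-in with K modes instead of a single Gaussian."""
    def __init__(self, num_modes, latent_dim):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(num_modes))
        self.mu = nn.Parameter(torch.randn(num_modes, latent_dim))
        self.log_sigma = nn.Parameter(torch.zeros(num_modes, latent_dim))

    def log_prob(self, z):
        # z: (B, latent_dim) -> (B,) log p(z)
        comp = torch.distributions.Normal(self.mu, self.log_sigma.exp())
        # (B, K): sum per-dimension log-densities within each mode
        log_pz_k = comp.log_prob(z.unsqueeze(1)).sum(-1)
        log_w = torch.log_softmax(self.logits, dim=0)
        return torch.logsumexp(log_w + log_pz_k, dim=-1)

prior = MixturePrior(num_modes=16, latent_dim=32)
z = torch.randn(8, 32)
print(prior.log_prob(z).shape)  # torch.Size([8])
```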
A Sparse Probabilistic Model of User Preference Data
Matthew J. A. Smith
Fast and Flexible Successive-Cancellation List Decoders for Polar Codes
Seyyed Ali Hashemi
Carlo Condo
Polar codes have gained a significant amount of attention during the past few years and have been selected as a coding scheme for the next generation of mobile broadband standard. Among decoding schemes, successive-cancellation list (SCL) decoding provides a reasonable tradeoff between the error-correction performance and hardware implementation complexity when used to decode polar codes, at the cost of limited throughput. The simplified SCL (SSCL) and its extension SSCL-SPC increase the speed of decoding by removing redundant calculations when encountering particular information and frozen bit patterns (rate-1 and single parity-check codes), while keeping the error-correction performance unaltered. In this paper, we improve SSCL and SSCL-SPC by proving that the list size imposes a specific number of path splits required to decode rate-1 and single parity-check codes. Thus, the number of splits can be limited while guaranteeing exactly the same error-correction performance as if the paths were forked at each bit estimation. We call the new decoding algorithms Fast-SSCL and Fast-SSCL-SPC. Moreover, we show that the number of path forks in a practical application can be tuned to achieve desirable speed, while keeping the error-correction performance almost unchanged. Hardware architectures implementing both algorithms are then described and implemented: it is shown that our design can achieve
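The core observation can be made concrete: for a rate-1 node of length N decoded with list size L, at most min(L-1, N) path forks can affect the final list, so only the least-reliable bit positions need splitting. Below is a small NumPy sketch of selecting those positions, assuming LLR magnitude as the reliability measure; it is a fragment for illustration and does not reproduce the full decoder.

```python
import numpy as np

def rate1_split_indices(llrs, list_size):
    """For a rate-1 (all-information) node in SCL decoding, only the
    min(L-1, N) least-reliable bits can change the list ranking, so
    only those positions need path splitting.

    llrs: (N,) log-likelihood ratios for the node's bit estimates
    Returns the indices where the decoder forks paths; all remaining
    bits are decided by hard decision on the LLR sign."""
    n_splits = min(list_size - 1, len(llrs))
    order = np.argsort(np.abs(llrs))  # least reliable first
    return order[:n_splits]

llrs = np.array([4.2, -0.3, 1.1, -2.5, 0.05, 3.3, -1.8, 0.9])
print(rate1_split_indices(llrs, list_size=4))  # three least-reliable positions
```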
Nifty Assignments
Nick Parlante
Julie Zelenski
Dave Feinberg
Kunal Mishra
Josh Hug
Kevin Wayne
Michael Guerzhoy
François Pitt
I suspect that students learn more from our programming assignments than from our much sweated-over lectures, with their slide transitions, clip art, and joke attempts. A great assignment is deliberate about where the student hours go, concentrating the student's attention on material that is interesting and useful. The best assignments solve a problem that is topical and entertaining, providing motivation for the whole stack of work. Unfortunately, creating great programming assignments is both time consuming and error prone. The Nifty Assignments special session is all about promoting and sharing the ideas and ready-to-use materials of successful assignments.
Char2Wav: End-to-End Speech Synthesis
Jose Sotelo
Soroush Mehri
Kundan Kumar
João Felipe Santos
Kyle Kastner
We present Char2Wav, an end-to-end model for speech synthesis. Char2Wav has two components: a reader and a neural vocoder. The reader is an encoder-decoder model with attention. The encoder is a bidirectional recurrent neural network that accepts text or phonemes as inputs, while the decoder is a recurrent neural network (RNN) with attention that produces vocoder acoustic features. The neural vocoder is a conditional extension of SampleRNN which generates raw waveform samples from intermediate representations. Unlike traditional models for speech synthesis, Char2Wav learns to produce audio directly from text.
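For a structural picture of the reader component, here is a compact PyTorch sketch: a bidirectional RNN encodes character ids, and an attentive RNN decoder emits acoustic feature frames one at a time. The cell types, dimensions, and the simple dot-product attention are illustrative assumptions, not the paper's exact design, and the SampleRNN-based neural vocoder is not included.

```python
import torch
import torch.nn as nn

class Reader(nn.Module):
    """Sketch of a reader: bidirectional encoder over characters plus an
    attentive decoder producing vocoder acoustic feature frames."""
    def __init__(self, vocab_size, emb=64, hid=128, n_acoustic=80):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.encoder = nn.GRU(emb, hid, batch_first=True, bidirectional=True)
        self.decoder = nn.GRUCell(n_acoustic + 2 * hid, hid)
        self.attn_proj = nn.Linear(hid, 2 * hid)
        self.out = nn.Linear(hid, n_acoustic)

    def forward(self, chars, n_frames):
        # chars: (B, T_text) integer character ids
        enc, _ = self.encoder(self.embed(chars))   # (B, T_text, 2*hid)
        B = chars.size(0)
        h = enc.new_zeros(B, self.decoder.hidden_size)
        frame = enc.new_zeros(B, self.out.out_features)
        frames = []
        for _ in range(n_frames):
            # Dot-product attention over encoder states.
            scores = torch.bmm(enc, self.attn_proj(h).unsqueeze(-1)).squeeze(-1)
            context = torch.bmm(torch.softmax(scores, -1).unsqueeze(1), enc).squeeze(1)
            h = self.decoder(torch.cat([frame, context], dim=-1), h)
            frame = self.out(h)                     # next acoustic frame
            frames.append(frame)
        return torch.stack(frames, dim=1)           # (B, n_frames, n_acoustic)

reader = Reader(vocab_size=40)
acoustic = reader(torch.randint(0, 40, (2, 25)), n_frames=10)
print(acoustic.shape)  # torch.Size([2, 10, 80])
```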