Unimodal Probability Distributions for Deep Ordinal Classification
Christopher Beckham
Probability distributions produced by the cross-entropy loss for ordinal classification problems can possess undesired properties. We propos… (see more)e a straightforward technique to constrain discrete ordinal probability distributions to be unimodal via a combination of the Poisson probability mass function and the softmax nonlinearity. We evaluate this approach on two large ordinal image datasets and obtain promising results.
Accelerated Stochastic Power Iteration
Peng Xu
Bryan Dawei He
Christopher De Sa
Christopher Re
Principal component analysis (PCA) is one of the most powerful tools in machine learning. The simplest method for PCA, the power iteration, … (see more)requires O ( 1 / Δ ) full-data passes to recover the principal component of a matrix with eigen-gap Δ. Lanczos, a significantly more complex method, achieves an accelerated rate of O ( 1 / Δ ) passes. Modern applications, however, motivate methods that only ingest a subset of available data, known as the stochastic setting. In the online stochastic setting, simple algorithms like Oja's iteration achieve the optimal sample complexity O ( σ 2 / Δ 2 ) . Unfortunately, they are fully sequential, and also require O ( σ 2 / Δ 2 ) iterations, far from the O ( 1 / Δ ) rate of Lanczos. We propose a simple variant of the power iteration with an added momentum term, that achieves both the optimal sample and iteration complexity. In the full-pass setting, standard analysis shows that momentum achieves the accelerated rate, O ( 1 / Δ ) . We demonstrate empirically that naively applying momentum to a stochastic method, does not result in acceleration. We perform a novel, tight variance analysis that reveals the "breaking-point variance" beyond which this acceleration does not occur. By combining this insight with modern variance reduction techniques, we construct stochastic PCA algorithms, for the online and offline setting, that achieve an accelerated iteration complexity O ( 1 / Δ ) . Due to the embarassingly parallel nature of our methods, this acceleration translates directly to wall-clock time if deployed in a parallel environment. Our approach is very general, and applies to many non-convex optimization problems that can now be accelerated using the same technique.
Learning Visual Reasoning Without Strong Priors
Ethan Perez
Harm de Vries
Florian Strub
Vincent Dumoulin
Achieving artificial visual reasoning - the ability to answer image-related questions which require a multi-step, high-level process - is an… (see more) important step towards artificial general intelligence. This multi-modal task requires learning a question-dependent, structured reasoning process over images from language. Standard deep learning approaches tend to exploit biases in the data rather than learn this underlying structure, while leading methods learn to visually reason successfully but are hand-crafted for reasoning. We show that a general-purpose, Conditional Batch Normalization approach achieves state-of-the-art results on the CLEVR Visual Reasoning benchmark with a 2.4% error rate. We outperform the next best end-to-end method (4.5%) and even methods that use extra supervision (3.1%). We probe our model to shed light on how it reasons, showing it has learned a question-dependent, multi-step process. Previous work has operated under the assumption that visual reasoning calls for a specialized architecture, but we show that a general architecture with proper conditioning can learn to visually reason effectively.
Detecting Large Concept Extensions for Conceptual Analysis
L. Chartrand
Mohamed Bouguessa
Time-Varying Mixtures of Markov Chains: An Application to Road Traffic Modeling
Sean Lawlor
Time-varying mixture models are useful for representing complex, dynamic distributions. Components in the mixture model can appear and disap… (see more)pear, and persisting components can evolve. This allows great flexibility in streaming data applications where the model can be adjusted as new data arrives. Fitting a mixture model with computational guarantees which can meet real-time requirements is challenging with existing algorithms, especially when the model order can vary with time. Existing approximate inference methods may require multiple restarts to search for a good local solution. Monte-Carlo methods can be used to jointly estimate the model order and model parameters, but when the distribution of each mixand has a high-dimensional parameter space, they suffer from the curse of dimensionality and and from slow convergence. This paper proposes a generative model for time-varying mixture models, tailored for mixtures of discrete-time Markov chains. A novel, deterministic inference procedure is introduced and is shown to be suitable for applications requiring real-time estimation, and the method is guaranteed to converge at each time step. As a motivating application, we model and predict traffic patterns in a transportation network. Experiments illustrate the performance of the scheme and offer insights regarding tuning of the algorithm parameters. The experiments also investigate the predictive power of the proposed model compared to less complex models and demonstrate the superiority of the mixture model approach for prediction of traffic routes in real data.
Time-Varying Mixtures of Markov Chains: An Application to Road Traffic Modeling
Sean F. Lawlor
Time-varying mixture models are useful for representing complex, dynamic distributions. Components in the mixture model can appear and disap… (see more)pear, and persisting components can evolve. This allows great flexibility in streaming data applications where the model can be adjusted as new data arrives. Fitting a mixture model with computational guarantees which can meet real-time requirements is challenging with existing algorithms, especially when the model order can vary with time. Existing approximate inference methods may require multiple restarts to search for a good local solution. Monte-Carlo methods can be used to jointly estimate the model order and model parameters, but when the distribution of each mixand has a high-dimensional parameter space, they suffer from the curse of dimensionality and and from slow convergence. This paper proposes a generative model for time-varying mixture models, tailored for mixtures of discrete-time Markov chains. A novel, deterministic inference procedure is introduced and is shown to be suitable for applications requiring real-time estimation, and the method is guaranteed to converge at each time step. As a motivating application, we model and predict traffic patterns in a transportation network. Experiments illustrate the performance of the scheme and offer insights regarding tuning of the algorithm parameters. The experiments also investigate the predictive power of the proposed model compared to less complex models and demonstrate the superiority of the mixture model approach for prediction of traffic routes in real data.
Deep Complex Networks
Chiheb Trabelsi
Olexa Bilaniuk
Dmitriy Serdyuk
Sandeep Subramanian
Joao Felipe Santos
Soroush Mehri
Deep Complex Networks
Chiheb Trabelsi
Olexa Bilaniuk
Dmitriy Serdyuk
Sandeep Subramanian
Joao Felipe Santos
Soroush Mehri
Deep Complex Networks
Chiheb Trabelsi
Olexa Bilaniuk
Dmitriy Serdyuk
Sandeep Subramanian
Joao Felipe Santos
Soroush Mehri
Deep Complex Networks
Chiheb Trabelsi
Olexa Bilaniuk
Dmitriy Serdyuk
Sandeep Subramanian
Joao Felipe Santos
Soroush Mehri
Deep Complex Networks
Chiheb Trabelsi
Olexa Bilaniuk
Dmitriy Serdyuk
Sandeep Subramanian
Joao Felipe Santos
Soroush Mehri
Deep Complex Networks
Chiheb Trabelsi
Olexa Bilaniuk
Dmitriy Serdyuk
Sandeep Subramanian
Joao Felipe Santos
Soroush Mehri