Publications

Modulating early visual processing by language
Harm de Vries
Florian Strub
Jérémie Mary
Olivier Pietquin
It is commonly assumed that language refers to high-level visual concepts while leaving low-level visual processing unaffected. This view dominates the current literature in computational models for language-vision tasks, where visual and linguistic inputs are mostly processed independently before being fused into a single representation. In this paper, we deviate from this classic pipeline and propose to modulate the entire visual processing by linguistic input. Specifically, we condition the batch normalization parameters of a pretrained residual network (ResNet) on a language embedding. This approach, which we call MOdulatEd ResNet (MODERN), significantly improves strong baselines on two visual question answering tasks. Our ablation study shows that modulating from the early stages of visual processing is beneficial.
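A minimal sketch of the conditional batch-normalization idea described above, written in PyTorch. The module name, tensor shapes, and the residual (1 + gamma) parameterization are illustrative assumptions, not the authors' released code.

    import torch
    import torch.nn as nn

    class ConditionalBatchNorm2d(nn.Module):
        """Batch norm whose scale (gamma) and shift (beta) are predicted
        from a conditioning vector, e.g. a language embedding."""
        def __init__(self, num_features, embedding_dim):
            super().__init__()
            # The affine parameters come from the embedding, so disable them here.
            self.bn = nn.BatchNorm2d(num_features, affine=False)
            self.gamma = nn.Linear(embedding_dim, num_features)
            self.beta = nn.Linear(embedding_dim, num_features)

        def forward(self, x, embedding):
            out = self.bn(x)
            # Per-example scale and shift, broadcast over the spatial dimensions.
            g = self.gamma(embedding).unsqueeze(-1).unsqueeze(-1)
            b = self.beta(embedding).unsqueeze(-1).unsqueeze(-1)
            return (1.0 + g) * out + b

    # Example: modulate a 64-channel feature map with a 128-d question embedding.
    cbn = ConditionalBatchNorm2d(64, 128)
    feats = torch.randn(8, 64, 32, 32)
    q_emb = torch.randn(8, 128)
    out = cbn(feats, q_emb)

Replacing every BatchNorm2d in a pretrained ResNet with such a module is one way to let language modulate the visual pathway from its earliest stages.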
A Closer Look at Memorization in Deep Networks
Devansh Arpit
Stanisław Jastrzębski
Nicolas Ballas
Maxinder S. Kanwal
Asja Fischer
We examine the role of memorization in deep learning, drawing connections to capacity, generalization, and adversarial robustness. While deep networks are capable of memorizing noise data, our results suggest that they tend to prioritize learning simple patterns first. In our experiments, we expose qualitative differences in gradient-based optimization of deep neural networks (DNNs) on noise vs. real data. We also demonstrate that with appropriately tuned explicit regularization (e.g., dropout) we can degrade DNN training performance on noise datasets without compromising generalization on real data. Our analysis suggests that dataset-independent notions of effective capacity are unlikely to explain the generalization performance of deep networks trained with gradient-based methods, because the training data itself plays an important role in determining the degree of memorization.
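As a toy illustration of the noise-vs-real comparison in this abstract, the hypothetical snippet below trains identical small networks on structured and on shuffled labels and records how quickly each is fit; with structured labels the training accuracy typically rises much sooner. This is a sketch under simplified assumptions (a tiny MLP on synthetic data), not the paper's experimental setup.

    import torch
    import torch.nn as nn

    def fit_and_track(x, y, epochs=50):
        """Train a small MLP on (x, y) and return training accuracy per epoch."""
        model = nn.Sequential(nn.Linear(x.shape[1], 256), nn.ReLU(), nn.Linear(256, 10))
        opt = torch.optim.SGD(model.parameters(), lr=0.1)
        loss_fn = nn.CrossEntropyLoss()
        accs = []
        for _ in range(epochs):
            opt.zero_grad()
            logits = model(x)
            loss_fn(logits, y).backward()
            opt.step()
            accs.append((logits.argmax(1) == y).float().mean().item())
        return accs

    x = torch.randn(1024, 64)
    true_w = torch.randn(64, 10)
    y_real = (x @ true_w).argmax(1)                 # labels carry real structure
    y_noise = y_real[torch.randperm(len(y_real))]   # same labels, shuffled (pure noise)

    # The structured labels are fit much faster than the shuffled ones,
    # even though both can eventually be memorized.
    real_curve = fit_and_track(x, y_real)
    noise_curve = fit_and_track(x, y_noise)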
Time-Varying Mixtures of Markov Chains: An Application to Road Traffic Modeling
Sean F. Lawlor
Time-varying mixture models are useful for representing complex, dynamic distributions. Components in the mixture model can appear and disappear, and persisting components can evolve. This allows great flexibility in streaming-data applications, where the model can be adjusted as new data arrive. Fitting a mixture model with computational guarantees that can meet real-time requirements is challenging with existing algorithms, especially when the model order can vary with time. Existing approximate inference methods may require multiple restarts to search for a good local solution. Monte Carlo methods can be used to jointly estimate the model order and model parameters, but when the distribution of each mixand has a high-dimensional parameter space, they suffer from the curse of dimensionality and from slow convergence. This paper proposes a generative model for time-varying mixture models, tailored for mixtures of discrete-time Markov chains. A novel, deterministic inference procedure is introduced and is shown to be suitable for applications requiring real-time estimation, and the method is guaranteed to converge at each time step. As a motivating application, we model and predict traffic patterns in a transportation network. Experiments illustrate the performance of the scheme and offer insights regarding tuning of the algorithm parameters. The experiments also investigate the predictive power of the proposed model compared to less complex models and demonstrate the superiority of the mixture-model approach for prediction of traffic routes in real data.
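To make the generative model concrete, here is a small NumPy sketch that samples a route from a mixture of discrete-time Markov chains: a component is drawn from the mixture weights, then the state sequence follows that component's transition matrix. The two-component, three-state example is invented for illustration and is not taken from the paper.

    import numpy as np

    rng = np.random.default_rng(0)

    def sample_mixture_of_markov_chains(weights, init_dists, transitions, length):
        """Draw one sequence: pick a component, then run its Markov chain.
        weights: (K,) mixture weights; init_dists: (K, S); transitions: (K, S, S)."""
        k = rng.choice(len(weights), p=weights)
        seq = [rng.choice(init_dists.shape[1], p=init_dists[k])]
        for _ in range(length - 1):
            seq.append(rng.choice(transitions.shape[2], p=transitions[k, seq[-1]]))
        return k, seq

    # Two components over three states (e.g. three road segments).
    weights = np.array([0.6, 0.4])
    init = np.array([[1.0, 0.0, 0.0],
                     [0.0, 0.0, 1.0]])
    trans = np.array([
        [[0.8, 0.2, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8]],
        [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]],
    ])
    component, route = sample_mixture_of_markov_chains(weights, init, trans, 10)

In the time-varying setting described above, the weights and transition matrices would additionally evolve from one time step to the next.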
Deep Complex Networks
Chiheb Trabelsi
Dmitriy Serdyuk
Sandeep Subramanian
Joao Felipe Santos
Soroush Mehri
At present, the vast majority of building blocks, techniques, and architectures for deep learning are based on real-valued operations and representations. However, recent work on recurrent neural networks and older fundamental theoretical analysis suggest that complex numbers could have a richer representational capacity and could also facilitate noise-robust memory retrieval mechanisms. Despite their attractive properties and potential for opening up entirely new neural architectures, complex-valued deep neural networks have been marginalized due to the absence of the building blocks required to design such models. In this work, we provide the key atomic components for complex-valued deep neural networks and apply them to convolutional feed-forward networks and convolutional LSTMs. More precisely, we rely on complex convolutions and present algorithms for complex batch normalization and complex weight initialization strategies for complex-valued neural nets, and we use them in experiments with end-to-end training schemes. We demonstrate that such complex-valued models are competitive with their real-valued counterparts. We test deep complex models on several computer vision tasks, on music transcription using the MusicNet dataset, and on speech spectrum prediction using the TIMIT dataset. We achieve state-of-the-art performance on these audio-related tasks.
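The complex convolution at the core of this work can be emulated with real-valued operations, since (A + iB) * (x + iy) = (Ax - By) + i(Ay + Bx). Below is a minimal PyTorch sketch of that identity; the module name and shapes are assumptions for illustration, not the authors' reference implementation.

    import torch
    import torch.nn as nn

    class ComplexConv2d(nn.Module):
        """Complex convolution via four real convolutions:
        (A + iB) * (x + iy) = (A*x - B*y) + i(A*y + B*x)."""
        def __init__(self, in_channels, out_channels, kernel_size, **kwargs):
            super().__init__()
            # One real conv holds the real part of the kernel, one the imaginary part.
            self.conv_re = nn.Conv2d(in_channels, out_channels, kernel_size, bias=False, **kwargs)
            self.conv_im = nn.Conv2d(in_channels, out_channels, kernel_size, bias=False, **kwargs)

        def forward(self, x_re, x_im):
            out_re = self.conv_re(x_re) - self.conv_im(x_im)
            out_im = self.conv_re(x_im) + self.conv_im(x_re)
            return out_re, out_im

    # Example: a 3-channel complex input mapped to 16 complex feature maps.
    conv = ComplexConv2d(3, 16, 3, padding=1)
    x_re, x_im = torch.randn(2, 3, 32, 32), torch.randn(2, 3, 32, 32)
    y_re, y_im = conv(x_re, x_im)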