On the Learning Dynamics of Deep Neural Networks
Remi Tachet des Combes
Mohammad Pezeshki
Samira Shabanian
While a lot of progress has been made in recent years, the dynamics of learning in deep nonlinear neural networks remain to this day largely… (see more) misunderstood. In this work, we study the case of binary classification and prove various properties of learning in such networks under strong assumptions such as linear separability of the data. Extending existing results from the linear case, we confirm empirical observations by proving that the classification error also follows a sigmoidal shape in nonlinear architectures. We show that given proper initialization, learning expounds parallel independent modes and that certain regions of parameter space might lead to failed training. We also demonstrate that input norm and features' frequency in the dataset lead to distinct convergence speeds which might shed some light on the generalization capabilities of deep neural networks. We provide a comparison between the dynamics of learning with cross-entropy and hinge losses, which could prove useful to understand recent progress in the training of generative adversarial networks. Finally, we identify a phenomenon that we baptize \textit{gradient starvation} where the most frequent features in a dataset prevent the learning of other less frequent but equally informative features.
CNN Prediction of Future Disease Activity for Multiple Sclerosis Patients from Baseline MRI and Lesion Labels
Nazanin Mohammadi Sepahvand
Tal Hassner
Douglas Arnold
3D U-Net for Brain Tumour Segmentation
Raghav Mehta
How to Exploit Weaknesses in Biomedical Challenge Design and Organization
Annika Reinke
Matthias Eisenmann
Sinan Onogur
Marko Stankovic
Patrick Scholz
Peter M. Full
Hrvoje Bogunovic
Bennett Landman
Oskar Maier
Bjoern Menze
Gregory C. Sharp
Korsuk Sirinukunwattana
Stefanie Speidel
F. V. D. Sommen
Guoyan Zheng
Henning Müller
Michal Kozubek
Andrew P. Bradley
Pierre Jannin … (see 2 more)
Annette Kopp-Schneider
Lena Maier-Hein
RS-Net: Regression-Segmentation 3D CNN for Synthesis of Full Resolution Missing Brain MRI in the Presence of Tumours
Raghav Mehta
Social-Affiliation Networks: Patterns and the SOAR Model
Dhivya Eswaran
Artur Dubrawski
Christos Faloutsos
Ghost Units Yield Biologically Plausible Backprop in Deep Neural Networks
Thomas Mesnard
Gaëtan Vignoud
João Sacramento
Walter Senn
Quaternion Convolutional Neural Networks for End-to-End Automatic Speech Recognition
Titouan Parcollet
Ying Zhang
Mohamed Morchid
Chiheb Trabelsi
Georges Linarès
Renato De Mori
Recently, the connectionist temporal classification (CTC) model coupled with recurrent (RNN) or convolutional neural networks (CNN), made it… (see more) easier to train speech recognition systems in an end-to-end fashion. However in real-valued models, time frame components such as mel-filter-bank energies and the cepstral coefficients obtained from them, together with their first and second order derivatives, are processed as individual elements, while a natural alternative is to process such components as composed entities. We propose to group such elements in the form of quaternions and to process these quaternions using the established quaternion algebra. Quaternion numbers and quaternion neural networks have shown their efficiency to process multidimensional inputs as entities, to encode internal dependencies, and to solve many tasks with less learning parameters than real-valued models. This paper proposes to integrate multiple feature views in quaternion-valued convolutional neural network (QCNN), to be used for sequence-to-sequence mapping with the CTC model. Promising results are reported using simple QCNNs in phoneme recognition experiments with the TIMIT corpus. More precisely, QCNNs obtain a lower phoneme error rate (PER) with less learning parameters than a competing model based on real-valued CNNs.
Twin Regularization for online speech recognition
Dmitriy Serdyuk
Online speech recognition is crucial for developing natural human-machine interfaces. This modality, however, is significantly more challeng… (see more)ing than off-line ASR, since real-time/low-latency constraints inevitably hinder the use of future information, that is known to be very helpful to perform robust predictions. A popular solution to mitigate this issue consists of feeding neural acoustic models with context windows that gather some future frames. This introduces a latency which depends on the number of employed look-ahead features. This paper explores a different approach, based on estimating the future rather than waiting for it. Our technique encourages the hidden representations of a unidirectional recurrent network to embed some useful information about the future. Inspired by a recently proposed technique called Twin Networks, we add a regularization term that forces forward hidden states to be as close as possible to cotemporal backward ones, computed by a "twin" neural network running backwards in time. The experiments, conducted on a number of datasets, recurrent architectures, input features, and acoustic conditions, have shown the effectiveness of this approach. One important advantage is that our method does not introduce any additional computation at test time if compared to standard unidirectional recurrent networks.
Structured deep Fisher pruning for efficient facial trait classification
Qing Tian
James J. Clark
Domain Knowledge Discovery Guided by Software Trace Links
Natawut Monaikul
Jane Cleland-Huang
Software-intensive projects are specified and modeled using domain terminology. Knowledge of the domain terminology is necessary for perform… (see more)ing many Software Engineering tasks such as impact analysis, compliance verification, and safety certification. However, discovering domain terminology and reasoning about their interrelationships for highly technical software and system engineering domains is a complex task which requires significant domain expertise and human effort. In this paper, we present a novel approach for leveraging trace links in software intensive systems to guide the process of mining facts that contain domain knowledge. The trace links which drive our mining process, define relationships between artifacts such as regulations and requirements and enable a guided search through high-yield combinations of domain terms. Our proof-of-concept evaluation shows that our approach aids in the discovery of domain facts even in highly complex technical domains. These domain facts can provide support for a variety of Software Engineering activities. As a use case, we demonstrate how the mined facts can facilitate the task of project Q&A.
The Deconfounded Recommender: A Causal Inference Approach to Recommendation
Yixin Wang
Dawen Liang
David Blei
The goal of a recommender system is to show its users items that they will like. In forming its prediction, the recommender system tries to … (see more)answer: "what would the rating be if we 'forced' the user to watch the movie?" This is a question about an intervention in the world, a causal question, and so traditional recommender systems are doing causal inference from observational data. This paper develops a causal inference approach to recommendation. Traditional recommenders are likely biased by unobserved confounders, variables that affect both the "treatment assignments" (which movies the users watch) and the "outcomes" (how they rate them). We develop the deconfounded recommender, a strategy to leverage classical recommendation models for causal predictions. The deconfounded recommender uses Poisson factorization on which movies users watched to infer latent confounders in the data; it then augments common recommendation models to correct for potential confounding bias. The deconfounded recommender improves recommendation and it enjoys stable performance against interventions on test sets.