Publications

Frank-Wolfe Splitting via Augmented Lagrangian Method
Minimizing a function over an intersection of convex sets is an important task in optimization that is often much more challenging than mini… (voir plus)mizing it over each individual constraint set. While traditional methods such as Frank-Wolfe (FW) or proximal gradient descent assume access to a linear or quadratic oracle on the intersection, splitting techniques take advantage of the structure of each sets, and only require access to the oracle on the individual constraints. In this work, we develop and analyze the Frank-Wolfe Augmented Lagrangian (FW-AL) algorithm, a method for minimizing a smooth function over convex compact sets related by a "linear consistency" constraint that only requires access to a linear minimization oracle over the individual constraints. It is based on the Augmented Lagrangian Method (ALM), also known as Method of Multipliers, but unlike most existing splitting methods, it only requires access to linear (instead of quadratic) minimization oracles. We use recent advances in the analysis of Frank-Wolfe and the alternating direction method of multipliers algorithms to prove a sublinear convergence rate for FW-AL over general convex compact sets and a linear convergence rate for polytopes.
Nonlinear Weighted Finite Automata
Weighted finite automata (WFA) can expressively model functions defined over strings but are inherently linear models. Given the recent succ… (voir plus)esses of nonlinear models in machine learning, it is natural to wonder whether extending WFA to the nonlinear setting would be beneficial. In this paper, we propose a novel model of neural network based nonlinear WFA model (NL-WFA) along with a learning algorithm. Our learning algorithm is inspired by the spectral learning algorithm for WFA and relies on a nonlinear decomposition of the so-called Hankel matrix, by means of an auto-encoder network. The expressive power of NL-WFA and the proposed learning algorithm are assessed on both synthetic and real world data, showing that NL-WFA can lead to smaller model sizes and infer complex grammatical structures from data.
Fisher Pruning of Deep Nets for Facial Trait Classification
Qing Tian
James J. Clark
Although deep nets have resulted in high accuracies for various visual tasks, their computational and space requirements are prohibitively h… (voir plus)igh for inclusion on devices without high-end GPUs. In this paper, we introduce a neuron/filter level pruning framework based on Fisher's LDA which leads to high accuracies for a wide array of facial trait classification tasks, while significantly reducing space/computational complexities. The approach is general and can be applied to convolutional, fully-connected, and module-based deep structures, in all cases leveraging the high decorrelation of neuron activations found in the pre-decision layer and cross-layer deconv dependency. Experimental results on binary and multi-category facial traits from the LFWA and Adience datasets illustrate the framework's comparable/better performance to state-of-the-art pruning approaches and compact structures (e.g. SqueezeNet, MobileNet). Ours successfully maintains comparable accuracies even after discarding most parameters (98%-99% for VGG-16, 82% for GoogLeNet) and with significant FLOP reductions (83% for VGG-16, 64% for GoogLeNet).
Task-specific Deep LDA pruning of neural networks
Qing Tian
James J. Clark
With deep learning's success, a limited number of popular deep nets have been widely adopted for various vision tasks. However, this usually… (voir plus) results in unnecessarily high complexities and possibly many features of low task utility. In this paper, we address this problem by introducing a task-dependent deep pruning framework based on Fisher's Linear Discriminant Analysis (LDA). The approach can be applied to convolutional, fully-connected, and module-based deep network structures, in all cases leveraging the high decorrelation of neuron motifs found in the pre-decision layer and cross-layer deconv dependency. Moreover, we examine our approach's potential in network architecture search for specific tasks and analyze the influence of our pruning on model robustness to noises and adversarial attacks. Experimental results on datasets of generic objects, as well as domain specific tasks (CIFAR100, Adience, and LFWA) illustrate our framework's superior performance over state-of-the-art pruning approaches and fixed compact nets (e.g. SqueezeNet, MobileNet). The proposed method successfully maintains comparable accuracies even after discarding most parameters (98%-99% for VGG16, up to 82% for the already compact InceptionNet) and with significant FLOP reductions (83% for VGG16, up to 64% for InceptionNet). Through pruning, we can also derive smaller, but more accurate and more robust models suitable for the task.
A polynomial algorithm for a continuous bilevel knapsack problem
Patrice Marcotte
Learning Anonymized Representations with Adversarial Neural Networks
Clément Feutry
P. Duhamel
Statistical methods protecting sensitive information or the identity of the data owner have become critical to ensure privacy of individuals… (voir plus) as well as of organizations. This paper investigates anonymization methods based on representation learning and deep neural networks, and motivated by novel information theoretical bounds. We introduce a novel training objective for simultaneously training a predictor over target variables of interest (the regular labels) while preventing an intermediate representation to be predictive of the private labels. The architecture is based on three sub-networks: one going from input to representation, one from representation to predicted regular labels, and one from representation to predicted private labels. The training procedure aims at learning representations that preserve the relevant part of the information (about regular labels) while dismissing information about the private labels which correspond to the identity of a person. We demonstrate the success of this approach for two distinct classification versus anonymization tasks (handwritten digits and sentiment analysis).
Existence of Nash Equilibria on Integer Programming Games
João Pedro Pedroso
Online Hyper-Parameter Optimization
Damien Vincent
Sylvain Gelly
Olivier Bousquet
Online variance-reducing optimization
Reza Babanezhad Harikandeh
Reza Babanezhad
Pierre-Antoine Manzagol
Combining intraoperative ultrasound brain shift correction and augmented reality visualizations: a pilot study of eight cases
Ian J. Gerard
Marta Kersten-Oertel
Simon Drouin
Jeffery A. Hall
Kevin Petrecca
Dante De Nigris
Daniel A. Di Giovanni
D. Louis Collins
Abstract. We present our work investigating the feasibility of combining intraoperative ultrasound for brain shift correction and augmented … (voir plus)reality (AR) visualization for intraoperative interpretation of patient-specific models in image-guided neurosurgery (IGNS) of brain tumors. We combine two imaging technologies for image-guided brain tumor neurosurgery. Throughout surgical interventions, AR was used to assess different surgical strategies using three-dimensional (3-D) patient-specific models of the patient’s cortex, vasculature, and lesion. Ultrasound imaging was acquired intraoperatively, and preoperative images and models were registered to the intraoperative data. The quality and reliability of the AR views were evaluated with both qualitative and quantitative metrics. A pilot study of eight patients demonstrates the feasible combination of these two technologies and their complementary features. In each case, the AR visualizations enabled the surgeon to accurately visualize the anatomy and pathology of interest for an extended period of the intervention. Inaccuracies associated with misregistration, brain shift, and AR were improved in all cases. These results demonstrate the potential of combining ultrasound-based registration with AR to become a useful tool for neurosurgeons to improve intraoperative patient-specific planning by improving the understanding of complex 3-D medical imaging data and prolonging the reliable use of IGNS.
Modular Networks for Validating Community Detection Algorithms
Justin J Fagnan
Afra Abnar
Osmar R Zaiane
How can we accurately compare different community detection algorithms? These algorithms cluster nodes in a given network, and their perform… (voir plus)ance is often validated on benchmark networks with explicit ground-truth communities. Given the lack of cluster labels in real-world networks, a model that generates realistic networks is required for accurate evaluation of these algorithm. In this paper, we present a simple, intuitive, and flexible benchmark generator to generate intrinsically modular networks for community validation. We show how the generated networks closely comply with the characteristics observed for real networks; whereas their characteristics could be directly controlled to match wide range of real world networks. We further show how common community detection algorithms rank differently when being evaluated on these benchmarks compared to current available alternatives.
Accelerated Stochastic Power Iteration
Peng Xu
Bryan Dawei He
Christopher De Sa
Christopher Re
Principal component analysis (PCA) is one of the most powerful tools in machine learning. The simplest method for PCA, the power iteration, … (voir plus)requires O ( 1 / Δ ) full-data passes to recover the principal component of a matrix with eigen-gap Δ. Lanczos, a significantly more complex method, achieves an accelerated rate of O ( 1 / Δ ) passes. Modern applications, however, motivate methods that only ingest a subset of available data, known as the stochastic setting. In the online stochastic setting, simple algorithms like Oja's iteration achieve the optimal sample complexity O ( σ 2 / Δ 2 ) . Unfortunately, they are fully sequential, and also require O ( σ 2 / Δ 2 ) iterations, far from the O ( 1 / Δ ) rate of Lanczos. We propose a simple variant of the power iteration with an added momentum term, that achieves both the optimal sample and iteration complexity. In the full-pass setting, standard analysis shows that momentum achieves the accelerated rate, O ( 1 / Δ ) . We demonstrate empirically that naively applying momentum to a stochastic method, does not result in acceleration. We perform a novel, tight variance analysis that reveals the "breaking-point variance" beyond which this acceleration does not occur. By combining this insight with modern variance reduction techniques, we construct stochastic PCA algorithms, for the online and offline setting, that achieve an accelerated iteration complexity O ( 1 / Δ ) . Due to the embarassingly parallel nature of our methods, this acceleration translates directly to wall-clock time if deployed in a parallel environment. Our approach is very general, and applies to many non-convex optimization problems that can now be accelerated using the same technique.