Ioannis Mitliagkas

Adversarial target-invariant representation learning for domain generalization

Isabela Albuquerque

Joao Monteiro

Tiago Falk

In many applications of machine learning, the training and test set data come from different distributions, or domains. A number of domain generalization strategies have been introduced with the goal of achieving good performance on out-of-distribution data. In this paper, we propose an adversarial approach to the problem. We propose a process that enforces pair-wise domain invariance while training a feature extractor over a diverse set of domains. We show that this process ensures invariance to any distribution that can be expressed as a mixture of the training domains. Following this insight, we then introduce an adversarial approach in which pair-wise divergences are estimated and minimized. Experiments on two domain generalization benchmarks for object recognition (i.e., PACS and VLCS) show that the proposed method yields higher average accuracy on the target domains in comparison to previously introduced adversarial strategies, as well as recently proposed methods based on learning invariant representations.

2019-11-03

arXiv.org (preprint)

dblp.uni-trier.de

Generalizing to unseen domains via distribution matching

Isabela Albuquerque

Joao Monteiro

Mohammad-Javad Darvishi-Bayazi

Tiago Falk

Ioannis Mitliagkas

Supervised learning results typically rely on assumptions of i.i.d. data. Unfortunately, those assumptions are commonly violated in practice. In this work, we tackle this problem by focusing on domain generalization: a formalization where the data generating process at test time may yield samples from never-before-seen domains (distributions). Our work relies on a simple lemma: by minimizing a notion of discrepancy between all pairs from a set of given domains, we also minimize the discrepancy between any pairs of mixtures of domains. Using this result, we derive a generalization bound for our setting. We then show that low risk over unseen domains can be achieved by representing the data in a space where (i) the training distributions are indistinguishable, and (ii) relevant information for the task at hand is preserved. Minimizing the terms in our bound yields an adversarial formulation which estimates and minimizes pairwise discrepancies. We validate our proposed strategy on standard domain generalization benchmarks, outperforming a number of recently introduced methods. Notably, we tackle a real-world application where the underlying data corresponds to multi-channel electroencephalography time series from different subjects, each considered as a distinct domain.
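The lemma in this abstract can be checked numerically on a toy case. The sketch below assumes total variation distance as the "notion of discrepancy" and uses three made-up discrete domains; the discrepancy choice, the distributions, and the helper names `tv` and `mix` are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations

def tv(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def mix(weights, domains):
    """Mixture of the given domains with the given (non-negative, sum-to-one) weights."""
    size = len(domains[0])
    return [sum(w * d[k] for w, d in zip(weights, domains)) for k in range(size)]

# Three hypothetical training domains over a three-symbol alphabet.
domains = [
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.6, 0.1, 0.3],
]

# Largest discrepancy over all pairs of training domains.
max_pairwise = max(tv(p, q) for p, q in combinations(domains, 2))

# Two arbitrary mixtures of the training domains.
mixture_a = mix([0.2, 0.5, 0.3], domains)
mixture_b = mix([0.6, 0.1, 0.3], domains)
mixture_gap = tv(mixture_a, mixture_b)

print(f"max pairwise TV: {max_pairwise:.3f}, TV between mixtures: {mixture_gap:.3f}")
```

For this choice of discrepancy the bound follows from the triangle inequality, so driving the pairwise term down also controls the distance between any two mixtures.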

2019-11-03

ArXiv (preprint)

arxiv.org

Negative Momentum for Improved Game Dynamics

Gauthier Gidel

Reyhane Askari Hemmat

Mohammad Pezeshki

Gabriel Huang

Rémi Le Priol

Simon Lacoste-Julien

Ioannis Mitliagkas

Games generalize the single-objective optimization paradigm by introducing different objective functions for different players. Differentiable games often proceed by simultaneous or alternating gradient updates. In machine learning, games are gaining new importance through formulations like generative adversarial networks (GANs) and actor-critic systems. However, compared to single-objective optimization, game dynamics are more complex and less understood. In this paper, we analyze gradient-based methods with momentum on simple games. We prove that alternating updates are more stable than simultaneous updates. Next, we show both theoretically and empirically that alternating gradient updates with a negative momentum term achieve convergence not only on a difficult toy adversarial problem, but also on notoriously difficult-to-train saturating GANs.
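A minimal sketch of the effect described above, on the toy bilinear game min over x, max over y of x·y. The game, the step size `eta = 0.5`, and the momentum value `beta = -0.5` are assumptions chosen for illustration; they are not the paper's experimental settings.

```python
import math

def alternating_momentum(eta, beta, steps, x0=1.0, y0=1.0):
    """Alternating gradient updates with heavy-ball momentum on f(x, y) = x * y.

    x descends on f, y ascends on f, and y sees the freshly updated x.
    Returns the distance of the final iterate from the equilibrium (0, 0).
    """
    x, y = x0, y0
    x_prev, y_prev = x0, y0  # no momentum contribution on the first step
    for _ in range(steps):
        x_new = x - eta * y + beta * (x - x_prev)      # df/dx = y
        y_new = y + eta * x_new + beta * (y - y_prev)  # df/dy = x, at the new x
        x_prev, y_prev = x, y
        x, y = x_new, y_new
    return math.hypot(x, y)

dist_negative = alternating_momentum(eta=0.5, beta=-0.5, steps=500)
dist_zero = alternating_momentum(eta=0.5, beta=0.0, steps=500)

print(f"negative momentum: {dist_negative:.2e}, no momentum: {dist_zero:.2e}")
```

With these settings the run without momentum keeps cycling around the equilibrium, while the run with negative momentum contracts toward it.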

2019-04-11

Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics (published)

proceedings.mlr.press

arxiv.org

Multi-objective training of Generative Adversarial Networks with multiple discriminators

Isabela Albuquerque

Joao Monteiro

Thang Doan

Breandan Considine

T. Falk

Ioannis Mitliagkas

Recent literature has demonstrated promising results for training Generative Adversarial Networks by employing a set of discriminators, in contrast to the traditional game involving one generator against a single adversary. Such methods perform single-objective optimization on some simple consolidation of the losses, e.g. an arithmetic average. In this work, we revisit the multiple-discriminator setting by framing the simultaneous minimization of losses provided by different models as a multi-objective optimization problem. Specifically, we evaluate the performance of multiple gradient descent and the hypervolume maximization algorithm on a number of different datasets. Moreover, we argue that the previously proposed methods and hypervolume maximization can all be seen as variations of multiple gradient descent in which the update direction can be computed efficiently. Our results indicate that hypervolume maximization presents a better compromise between sample quality and computational cost than previous methods.

2019-01-01

ICML (published)

arxiv.org

Reducing the variance in online optimization by transporting past gradients

Sébastien M. R. Arnold

Pierre-Antoine Manzagol

Reza Babanezhad Harikandeh

Ioannis Mitliagkas

Nicolas Le Roux

Most stochastic optimization methods use gradients once before discarding them. While variance reduction methods have shown that reusing past gradients can be beneficial when there is a finite number of datapoints, they do not easily extend to the online setting. One issue is the staleness due to using past gradients. We propose to correct this staleness using the idea of implicit gradient transport (IGT) which transforms gradients computed at previous iterates into gradients evaluated at the current iterate without using the Hessian explicitly. In addition to reducing the variance and bias of our updates over time, IGT can be used as a drop-in replacement for the gradient estimate in a number of well-understood methods such as heavy ball or Adam. We show experimentally that it achieves state-of-the-art results on a wide range of architectures and benchmarks. Additionally, the IGT gradient estimator yields the optimal asymptotic convergence rate for online stochastic optimization in the restricted setting where the Hessians of all component functions are equal.
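The equal-Hessian case mentioned at the end of this abstract can be illustrated with a small sketch. It assumes the estimator is a running average of gradients evaluated at an extrapolated point, theta_t + t * (theta_t - theta_{t-1}); that update form, the scalar quadratic losses, and all names and constants below are assumptions for illustration and are not stated in the abstract.

```python
def igt_trace(centers, curvature=2.0, lr=0.1, theta0=5.0):
    """Run SGD driven by a transported running-average gradient estimate.

    Sample i has loss 0.5 * curvature * (theta - centers[i]) ** 2, so every
    component shares the same Hessian. Returns one (theta, estimate) pair per
    step, where `estimate` is the value used to update from `theta`.
    """
    theta_prev = theta = theta0
    estimate = 0.0
    trace = []
    for t, center in enumerate(centers):
        gamma = t / (t + 1)
        shifted = theta + t * (theta - theta_prev)  # extrapolated evaluation point
        grad = curvature * (shifted - center)       # gradient of the new sample only
        estimate = gamma * estimate + (1 - gamma) * grad
        trace.append((theta, estimate))
        theta_prev, theta = theta, theta - lr * estimate
    return trace

centers = [1.0, -0.5, 2.0, 0.3, 1.2, -1.0]
trace = igt_trace(centers)

# Compare the last estimate with the exact average gradient of all samples so far.
last_theta, last_estimate = trace[-1]
exact_average = 2.0 * (last_theta - sum(centers) / len(centers))
print(f"estimate: {last_estimate:.6f}, exact average gradient: {exact_average:.6f}")
```

On these quadratics the estimate matches the average of all past sample gradients as if each had been recomputed at the current iterate, although only one new gradient is evaluated per step and no Hessian is formed.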

arxiv.org

Negative Momentum for Improved Game Dynamics

Gauthier Gidel

Reyhane Askari Hemmat

Mohammad Pezeshki

Gabriel Huang

Rémi Le Priol

Simon Lacoste-Julien

Ioannis Mitliagkas

Games generalize the single-objective optimization paradigm by introducing different objective functions for different players. Differentiable games often proceed by simultaneous or alternating gradient updates. In machine learning, games are gaining new importance through formulations like generative adversarial networks (GANs) and actor-critic systems. However, compared to single-objective optimization, game dynamics are more complex and less understood. In this paper, we analyze gradient-based methods with momentum on simple games. We prove that alternating updates are more stable than simultaneous updates. Next, we show both theoretically and empirically that alternating gradient updates with a negative momentum term achieve convergence not only on a difficult toy adversarial problem, but also on notoriously difficult-to-train saturating GANs.

2018-07-12

ArXiv (preprint)

arxiv.org

Accelerated Stochastic Power Iteration

Peng Xu

Bryan Dawei He

Christopher De Sa

Ioannis Mitliagkas

Christopher Re

Principal component analysis (PCA) is one of the most powerful tools in machine learning. The simplest method for PCA, the power iteration, requires O(1/Δ) full-data passes to recover the principal component of a matrix with eigen-gap Δ. Lanczos, a significantly more complex method, achieves an accelerated rate of O(1/√Δ) passes. Modern applications, however, motivate methods that only ingest a subset of available data, known as the stochastic setting. In the online stochastic setting, simple algorithms like Oja's iteration achieve the optimal sample complexity O(σ²/Δ²). Unfortunately, they are fully sequential, and also require O(σ²/Δ²) iterations, far from the O(1/√Δ) rate of Lanczos. We propose a simple variant of the power iteration with an added momentum term that achieves both the optimal sample and iteration complexity. In the full-pass setting, standard analysis shows that momentum achieves the accelerated rate, O(1/√Δ). We demonstrate empirically that naively applying momentum to a stochastic method does not result in acceleration. We perform a novel, tight variance analysis that reveals the "breaking-point variance" beyond which this acceleration does not occur. By combining this insight with modern variance reduction techniques, we construct stochastic PCA algorithms, for the online and offline setting, that achieve an accelerated iteration complexity O(1/√Δ). Due to the embarrassingly parallel nature of our methods, this acceleration translates directly to wall-clock time if deployed in a parallel environment. Our approach is very general, and applies to many non-convex optimization problems that can now be accelerated using the same technique.
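A minimal full-pass sketch of power iteration with a momentum term, assuming the recurrence w_{t+1} = A w_t - beta * w_{t-1} with both iterates rescaled by the same factor. The toy diagonal matrix, the setting beta = lambda_2^2 / 4 (which presumes the second eigenvalue is known), and the function names are assumptions for illustration; the stochastic and variance-reduced variants from the abstract are not covered.

```python
import math

def matvec(A, v):
    """Dense matrix-vector product on plain Python lists."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def power_iteration(A, w0, steps, beta=0.0):
    """Power iteration with an optional momentum term; beta=0 is the plain method."""
    scale = norm(w0)
    w = [x / scale for x in w0]
    w_prev = [0.0] * len(w0)
    for _ in range(steps):
        w_next = [a - beta * p for a, p in zip(matvec(A, w), w_prev)]
        scale = norm(w_next)
        # Rescale both iterates by the same factor so the recurrence stays linear.
        w_prev = [x / scale for x in w]
        w = [x / scale for x in w_next]
    return w

# Toy symmetric matrix with eigenvalues 1.0, 0.9, 0.5; the top eigenvector is e_1.
A = [[1.0, 0.0, 0.0],
     [0.0, 0.9, 0.0],
     [0.0, 0.0, 0.5]]
w0 = [1.0, 1.0, 1.0]

plain = power_iteration(A, w0, steps=20)
momentum = power_iteration(A, w0, steps=20, beta=0.9 ** 2 / 4)

# Alignment error with the true principal component e_1.
err_plain = 1.0 - abs(plain[0])
err_momentum = 1.0 - abs(momentum[0])
print(f"plain: {err_plain:.2e}, momentum: {err_momentum:.2e}")
```

On this example the momentum variant reaches a much smaller alignment error than the plain iteration in the same number of passes.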

2018-01-01

AISTATS (published)

proceedings.mlr.press

arxiv.org

Deep Learning @15 Petaflops/second: Semi-supervised pattern detection for 15 Terabytes of climate data

W. Collins

M. Wehner

M. Prabhat

Thorsten Kurth

Nadathur Satish

Ioannis Mitliagkas

Jian Zhang

Evan Racah

Md. Mostofa Ali Patwary

Narayanan Sundaram

Pradeep Dubey

2017-12-01

(published)

www.semanticscholar.org

Accelerated Stochastic Power Iteration

Peng Xu

Bryan Dawei He

Christopher De Sa

Ioannis Mitliagkas

Christopher Re

Principal component analysis (PCA) is one of the most powerful tools in machine learning. The simplest method for PCA, the power iteration, requires O(1/Δ) full-data passes to recover the principal component of a matrix with eigen-gap Δ. Lanczos, a significantly more complex method, achieves an accelerated rate of O(1/√Δ) passes. Modern applications, however, motivate methods that only ingest a subset of available data, known as the stochastic setting. In the online stochastic setting, simple algorithms like Oja's iteration achieve the optimal sample complexity O(σ²/Δ²). Unfortunately, they are fully sequential, and also require O(σ²/Δ²) iterations, far from the O(1/√Δ) rate of Lanczos. We propose a simple variant of the power iteration with an added momentum term that achieves both the optimal sample and iteration complexity. In the full-pass setting, standard analysis shows that momentum achieves the accelerated rate, O(1/√Δ). We demonstrate empirically that naively applying momentum to a stochastic method does not result in acceleration. We perform a novel, tight variance analysis that reveals the "breaking-point variance" beyond which this acceleration does not occur. By combining this insight with modern variance reduction techniques, we construct stochastic PCA algorithms, for the online and offline setting, that achieve an accelerated iteration complexity O(1/√Δ). Due to the embarrassingly parallel nature of our methods, this acceleration translates directly to wall-clock time if deployed in a parallel environment. Our approach is very general, and applies to many non-convex optimization problems that can now be accelerated using the same technique.

2017-07-10

ArXiv (preprint)

arxiv.org

