Portrait de Ioannis Mitliagkas

Ioannis Mitliagkas

Membre académique principal
Chaire en IA Canada-CIFAR
Professeur adjoint, Université de Montréal, Département d'informatique et de recherche opérationnelle
Chercheur scientifique, Google DeepMind
Sujets de recherche
Apprentissage de représentations
Apprentissage profond
Modèles génératifs
Optimisation
Systèmes distribués
Systèmes dynamiques
Théorie de l'apprentissage automatique

Biographie

Ioannis Mitliagkas est un professeur associé au Département d'informatique et de recherche opérationnelle (DIRO) de l'Université de Montréal. Je suis également membre de Mila – Institut québécois d’intelligence artificielle et titulaire d’une chaire en IA Canada-CIFAR. J'occupe aussi un poste de chercheur scientifique à temps partiel chez Google DeepMind à Montréal.

Auparavant, j'ai été chercheur postdoctoral aux départements de statistique et d'informatique de l'Université de Stanford; j'ai obtenu mon doctorat à l'Université du Texas à Austin, au Département d'ingénierie électrique et informatique. Mes recherches portent sur l'apprentissage statistique et l'inférence, et plus particulièrement sur l'optimisation, les algorithmes efficaces distribués et à grande échelle, la théorie de l'apprentissage statistique et les méthodes MCMC. Mes travaux récents s’intéressent notamment aux méthodes d'optimisation efficace et adaptative, à l'étude de l'interaction entre l'optimisation et la dynamique des systèmes d'apprentissage à grande échelle, et à la dynamique des jeux.

Étudiants actuels

Stagiaire de recherche - UdeM
Stagiaire de recherche - UdeM
Co-superviseur⋅e :
Postdoctorat - McGill
Superviseur⋅e principal⋅e :
Doctorat - UdeM
Doctorat - UdeM
Co-superviseur⋅e :
Doctorat - UdeM
Superviseur⋅e principal⋅e :
Doctorat - UdeM
Maîtrise recherche - UdeM
Superviseur⋅e principal⋅e :
Maîtrise recherche - UdeM

Publications

LEAD: Min-Max Optimization from a Physical Perspective
Reyhane Askari Hemmat
Amartya Mitra
Adversarial formulations have rekindled interest in two-player min-max games. A central obstacle in the optimization of such games is the ro… (voir plus)tational dynamics that hinder their convergence. In this paper, we show that game optimization shares dynamic properties with particle systems subject to multiple forces, and one can leverage tools from physics to improve optimization dynamics. Inspired by the physical framework, we propose LEAD, an optimizer for min-max games. Next, using Lyapunov stability theory from dynamical systems as well as spectral analysis, we study LEAD’s convergence properties in continuous and discrete time settings for a class of quadratic min-max games to demonstrate linear convergence to the Nash equilibrium. Finally, we empirically evaluate our method on synthetic setups and CIFAR-10 image generation to demonstrate improvements in GAN training.
Performative Prediction with Neural Networks
Performative Prediction with Neural Networks
Performative prediction is a framework for learning models that influence the data they intend to predict. We focus on finding classifiers t… (voir plus)hat are performatively stable, i.e. optimal for the data distribution they induce. Standard convergence results for finding a performatively stable classifier with the method of repeated risk minimization assume that the data distribution is Lipschitz continuous to the model's parameters. Under this assumption, the loss must be strongly convex and smooth in these parameters; otherwise, the method will diverge for some problems. In this work, we instead assume that the data distribution is Lipschitz continuous with respect to the model's predictions, a more natural assumption for performative systems. As a result, we are able to significantly relax the assumptions on the loss function. In particular, we do not need to assume convexity with respect to the model's parameters. As an illustration, we introduce a resampling procedure that models realistic distribution shifts and show that it satisfies our assumptions. We support our theory by showing that one can learn performatively stable classifiers with neural networks making predictions about real data that shift according to our proposed procedure.
Synergies between Disentanglement and Sparsity: Generalization and Identifiability in Multi-Task Learning
Sébastien Lachapelle
Tristan Deleu
Divyat Mahajan
Quentin Bertrand
Although disentangled representations are often said to be beneficial for downstream tasks, current empirical and theoretical understanding … (voir plus)is limited. In this work, we provide evidence that disentangled representations coupled with sparse task-specific predictors improve generalization. In the context of multi-task learning, we prove a new identifiability result that provides conditions under which maximally sparse predictors yield disentangled representations. Motivated by this theoretical result, we propose a practical approach to learn disentangled representations based on a sparsity-promoting bi-level optimization problem. Finally, we explore a meta-learning version of this algorithm based on group Lasso multiclass SVM predictors, for which we derive a tractable dual formulation. It obtains competitive results on standard few-shot classification benchmarks, while each task is using only a fraction of the learned representations.
Synergies between Disentanglement and Sparsity: Generalization and Identifiability in Multi-Task Learning
Sébastien Lachapelle
Tristan Deleu
Divyat Mahajan
Quentin Bertrand
Although disentangled representations are often said to be beneficial for downstream tasks, current empirical and theoretical understanding … (voir plus)is limited. In this work, we provide evidence that disentangled representations coupled with sparse base-predictors improve generalization. In the context of multi-task learning, we prove a new identifiability result that provides conditions under which maximally sparse base-predictors yield disentangled representations. Motivated by this theoretical result, we propose a practical approach to learn disentangled representations based on a sparsity-promoting bi-level optimization problem. Finally, we explore a meta-learning version of this algorithm based on group Lasso multiclass SVM base-predictors, for which we derive a tractable dual formulation. It obtains competitive results on standard few-shot classification benchmarks, while each task is using only a fraction of the learned representations.
CADet: Fully Self-Supervised Anomaly Detection With Contrastive Learning
Charles Guille-Escuret
Pau Rodriguez
David Vazquez
Joao Monteiro
Towards efficient representation identification in supervised learning
Kartik Ahuja
Divyat Mahajan
Vasilis Syrgkanis
Humans have a remarkable ability to disentangle complex sensory inputs (e.g., image, text) into simple factors of variation (e.g., shape, co… (voir plus)lor) without much supervision. This ability has inspired many works that attempt to solve the following question: how do we invert the data generation process to extract those factors with minimal or no supervision? Several works in the literature on non-linear independent component analysis have established this negative result; without some knowledge of the data generation process or appropriate inductive biases, it is impossible to perform this inversion. In recent years, a lot of progress has been made on disentanglement under structural assumptions, e.g., when we have access to auxiliary information that makes the factors of variation conditionally independent. However, existing work requires a lot of auxiliary information, e.g., in supervised classification, it prescribes that the number of label classes should be at least equal to the total dimension of all factors of variation. In this work, we depart from these assumptions and ask: a) How can we get disentanglement when the auxiliary information does not provide conditional independence over the factors of variation? b) Can we reduce the amount of auxiliary information required for disentanglement? For a class of models where auxiliary information does not ensure conditional independence, we show theoretically and experimentally that disentanglement (to a large extent) is possible even when the auxiliary information dimension is much less than the dimension of the true latent representation.
A Unified Approach to Reinforcement Learning, Quantal Response Equilibria, and Two-Player Zero-Sum Games
Samuel Sokota
Ryan D’orazio
J. Z. Kolter
Nicolas Loizou
Marc Lanctot
Noam Brown
Christian Kroer
This work studies an algorithm, which we call magnetic mirror descent, that is inspired by mirror descent and the non-Euclidean proximal gra… (voir plus)dient algorithm. Our contribution is demonstrating the virtues of magnetic mirror descent as both an equilibrium solver and as an approach to reinforcement learning in two-player zero-sum games. These virtues include: 1) Being the first quantal response equilibria solver to achieve linear convergence for extensive-form games with first order feedback; 2) Being the first standard reinforcement learning algorithm to achieve empirically competitive results with CFR in tabular settings; 3) Achieving favorable performance in 3x3 Dark Hex and Phantom Tic-Tac-Toe as a self-play deep reinforcement learning algorithm.
Gradient Descent Is Optimal Under Lower Restricted Secant Inequality And Upper Error Bound
Charles Guille-Escuret
Baptiste Goujaud
Adam Ibrahim
The study of first-order optimization is sensitive to the assumptions made on the objective functions. These assumptions induce complexity c… (voir plus)lasses which play a key role in worst-case analysis, including the fundamental concept of algorithm optimality. Recent work argues that strong convexity and smoothness—popular assumptions in literature—lead to a pathological definition of the condition number. Motivated by this result, we focus on the class of functions satisfying a lower restricted secant inequality and an upper error bound. On top of being robust to the aforementioned pathological behavior and including some non-convex functions, this pair of conditions displays interesting geometrical properties. In particular, the necessary and sufficient conditions to interpolate a set of points and their gradients within the class can be separated into simple conditions on each sampled gradient. This allows the performance estimation problem (PEP) to be solved analytically, leading to a lower bound on the convergence rate that proves gradient descent to be exactly optimal on this class of functions among all first-order algorithms.
Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth Games: Convergence Analysis under Expected Co-coercivity
Two of the most prominent algorithms for solving unconstrained smooth games are the classical stochastic gradient descent-ascent (SGDA) and … (voir plus)the recently introduced stochastic consensus optimization (SCO) [Mescheder et al., 2017]. SGDA is known to converge to a stationary point for specific classes of games, but current convergence analyses require a bounded variance assumption. SCO is used successfully for solving large-scale adversarial problems, but its convergence guarantees are limited to its deterministic variant. In this work, we introduce the expected co-coercivity condition, explain its benefits, and provide the first last-iterate convergence guarantees of SGDA and SCO under this condition for solving a class of stochastic variational inequality problems that are potentially non-monotone. We prove linear convergence of both methods to a neighborhood of the solution when they use constant step-size, and we propose insightful stepsize-switching rules to guarantee convergence to the exact solution. In addition, our convergence guarantees hold under the arbitrary sampling paradigm, and as such, we give insights into the complexity of minibatching.
Gotta Go Fast When Generating Data with Score-Based Models
Alexia Jolicoeur-Martineau
Ke Li
Rémi Piché-Taillefer
Tal Kachman
Score-based (denoising diffusion) generative models have recently gained a lot of success in generating realistic and diverse data. These ap… (voir plus)proaches define a forward diffusion process for transforming data to noise and generate data by reversing it (thereby going from noise to data). Unfortunately, current score-based models generate data very slowly due to the sheer number of score network evaluations required by numerical SDE solvers. In this work, we aim to accelerate this process by devising a more efficient SDE solver. Existing approaches rely on the Euler-Maruyama (EM) solver, which uses a fixed step size. We found that naively replacing it with other SDE solvers fares poorly - they either result in low-quality samples or become slower than EM. To get around this issue, we carefully devise an SDE solver with adaptive step sizes tailored to score-based generative models piece by piece. Our solver requires only two score function evaluations, rarely rejects samples, and leads to high-quality samples. Our approach generates data 2 to 10 times faster than EM while achieving better or equal sample quality. For high-resolution images, our method leads to significantly higher quality samples than all other methods tested. Our SDE solver has the benefit of requiring no step size tuning.
A Study of Condition Numbers for First-Order Optimization
Charles Guille-Escuret
Baptiste Goujaud
Manuela Girotti