Ioannis Mitliagkas

No Wrong Turns: The Simple Geometry Of Neural Networks Optimization Paths

Charles Guille-Escuret

Hiroki Naganuma

Kilian FATRAS

Understanding the optimization dynamics of neural networks is necessary for closing the gap between theory and practice. Stochastic first-or… (see more)der optimization algorithms are known to efficiently locate favorable minima in deep neural networks. This efficiency, however, contrasts with the non-convex and seemingly complex structure of neural loss landscapes. In this study, we delve into the fundamental geometric properties of sampled gradients along optimization paths. We focus on two key quantities, which appear in the restricted secant inequality and error bound. Both hold high significance for first-order optimization. Our analysis reveals that these quantities exhibit predictable, consistent behavior throughout training, despite the stochasticity induced by sampling minibatches. Our findings suggest that not only do optimization trajectories never encounter significant obstacles, but they also maintain stable dynamics during the majority of training. These observed properties are sufficiently expressive to theoretically guarantee linear convergence and prescribe learning rate schedules mirroring empirical practices. We conduct our experiments on image classification, semantic segmentation and language modeling across different batch sizes, network architectures, datasets, optimizers, and initialization seeds. We discuss the impact of each factor. Our work provides novel insights into the properties of neural network loss functions, and opens the door to theoretical frameworks more relevant to prevalent practice.

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (published)

doi.org

openreview.net

Performance Control in Early Exiting to Deploy Large Models at the Same Cost of Smaller Ones

Mehrnaz Mofakhami

Reza Bayat

Ioannis Mitliagkas

Joao Monteiro

Valentina Zantedeschi

2024-06-20

ICML.cc/2024/Workshop/ES-FoMo-II (poster)

openreview.net

Gradient descent induces alignment between weights and the pre-activation tangents for deep non-linear networks

Daniel Beaglehole

Ioannis Mitliagkas

Atish Agarwala

Understanding the mechanisms through which neural networks extract statistics from input-label pairs is one of the most important unsolved p… (see more)roblems in supervised learning. Prior works have identified that the gram matrices of the weights in trained neural networks of general architectures are proportional to the average gradient outer product of the model, in a statement known as the Neural Feature Ansatz (NFA). However, the reason these quantities become correlated during training is poorly understood. In this work, we clarify the nature of this correlation and explain its emergence at early training times. We identify that the NFA is equivalent to alignment between the left singular structure of the weight matrices and the newly defined pre-activation tangent kernel. We identify a centering of the NFA that isolates this alignment and is robust to initialization scale. We show that, through this centering, the speed of NFA development can be predicted analytically in terms of simple statistics of the inputs and labels.

2024-06-16

ICML.cc/2024/Workshop/HiLD (poster)

openreview.net

Gradient descent induces alignment between weights and the pre-activation tangents for deep non-linear networks

Daniel Beaglehole

Ioannis Mitliagkas

Atish Agarwala

Understanding the mechanisms through which neural networks extract statistics from input-label pairs is one of the most important unsolved p… (see more)roblems in supervised learning. Prior works have identified that the gram matrices of the weights in trained neural networks of general architectures are proportional to the average gradient outer product of the model, in a statement known as the Neural Feature Ansatz (NFA). However, the reason these quantities become correlated during training is poorly understood. In this work, we clarify the nature of this correlation and explain its emergence at early training times. We identify that the NFA is equivalent to alignment between the left singular structure of the weight matrices and the newly defined pre-activation tangent kernel. We identify a centering of the NFA that isolates this alignment and is robust to initialization scale. We show that, through this centering, the speed of NFA development can be predicted analytically in terms of simple statistics of the inputs and labels.

2024-06-16

ICML.cc/2024/Workshop/HiLD (poster)

openreview.net

Are we making progress in unlearning? Findings from the first NeurIPS unlearning competition

Eleni Triantafillou

Peter Kairouz

Fabian Pedregosa

Jamie Hayes

Meghdad Kurmanji

Kairan Zhao

Vincent Dumoulin

Julio C. S. Jacques Junior

Ioannis Mitliagkas

Jun Wan

Lisheng Sun-Hosoya

Sergio Escalera

Gintare Karolina Dziugaite

Peter Triantafillou

Isabelle Guyon

We present the findings of the first NeurIPS competition on unlearning, which sought to stimulate the development of novel algorithms and in… (see more)itiate discussions on formal and robust evaluation methodologies. The competition was highly successful: nearly 1,200 teams from across the world participated, and a wealth of novel, imaginative solutions with different characteristics were contributed. In this paper, we analyze top solutions and delve into discussions on benchmarking unlearning, which itself is a research problem. The evaluation methodology we developed for the competition measures forgetting quality according to a formal notion of unlearning, while incorporating model utility for a holistic evaluation. We analyze the effectiveness of different instantiations of this evaluation framework vis-a-vis the associated compute cost, and discuss implications for standardizing evaluation. We find that the ranking of leading methods remains stable under several variations of this framework, pointing to avenues for reducing the cost of evaluation. Overall, our findings indicate progress in unlearning, with top-performing competition entries surpassing existing algorithms under our evaluation framework. We analyze trade-offs made by different algorithms and strengths or weaknesses in terms of generalizability to new datasets, paving the way for advancing both benchmarking and algorithm development in this important area.

2024-06-13

ArXiv (preprint)

doi.org

arxiv.org

Are we making progress in unlearning? Findings from the first NeurIPS unlearning competition

Eleni Triantafillou

Peter Kairouz

Fabian Pedregosa

Jamie Hayes

Meghdad Kurmanji

Kairan Zhao

Vincent Dumoulin

Julio C. S. Jacques Junior

Ioannis Mitliagkas

Jun Wan

Lisheng Sun-Hosoya

Sergio Escalera

Gintare Karolina Dziugaite

Peter Triantafillou

Isabelle Guyon

We present the findings of the first NeurIPS competition on unlearning, which sought to stimulate the development of novel algorithms and in… (see more)itiate discussions on formal and robust evaluation methodologies. The competition was highly successful: nearly 1,200 teams from across the world participated, and a wealth of novel, imaginative solutions with different characteristics were contributed. In this paper, we analyze top solutions and delve into discussions on benchmarking unlearning, which itself is a research problem. The evaluation methodology we developed for the competition measures forgetting quality according to a formal notion of unlearning, while incorporating model utility for a holistic evaluation. We analyze the effectiveness of different instantiations of this evaluation framework vis-a-vis the associated compute cost, and discuss implications for standardizing evaluation. We find that the ranking of leading methods remains stable under several variations of this framework, pointing to avenues for reducing the cost of evaluation. Overall, our findings indicate progress in unlearning, with top-performing competition entries surpassing existing algorithms under our evaluation framework. We analyze trade-offs made by different algorithms and strengths or weaknesses in terms of generalizability to new datasets, paving the way for advancing both benchmarking and algorithm development in this important area.

2024-06-13

ArXiv (preprint)

doi.org

arxiv.org

Smoothness-Adaptive Sharpness-Aware Minimization for Finding Flatter Minima

Hiroki Naganuma

Junhyung Lyle Kim

Anastasios Kyrillidis

Ioannis Mitliagkas

The sharpness-aware minimization (SAM) procedure recently gained increasing attention due to its favorable generalization ability to unseen … (see more)data. SAM aims to find flatter (local) minima, utilizing a minimax objective. An immediate challenge in the application of SAM is the adjustment of two pivotal step sizes, which significantly influence its effectiveness. We introduce a novel, straightforward approach for adjusting step sizes that adapts to the smoothness of the objective function, thereby reducing the necessity for manual tuning. This method, termed Smoothness-Adaptive SAM (SA-SAM), not only simplifies the optimization process but also promotes the method's inherent tendency to converge towards flatter minima, enhancing performance in specific models.

2024-03-05

ICLR.cc/2024/Workshop/PML4LRS (poster)

openreview.net

Gradient descent induces alignment between weights and the empirical NTK for deep non-linear networks

Daniel Beaglehole

Ioannis Mitliagkas

Atish Agarwala

2024-02-07

ArXiv (preprint)

doi.org

arxiv.org

Empirical Analysis of Model Selection for Heterogeneous Causal Effect Estimation

Divyat Mahajan

Ioannis Mitliagkas

Brady Neal

Vasilis Syrgkanis

2024-01-16

ICLR.cc/2024/Conference (spotlight)

openreview.net

Empirical Analysis of Model Selection for Heterogenous Causal Effect Estimation

Divyat Mahajan

Ioannis Mitliagkas

Brady Neal

Vasilis Syrgkanis

We study the problem of model selection in causal inference, specifically for the case of conditional average treatment effect (CATE) estima… (see more)tion under binary treatments. Unlike model selection in machine learning, there is no perfect analogue of cross-validation as we do not observe the counterfactual potential outcome for any data point. Towards this, there have been a variety of proxy metrics proposed in the literature, that depend on auxiliary nuisance models estimated from the observed data (propensity score model, outcome regression model). However, the effectiveness of these metrics has only been studied on synthetic datasets as we can access the counterfactual data for them. We conduct an extensive empirical analysis to judge the performance of these metrics introduced in the literature, and novel ones introduced in this work, where we utilize the latest advances in generative modeling to incorporate multiple realistic datasets. Our analysis suggests novel model selection strategies based on careful hyperparameter tuning of CATE estimators and causal ensembling.

2024-01-01

ICLR (published)

doi.org

arxiv.org

Stochastic Mirror Descent: Convergence Analysis and Adaptive Variants via the Mirror Stochastic Polyak Stepsize

Ryan D'Orazio

Nicolas Loizou

Issam Hadj Laradji

Ioannis Mitliagkas

We investigate the convergence of stochastic mirror descent (SMD) under interpolation in relatively smooth and smooth convex optimization. I… (see more)n relatively smooth convex optimization we provide new convergence guarantees for SMD with a constant stepsize. For smooth convex optimization we propose a new adaptive stepsize scheme --- the mirror stochastic Polyak stepsize (mSPS). Notably, our convergence results in both settings do not make bounded gradient assumptions or bounded variance assumptions, and we show convergence to a neighborhood that vanishes under interpolation. Consequently, these results correspond to the first convergence guarantees under interpolation for the exponentiated gradient algorithm for fixed or adaptive stepsizes. mSPS generalizes the recently proposed stochastic Polyak stepsize (SPS) (Loizou et al. 2021) to mirror descent and remains both practical and efficient for modern machine learning applications while inheriting the benefits of mirror descent. We complement our results with experiments across various supervised learning tasks and different instances of SMD, demonstrating the effectiveness of mSPS.

2023-11-08

TMLR (accepted)

openreview.net

Additive Decoders for Latent Variables Identification and Cartesian-Product Extrapolation

Sébastien Lachapelle

Divyat Mahajan

Ioannis Mitliagkas

Simon Lacoste-Julien

We tackle the problems of latent variables identification and "out-of-support'' image generation in representation learning. We show that bo… (see more)th are possible for a class of decoders that we call additive, which are reminiscent of decoders used for object-centric representation learning (OCRL) and well suited for images that can be decomposed as a sum of object-specific images. We provide conditions under which exactly solving the reconstruction problem using an additive decoder is guaranteed to identify the blocks of latent variables up to permutation and block-wise invertible transformations. This guarantee relies only on very weak assumptions about the distribution of the latent factors, which might present statistical dependencies and have an almost arbitrarily shaped support. Our result provides a new setting where nonlinear independent component analysis (ICA) is possible and adds to our theoretical understanding of OCRL methods. We also show theoretically that additive decoders can generate novel images by recombining observed factors of variations in novel ways, an ability we refer to as Cartesian-product extrapolation. We show empirically that additivity is crucial for both identifiability and extrapolation on simulated data.

openreview.net

AI Advantage

Leveraging AI for a Sustainable Future

Mila AI Policy Fellowship

AI Advantage

Leveraging AI for a Sustainable Future

Ioannis Mitliagkas

Biography

Current Students

Blog Posts

Publications

AI Advantage

Leveraging AI for a Sustainable Future

Mila AI Policy Fellowship

AI Advantage

Leveraging AI for a Sustainable Future

Popular keywords:

Ioannis Mitliagkas

Biography

Current Students

Blog Posts

Publications