Juan Ramirez

Dual Optimistic Ascent (PI Control) is the Augmented Lagrangian Method in Disguise

Constrained optimization is a powerful framework for enforcing requirements on neural networks. These constrained deep learning problems are typically solved using first-order methods on their min-max Lagrangian formulation, but such approaches often suffer from oscillations and can fail to find all local solutions. While the Augmented Lagrangian method (ALM) addresses these issues, practitioners often favor dual optimistic ascent schemes (PI control) on the standard Lagrangian, which perform well empirically but lack formal guarantees. In this paper, we establish a previously unknown equivalence between these approaches: dual optimistic ascent on the Lagrangian is equivalent to gradient descent-ascent on the Augmented Lagrangian. This finding allows us to transfer the robust theoretical guarantees of the ALM to the dual optimistic setting, proving it converges linearly to all local solutions. Furthermore, the equivalence provides principled guidance for tuning the optimism hyper-parameter. Our work closes a critical gap between the empirical success of dual optimistic methods and their theoretical foundation.
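As a rough illustration of the equivalence described in this abstract, the Python sketch below runs both schemes on a toy equality-constrained problem (minimize 0.5 * ||x||^2 subject to x0 + x1 = 1) and compares the primal iterates. The toy problem, the alternating update order (primal step first, then a dual step on the fresh constraint value), the step sizes, and all function names are assumptions made for this example and are not taken from the paper. The sketch also ignores inequality constraints and the projection they would require.

```python
# Toy problem (assumed for illustration): f(x) = 0.5 * (x0^2 + x1^2), g(x) = x0 + x1 - 1 = 0.
G_GRAD = (1.0, 1.0)  # gradient of the linear constraint g


def g(x):
    return x[0] + x[1] - 1.0


def primal_step(x, multiplier, lr_x):
    # Gradient step on f(x) + multiplier * g(x); the gradient of f is x itself.
    return tuple(xi - lr_x * (xi + multiplier * gi) for xi, gi in zip(x, G_GRAD))


def alm_gda(x, mu, c, lr_x, lr_dual, steps):
    """Alternating gradient descent-ascent on the Augmented Lagrangian
    f(x) + mu * g(x) + (c / 2) * g(x)^2."""
    traj = []
    for _ in range(steps):
        # The primal gradient of the penalty term shows up as an effective multiplier mu + c * g(x).
        x = primal_step(x, mu + c * g(x), lr_x)
        mu = mu + lr_dual * g(x)  # plain dual ascent on the new constraint value
        traj.append(x)
    return traj, mu


def optimistic_dual_ascent(x, mu0, omega, lr_x, lr_dual, steps):
    """Gradient descent on the standard Lagrangian f(x) + lam * g(x),
    with an optimistic (PI-style) multiplier update."""
    g_prev = g(x)
    lam = mu0 + omega * g_prev  # initialization chosen to match the ALM run
    traj = []
    for _ in range(steps):
        x = primal_step(x, lam, lr_x)
        g_new = g(x)
        # Integral term on the current violation plus optimism on its change.
        lam = lam + lr_dual * g_new + omega * (g_new - g_prev)
        g_prev = g_new
        traj.append(x)
    return traj, lam


if __name__ == "__main__":
    x0 = (2.0, -1.0)
    traj_alm, mu = alm_gda(x0, 0.0, c=1.0, lr_x=0.1, lr_dual=0.1, steps=500)
    traj_opt, lam = optimistic_dual_ascent(x0, 0.0, omega=1.0, lr_x=0.1, lr_dual=0.1, steps=500)
    gap = max(abs(a - b) for xa, xb in zip(traj_alm, traj_opt) for a, b in zip(xa, xb))
    print("largest gap between primal iterates:", gap)
    print("final ALM iterate:", traj_alm[-1], "multiplier:", mu)
```

With the optimism coefficient `omega` set equal to the penalty coefficient `c`, the two loops produce matching primal iterates in this toy setting, because the multiplier seen by the primal player equals `mu + c * g(x)` in both cases.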

2025-12-31

International Conference on Learning Representations (Accept (Poster))

doi.org

openreview.net

Feasible Learning

Juan Ramirez

Ignacio Hounie

Juan Elenter

Jose Gallego-Posada

Meraj Hashemizadeh

Alejandro Ribeiro

Simon Lacoste-Julien

We introduce Feasible Learning (FL), a sample-centric learning paradigm where models are trained by solving a feasibility problem that bounds the loss for each training sample. In contrast to the ubiquitous Empirical Risk Minimization (ERM) framework, which optimizes for average performance, FL demands satisfactory performance on every individual data point. Since any model that meets the prescribed performance threshold is a valid FL solution, the choice of optimization algorithm and its dynamics play a crucial role in shaping the properties of the resulting solutions. In particular, we study a primal-dual approach which dynamically re-weights the importance of each sample during training. To address the challenge of setting a meaningful threshold in practice, we introduce a relaxation of FL that incorporates slack variables of minimal norm. Our empirical analysis, spanning image classification, age regression, and preference optimization in large language models, demonstrates that models trained via FL can learn from data while displaying improved tail behavior compared to ERM, with only a marginal impact on average performance.
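The primal-dual re-weighting mentioned in this abstract can be sketched on a toy problem as follows. This is a minimal illustration that assumes a one-parameter linear model, a squared per-sample loss, one Lagrange multiplier per sample, and hand-picked step sizes; the function name, data, and threshold are hypothetical, and the slack-variable relaxation is not included.

```python
def feasible_learning_toy(xs, ys, eps, lr_w=0.01, lr_dual=0.1, steps=200):
    """Seek w such that (w * x_i - y_i)^2 <= eps for every sample i."""
    w = 0.0
    lams = [0.0] * len(xs)  # one multiplier (sample weight) per constraint
    losses = [(w * x - y) ** 2 for x, y in zip(xs, ys)]
    for _ in range(steps):
        # Primal step: descend the multiplier-weighted sum of per-sample losses.
        grad = sum(lam * 2.0 * (w * x - y) * x for lam, x, y in zip(lams, xs, ys))
        w -= lr_w * grad
        # Dual step: raise the weight of violated samples, lower it for satisfied ones,
        # and project back onto the non-negative orthant.
        losses = [(w * x - y) ** 2 for x, y in zip(xs, ys)]
        lams = [max(0.0, lam + lr_dual * (loss - eps)) for lam, loss in zip(lams, losses)]
    return w, lams, losses


if __name__ == "__main__":
    w, lams, losses = feasible_learning_toy([1.0, 2.0, 3.0], [2.0, 4.1, 5.9], eps=0.05)
    print("w:", w)
    print("per-sample losses:", losses)
    print("multipliers:", lams)
```

The multipliers play the role of sample weights: a sample whose loss exceeds the threshold gains weight in the next primal step, while a sample that already satisfies its constraint gradually loses weight.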

2025-01-21

aistats.org/AISTATS/2025/Conference (poster)

proceedings.mlr.press

On PI Controllers for Updating Lagrange Multipliers in Constrained Optimization

Motahareh Sohrabi

Juan Ramirez

Tianyue H. Zhang

Simon Lacoste-Julien

Jose Gallego-Posada

Constrained optimization offers a powerful framework to prescribe desired behaviors in neural network models. Typically, constrained problems are solved via their min-max Lagrangian formulations, which exhibit unstable oscillatory dynamics when optimized using gradient descent-ascent. The adoption of constrained optimization techniques in the machine learning community is currently limited by the lack of reliable, general-purpose update schemes for the Lagrange multipliers. This paper proposes the

2024-07-22

International Conference on Machine Learning (Accept (Poster))

doi.org

proceedings.mlr.press

Balancing Act: Constraining Disparate Impact in Sparse Models

Jose Gallego-Posada

Model pruning is a popular approach to enable the deployment of large deep learning models on edge devices with restricted computational or storage capacities. Although sparse models achieve performance comparable to that of their dense counterparts at the level of the entire dataset, they exhibit high accuracy drops for some data sub-groups. Existing methods to mitigate this disparate impact induced by pruning (i) rely on surrogate metrics that address the problem indirectly and have limited interpretability; or (ii) scale poorly with the number of protected sub-groups in terms of computational cost. We propose a constrained optimization approach that directly addresses the disparate impact of pruning: our formulation bounds the accuracy change between the dense and sparse models, for each sub-group. This choice of constraints provides an interpretable success criterion to determine if a pruned model achieves acceptable disparity levels. Experimental results demonstrate that our technique scales reliably to problems involving large models and hundreds of protected sub-groups.

2024-01-15

ICLR.cc/2024/Conference (poster)

doi.org

openreview.net

Omega: Optimistic EMA Gradients

Stochastic min-max optimization has gained interest in the machine learning community with the advancements in GANs and adversarial training. Although game optimization is fairly well understood in the deterministic setting, some issues persist in the stochastic regime. Recent work has shown that stochastic gradient descent-ascent methods such as the optimistic gradient are highly sensitive to noise or can fail to converge. Although alternative strategies exist, they can be prohibitively expensive. We introduce Omega, a method with optimistic-like updates that mitigates the impact of noise by incorporating an EMA of historic gradients in its update rule. We also explore a variation of this algorithm that incorporates momentum. Although we do not provide convergence guarantees, our experiments on stochastic games show that Omega outperforms the optimistic gradient method when applied to linear players.
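For context on the baseline this abstract refers to, the sketch below contrasts plain gradient descent-ascent with the standard optimistic gradient update on the deterministic bilinear game min_x max_y x * y. It does not implement Omega, whose EMA-based update rule is not stated in the abstract; the game, step size, and function names are assumptions chosen for illustration.

```python
import math


def gda(x, y, lr, steps):
    """Simultaneous gradient descent-ascent on f(x, y) = x * y."""
    for _ in range(steps):
        gx, gy = y, x  # df/dx = y, df/dy = x
        x, y = x - lr * gx, y + lr * gy
    return math.hypot(x, y)  # distance to the equilibrium at the origin


def optimistic_gradient(x, y, lr, steps):
    """Optimistic gradient: step along 2 * (current gradient) - (previous gradient)."""
    gx_prev, gy_prev = y, x  # first step reduces to plain GDA
    for _ in range(steps):
        gx, gy = y, x
        x, y = x - lr * (2.0 * gx - gx_prev), y + lr * (2.0 * gy - gy_prev)
        gx_prev, gy_prev = gx, gy
    return math.hypot(x, y)


if __name__ == "__main__":
    print("GDA distance to equilibrium:", gda(1.0, 1.0, lr=0.1, steps=2000))
    print("Optimistic gradient distance:", optimistic_gradient(1.0, 1.0, lr=0.1, steps=2000))
```

In this noise-free toy game, plain GDA spirals away from the equilibrium while the optimistic update contracts toward it. The stochastic regime discussed in the abstract is not modeled here.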

2023-07-01

ICML.cc/2023/Workshop/LXAI_Regular_Deadline (oral)

doi.org

openreview.net

Controlled Sparsity via Constrained Optimization or: How I Learned to Stop Tuning Penalties and Love Constraints

Jose Gallego-Posada

The performance of trained neural networks is robust to harsh levels of pruning. Coupled with the ever-growing size of deep learning models, this observation has motivated extensive research on learning sparse models. In this work, we focus on the task of controlling the level of sparsity when performing sparse learning. Existing methods based on sparsity-inducing penalties involve expensive trial-and-error tuning of the penalty factor, thus lacking direct control of the resulting model sparsity. In response, we adopt a constrained formulation: using the gate mechanism proposed by Louizos et al. (2018), we formulate a constrained optimization problem where sparsification is guided by the training objective and the desired sparsity target in an end-to-end fashion. Experiments on CIFAR-{10, 100}, TinyImageNet, and ImageNet using WideResNet and ResNet{18, 50} models validate the effectiveness of our proposal and demonstrate that we can reliably achieve pre-determined sparsity targets without compromising on predictive performance.

2022-10-30

NeurIPS.cc/2022/Conference (accept)

doi.org

openreview.net
