Nicolas Le Roux

Actor-critic (AC) methods are widely used in reinforcement learning (RL) and benefit from the flexibility of using any policy gradient method as the actor and any value-based method as the critic. The critic is usually trained by minimizing the TD error, an objective that is potentially decorrelated with the true goal of achieving a high reward with the actor. We address this mismatch by designing a joint objective for training the actor and critic in a decision-aware fashion. We use the proposed objective to design a generic AC algorithm that can easily handle any function approximation. We explicitly characterize the conditions under which the resulting algorithm guarantees monotonic policy improvement, regardless of the choice of the policy and critic parameterization. Instantiating the generic algorithm results in an actor that involves maximizing a sequence of surrogate functions (similar to TRPO, PPO) and a critic that involves minimizing a closely connected objective. Using simple bandit examples, we provably establish the benefit of the proposed critic objective over the standard squared error. Finally, we empirically demonstrate the benefit of our decision-aware actor-critic framework on simple RL problems.

Joint Prompt Optimization of Stacked LLMs using Variational Inference

Eric Yuan

Xingdi Yuan

Marc-Alexandre Côté

Matheus Pereira

Adam Trischler

Ziang Xiao

Arian Hosseini

Friederike Niedtner

Large language models (LLMs) can be seen as atomic units of computation mapping sequences to a distribution over sequences. Thus, they can be seen as stochastic language layers in a language network, where the learnable parameters are the natural language prompts at each layer. By stacking two such layers and feeding the output of one layer to the next, we obtain a Deep Language Network (DLN). We first show how to effectively perform prompt optimization for a 1-layer language network (DLN-1). Then, we present an extension that applies to 2-layer DLNs (DLN-2), where two prompts must be learned. The key idea is to consider the output of the first layer as a latent variable, which requires inference, and the prompts to be learned as the parameters of the generative distribution. We first test the effectiveness of DLN-1 on multiple reasoning and natural language understanding tasks. Then, we show that DLN-2 can reach higher performance than a single layer, suggesting that we might reach performance comparable to GPT-4 even when each LLM in the network is smaller and less powerful.

Multi-Head Adapter Routing for Cross-Task Generalization

Lucas Caccia

Edoardo Ponti

Zhan Su

Matheus Pereira

Parameter-efficient fine-tuning (PEFT) for cross-task generalization consists of pre-training adapters on a multi-task training set before few-shot adaptation to test tasks. Polytropon [Ponti et al., 2023] (

Target-based Surrogates for Stochastic Optimization

Jonathan Wilder Lavington

Sharan Vaswani

Reza Babanezhad Harikandeh

Mark Schmidt

We consider minimizing functions for which it is expensive to compute the gradient. Such functions are prevalent in reinforcement learning, imitation learning and bilevel optimization. Our target optimization framework uses the (expensive) gradient computation to construct surrogate functions in a target space (e.g. the logits output by a linear model for classification) that can be minimized efficiently. This allows for multiple parameter updates to the model, amortizing the cost of gradient computation. In the full-batch setting, we prove that our surrogate is a global upper bound on the loss, and can be (locally) minimized using a black-box optimization algorithm. We prove that the resulting majorization-minimization algorithm ensures convergence to a stationary point of the loss. Next, we instantiate our framework in the stochastic setting and propose the

2023-04-24

ICML.cc/2023/Conference (poster)

doi.org
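To make the target-space idea concrete, here is a minimal sketch, not the authors' implementation. It assumes a linear model z = Xw, a squared loss l(z) = 0.5 * ||z - y||^2 on the targets, and a surrogate of the form l(z_t) + <grad l(z_t), z - z_t> + (1/(2*eta)) * ||z - z_t||^2. The function names, step sizes and toy data are hypothetical. What the sketch is meant to show is the structure: one target-space gradient per outer iteration, reused for many cheap parameter updates.

```python
# Hypothetical sketch of a target-space surrogate (majorization-minimization) loop.
# Assumptions: linear model z = X w, loss l(z) = 0.5 * ||z - y||^2 in target space.

def predict(X, w):
    # Linear model: one target (e.g. a logit or regression output) per example.
    return [sum(x_ij * w_j for x_ij, w_j in zip(row, w)) for row in X]

def loss(z, y):
    return 0.5 * sum((zi - yi) ** 2 for zi, yi in zip(z, y))

def target_grad(z, y):
    # Gradient of the loss w.r.t. the targets z: the "expensive" call we amortize.
    return [zi - yi for zi, yi in zip(z, y)]

def surrogate_step(X, y, w, eta=1.0, inner_steps=20, lr=0.05):
    z_t = predict(X, w)
    g = target_grad(z_t, y)  # computed once per outer iteration
    for _ in range(inner_steps):
        z = predict(X, w)
        # Derivative of the surrogate w.r.t. z: g + (z - z_t) / eta
        dz = [gi + (zi - zti) / eta for gi, zi, zti in zip(g, z, z_t)]
        # Chain rule through z = X w gives grad_w = X^T dz
        grad_w = [sum(X[i][j] * dz[i] for i in range(len(X))) for j in range(len(w))]
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
    return w

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
w = [0.0, 0.0]
for _ in range(10):
    w = surrogate_step(X, y, w)
final_loss = loss(predict(X, w), y)
```

With eta = 1 this particular surrogate coincides with the 1-smooth squared loss, so the toy run approaches the least-squares solution w = (1, 2); for a smaller eta, or for losses whose smoothness constant is at most 1/eta, the surrogate is an upper bound on the loss rather than an exact copy.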

Multi-Head Adapter Routing for Cross-Task Generalization

Lucas Caccia

Edoardo Ponti

Zhan Su

Matheus Pereira

2022-11-07

ArXiv (preprint)

A general class of surrogate functions for stable and efficient reinforcement learning

Sharan Vaswani

Olivier Bachem

Simone Totaro

Robert Müller

Shivam Garg

Matthieu Geist

Marlos C. Machado

Pablo Samuel Castro

2022-01-01

AISTATS (published)

proceedings.mlr.press

Multi-Head Adapter Routing for Data-Efficient Fine-Tuning

Lucas Caccia

Edoardo Ponti

Lu Liu

Matheus Pereira

Parameter-efficient fine-tuning (PEFT) methods can adapt large language models to downstream tasks by training a small number of newly added parameters. In multi-task settings, PEFT adapters are typically trained either on each task independently, which inhibits transfer across tasks, or on the concatenation of all tasks, which can lead to negative interference. To address this, Polytropon [Ponti et al., 2022] jointly learns an inventory of PEFT adapters and a routing function to share variable-size sets of adapters across tasks. Subsequently, adapters can be re-combined and fine-tuned on novel tasks even with limited data. In this paper, we investigate to what extent the ability to control which adapters are active for each task leads to sample-efficient generalization. Thus, we propose less expressive variants where we perform weighted averaging of the adapters before few-shot adaptation (Poly-µ) instead of learning a routing function. Moreover, we introduce more expressive variants where finer-grained task–adapter allocation is learned through a multi-head routing function (Poly-S). We test these variants on three separate benchmarks for multi-task learning. We find that Poly-S achieves gains on all three (up to 5.3 points on average) over strong baselines, while incurring a negligible additional cost in parameter count. In particular, we find that instruction tuning, where models are fully fine-tuned on natural language instructions for each task, is inferior to modular methods such as Polytropon and our proposed variants.

2022-01-01

arXiv.org (preprint)

doi.org
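The contrast between averaging adapters and multi-head routing can be sketched on plain vectors. This is a simplified sketch under stated assumptions, not the Polytropon code: each adapter is treated as a flat parameter vector, routing weights are given rather than learned, and "heads" are equal-size chunks of that vector. The names `combine_adapters` and `multi_head_combine` are hypothetical.

```python
# Hypothetical sketch: combining an inventory of adapters (flat parameter vectors).

def combine_adapters(adapters, weights):
    """Weighted average of adapters; weights are normalized to sum to 1."""
    total = sum(weights)
    dim = len(adapters[0])
    return [sum(w * a[i] for w, a in zip(weights, adapters)) / total for i in range(dim)]

def multi_head_combine(adapters, head_weights):
    """Multi-head routing sketch: split every adapter into len(head_weights)
    equal chunks (heads) and mix each chunk with its own routing weights."""
    n_heads = len(head_weights)
    dim = len(adapters[0])
    assert dim % n_heads == 0, "adapter size must be divisible by the number of heads"
    size = dim // n_heads
    out = []
    for h, weights in enumerate(head_weights):
        chunks = [a[h * size:(h + 1) * size] for a in adapters]
        out.extend(combine_adapters(chunks, weights))
    return out

inventory = [[1.0, 1.0, 1.0, 1.0], [3.0, 3.0, 3.0, 3.0]]
# Averaging (in the spirit of Poly-mu): one set of weights for the whole adapter.
uniform = combine_adapters(inventory, [1.0, 1.0])
# Multi-head (in the spirit of Poly-S): head 0 uses adapter 0, head 1 uses adapter 1.
routed = multi_head_combine(inventory, [[1.0, 0.0], [0.0, 1.0]])
```

The multi-head version can produce combinations (here `[1, 1, 3, 3]`) that no single weighted average of whole adapters can express, which is one way to read "finer-grained task–adapter allocation".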

On the Convergence of Stochastic Extragradient for Bilinear Games with Restarted Iteration Averaging

Chris Junchi Li

Yaodong Yu

Nicolas Loizou

Gauthier Gidel

Yitong Ma

Michael I. Jordan

We study the stochastic bilinear minimax optimization problem, presenting an analysis of the same-sample Stochastic ExtraGradient (SEG) method with constant step size, and presenting variations of the method that yield favorable convergence. In sharp contrast with the basic SEG method, whose last iterate only contracts to a fixed neighborhood of the Nash equilibrium, SEG augmented with iteration averaging provably converges to the Nash equilibrium under the same standard settings, and this rate is further improved by incorporating a scheduled restarting procedure. In the interpolation setting, where noise vanishes at the Nash equilibrium, we achieve an optimal convergence rate up to tight constants. We present numerical experiments that validate our theoretical findings and demonstrate the effectiveness of the SEG method when equipped with iteration averaging and restarting.

2022-01-01

AISTATS (published)
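The ingredients named above (same-sample extragradient, iteration averaging, restarts) can be illustrated on a scalar toy game. This is a hedged sketch, not the paper's algorithm or schedule: it assumes f(x, y) = x*y + b_x*x + b_y*y with zero-mean Gaussian b_x, b_y (minimize over x, maximize over y, Nash equilibrium at (0, 0)), a fixed epoch length, and a restart that simply resumes from each epoch's average. The function name and all parameters are hypothetical.

```python
import random

def seg_restarted_averaging(x0, y0, eta=0.2, epochs=5, epoch_len=200, noise=0.1, seed=0):
    """Same-sample stochastic extragradient on f(x, y) = x*y + bx*x + by*y,
    with per-epoch iterate averaging and restarts from the average."""
    rng = random.Random(seed)
    x, y = x0, y0
    for _ in range(epochs):
        sum_x = sum_y = 0.0
        for _ in range(epoch_len):
            # Draw one noise sample and reuse it in both steps (same-sample SEG).
            bx = rng.gauss(0.0, noise)
            by = rng.gauss(0.0, noise)
            # Extrapolation step: descend in x, ascend in y.
            x_half = x - eta * (y + bx)
            y_half = y + eta * (x + by)
            # Update step from the current point, using gradients at the extrapolated point.
            x, y = x - eta * (y_half + bx), y + eta * (x_half + by)
            sum_x += x
            sum_y += y
        # Restart: continue from the average of this epoch's iterates.
        x, y = sum_x / epoch_len, sum_y / epoch_len
    return x, y

x_avg, y_avg = seg_restarted_averaging(1.0, 1.0)
```

With `noise=0.0` the iterates contract geometrically towards (0, 0); with noise, the raw last iterate keeps fluctuating around the equilibrium, while the epoch averages land much closer to it.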

Impact of Aliasing on Generalization in Deep Convolutional Networks

Cristina Vasconcelos

Hugo Larochelle

Vincent Dumoulin

Rob Romijnders

Ross Goroshin

We investigate the impact of aliasing on generalization in Deep Convolutional Networks and show that data augmentation schemes alone are unable to prevent it due to structural limitations in widely used architectures. Drawing insights from frequency analysis theory, we take a closer look at ResNet and EfficientNet architectures and review the trade-off between aliasing and information loss in each of their major components. We show how to mitigate aliasing by inserting non-trainable low-pass filters at key locations, particularly where networks lack the capacity to learn them. These simple architectural changes lead to substantial improvements in generalization on i.i.d. and even more on out-of-distribution conditions, such as image classification under natural corruptions on ImageNet-C [11] and few-shot learning on Meta-Dataset [26]. State-of-the-art results are achieved on both datasets without introducing additional trainable parameters and using the default hyper-parameters of open source codebases.

2021-10-10

2021 IEEE/CVF International Conference on Computer Vision (ICCV) (published)

doi.org
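Why a fixed low-pass filter before downsampling helps can be seen in one dimension. This is an illustrative sketch, not the architectural change from the paper (which places 2-D filters inside ResNet and EfficientNet): it assumes a [1, 2, 1] / 4 binomial kernel, reflect padding, and stride-2 subsampling, and all function names are hypothetical.

```python
def subsample(signal, stride=2):
    # Plain strided downsampling, as performed by a stride-2 conv or pooling layer.
    return signal[::stride]

def low_pass(signal, kernel=(0.25, 0.5, 0.25)):
    """Fixed (non-trainable) binomial low-pass filter with reflect padding."""
    n = len(signal)
    half = len(kernel) // 2
    out = []
    for i in range(n):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = i + k - half
            if j < 0:
                j = -j                  # reflect at the left border
            elif j > n - 1:
                j = 2 * (n - 1) - j     # reflect at the right border
            acc += weight * signal[j]
        out.append(acc)
    return out

def blur_then_subsample(signal, stride=2):
    return subsample(low_pass(signal), stride)

x = [1.0 if i % 2 == 0 else -1.0 for i in range(16)]  # highest-frequency signal
naive = subsample(x)              # aliased: the oscillation turns into a constant 1.0
filtered = blur_then_subsample(x)  # the filter removes the oscillation before subsampling
```

The naive result misrepresents an alternating signal as a constant, whereas filtering first yields zeros, the correct low-resolution summary of a zero-mean oscillation.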

Beyond variance reduction: Understanding the true impact of baselines on policy optimization

Wesley Chung

Valentin Thomas

Marlos C. Machado

2021-07-01

Proceedings of the 38th International Conference on Machine Learning (published)

proceedings.mlr.press

Bridging the Gap Between Adversarial Robustness and Optimization Bias

Fartash Faghri

Cristina Vasconcelos

David J Fleet

Fabian Pedregosa

2021-02-17

ArXiv (preprint)

Batch Reinforcement Learning Through Continuation Method

Yijie Guo

Shengyu Feng

Ed Chi

Honglak Lee

Minmin Chen

Many real-world applications of reinforcement learning (RL) require the agent to learn from a fixed set of trajectories, without collecting new interactions. Policy optimization in this setting is extremely challenging because: 1) the geometry of the objective function is hard to optimize efficiently; 2) the shift of data distributions causes high noise in the value estimation. In this work, we propose a simple yet effective policy iteration approach to batch RL using global optimization techniques known as continuation. By constraining the difference between the learned policy and the behavior policy that generates the fixed trajectories, and continuously relaxing the constraint, our method 1) helps the agent escape local optima; 2) reduces the error in policy evaluation in the optimization procedure. We present results on a variety of control tasks, game environments, and a recommendation task to empirically demonstrate the efficacy of our proposed method.

2021-01-01

ICLR (published)
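The continuation idea (start close to the behavior policy, then progressively relax the constraint) can be sketched on a single-state problem. This is a sketch under assumptions, not the paper's method: the constraint is modeled as a KL penalty with coefficient tau, the Q-values are fixed instead of being re-estimated at each policy-iteration step, and the names and schedule are hypothetical. For a finite action set, maximizing E_pi[Q] - tau * KL(pi || behavior) has the closed form pi(a) proportional to behavior(a) * exp(Q(a) / tau).

```python
import math

def kl_regularized_policy(behavior, q_values, tau):
    """Closed-form maximizer of E_pi[Q] - tau * KL(pi || behavior) over a finite action set."""
    logits = [math.log(b) + q / tau for b, q in zip(behavior, q_values)]
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]

def continuation_schedule(behavior, q_values, taus):
    """Solve a sequence of problems, relaxing the constraint (smaller tau) at each stage."""
    return [kl_regularized_policy(behavior, q_values, tau) for tau in taus]

behavior = [0.7, 0.2, 0.1]   # hypothetical behavior policy that produced the batch data
q_values = [0.0, 0.5, 1.0]   # hypothetical value estimates for the three actions
policies = continuation_schedule(behavior, q_values, taus=[10.0, 1.0, 0.1])
```

With a large tau the learned policy stays near the behavior policy, where value estimates from the fixed trajectories are most reliable; as tau shrinks, probability mass moves towards the highest-valued action.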