Publications

Nesterov Meets Optimism: Rate-Optimal Separable Minimax Optimization

Chris Junchi Li

Angela Yuan

Quanquan Gu

Michael Jordan

We propose a new first-order optimization algorithm, AcceleratedGradient-OptimisticGradient (AG-OG) Descent Ascent, for separable convex-concave minimax optimization. The main idea of our algorithm is to carefully leverage the structure of the minimax problem, performing Nesterov acceleration on the individual component and optimistic gradient on the coupling component. Equipped with proper restarting, we show that AG-OG achieves the optimal convergence rate (up to a constant) for a variety of settings, including bilinearly coupled strongly convex-strongly concave minimax optimization (bi-SC-SC), bilinearly coupled convex-strongly concave minimax optimization (bi-C-SC), and bilinear games. We also extend our algorithm to the stochastic setting and achieve the optimal convergence rate in both bi-SC-SC and bi-C-SC settings. AG-OG is the first single-call algorithm with optimal convergence rates in both deterministic and stochastic settings for bilinearly coupled minimax optimization problems.

2023-01-01

ICML (published)

openreview.net
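
The optimistic-gradient ingredient mentioned in the AG-OG abstract above can be illustrated on the simplest bilinear game. The following is a minimal sketch, not the AG-OG algorithm itself: it runs plain optimistic gradient descent ascent on the toy objective f(x, y) = x * y, with no Nesterov acceleration and no restarting. The function names, step size, and iteration count are assumptions chosen for illustration.

```python
import math

def ogda_bilinear(eta=0.3, steps=500, x0=1.0, y0=1.0):
    """Optimistic gradient descent ascent on the toy game f(x, y) = x * y.

    Gradients are df/dx = y and df/dy = x; the saddle point is (0, 0).
    """
    x, y = x0, y0
    gx_prev, gy_prev = y, x  # no history yet: reuse the first gradient
    for _ in range(steps):
        gx, gy = y, x  # a single gradient evaluation per iteration
        # Optimistic step: use 2 * current gradient - previous gradient.
        x_new = x - eta * (2.0 * gx - gx_prev)
        y_new = y + eta * (2.0 * gy - gy_prev)
        gx_prev, gy_prev = gx, gy
        x, y = x_new, y_new
    return x, y

def gda_bilinear(eta=0.3, steps=500, x0=1.0, y0=1.0):
    """Plain simultaneous gradient descent ascent, shown for comparison."""
    x, y = x0, y0
    for _ in range(steps):
        x, y = x - eta * y, y + eta * x
    return x, y

def distance_to_saddle(point):
    """Euclidean distance from (x, y) to the saddle point (0, 0)."""
    return math.hypot(point[0], point[1])

if __name__ == "__main__":
    print("OGDA distance:", distance_to_saddle(ogda_bilinear()))
    print("GDA distance: ", distance_to_saddle(gda_bilinear()))
```

On this toy problem the optimistic iterates spiral in toward the saddle point while plain descent ascent spirals outward, which is one common motivation for applying an optimistic correction to the coupling term.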

NEURAL MANIFOLDS AND GRADIENT-BASED ADAPTATION IN NEURAL-INTERFACE TASKS

Alexandre Payeur

Amy L. Orsborn

Guillaume Lajoie

Neural activity tends to reside on manifolds whose dimension is much lower than the dimension of the whole neural state space. Experiments using brain-computer interfaces with microelectrode arrays implanted in the motor cortex of nonhuman primates tested the hypothesis that external perturbations should produce different adaptation strategies depending on how “aligned” the perturbation is with respect to a pre-existing intrinsic manifold. On the one hand, perturbations within the manifold (WM) evoked fast reassociations of existing patterns for rapid adaptation. On the other hand, perturbations outside the manifold (OM) triggered the slow emergence of new neural patterns underlying a much slower (and, without adequate training protocols, inconsistent or virtually impossible) adaptation. This suggests that the time scale and the overall difficulty of the brain to adapt depend fundamentally on the structure of neural activity. Here, we used a simplified static Gaussian model to show that gradient-descent learning could explain the differences between adaptation to WM and OM perturbations. For small learning rates, we found that the adaptation speeds were different but the model eventually adapted to both perturbations. Moreover, sufficiently large learning rates could entirely prohibit adaptation to OM perturbations while preserving adaptation to WM perturbations, in agreement with experiments. Adopting an incremental training protocol, as has been done in experiments, permitted a swift recovery of a full adaptation in the cases where OM perturbations were previously impossible to relearn. Finally, we also found that gradient descent was compatible with the reassociation mechanism on short adaptation time scales. Since gradient descent has many biologically plausible variants, our findings thus establish gradient-based learning as a plausible mechanism for adaptation under network-level constraints, with a central role for the learning rate.

2023-01-01

(published)

www.semanticscholar.org

NEURAL NETWORK-BASED SOLVERS FOR PDES

M. Cameron

Ian G Goodfellow

Yoshua Bengio

(1) $\mathcal{N}(x; \theta) = L_{l+1} \circ \sigma_l \circ L_l \circ \sigma_{l-1} \circ \dots \circ \sigma_1 \circ L_1$. The symbol $L_k$ denotes the $k$-th affine operator, of the form $L_k(x) = A_k x + b_k$, while $\sigma_k$ denotes a nonlinear function called an activation function. The activation functions are chosen by the user. The matrices $A_k$ and shift vectors (or bias vectors) $b_k$ are encoded into the argument $\theta$: $\theta = \{A_k, b_k\}_{k=1}^{l+1}$. Training a neural network means finding $\{A_k, b_k\}_{k=1}^{l+1}$ such that $\mathcal{N}(x; \theta)$ satisfies certain conditions. These conditions are described by the loss function chosen by the user. For example, one might want the neural network to assume certain values $f_j$ at certain points $x_j$, $j = 1, \dots, N$. The points $x_j$ are called the training data. In this case, a common choice of the loss function is the least squares error:

2023-01-01

(published)

www.semanticscholar.org
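
The composition of affine operators and activations described in the entry above can be sketched in a few lines. This is an illustrative sketch only: it assumes a single activation (tanh) shared by all hidden layers, a scalar network output, and an unnormalized sum of squared residuals as the loss, since the excerpt is cut off before stating its exact formula. The names `affine`, `network`, `squared_error`, and the example `theta` are hypothetical.

```python
import math

def affine(A, b, x):
    """Apply L(x) = A x + b, with A a list of rows and b, x plain lists."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(A, b)]

def network(x, theta, sigma=math.tanh):
    """Evaluate N(x; theta) = L_{l+1} ∘ sigma ∘ L_l ∘ ... ∘ sigma ∘ L_1.

    theta is a list of (A_k, b_k) pairs; the last pair is applied
    without an activation, matching the composition in equation (1).
    """
    *hidden, (A_out, b_out) = theta
    for A, b in hidden:
        x = [sigma(v) for v in affine(A, b, x)]
    return affine(A_out, b_out, x)

def squared_error(theta, xs, fs):
    """Sum of squared residuals over training points (assumed loss form)."""
    return sum((network(x, theta)[0] - f) ** 2 for x, f in zip(xs, fs))

# Hypothetical parameters for a network R -> R^2 -> R.
theta = [
    ([[1.0], [-1.0]], [0.0, 0.0]),  # L_1: A_1 is 2x1, b_1 has 2 entries
    ([[1.0, 1.0]], [0.5]),          # L_2: A_2 is 1x2, b_2 has 1 entry
]

if __name__ == "__main__":
    print(network([0.3], theta))
    print(squared_error(theta, [[0.0], [1.0]], [0.5, 0.5]))
```

Training would then amount to adjusting the entries of `theta` to reduce `squared_error`, for example with a gradient-based optimizer.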

Noisy Pairing and Partial Supervision for Stylized Opinion Summarization

Reinald Kim

Mirella Lapata. 2020

Un-611

David Scott Krueger

Emmanuel Bengio

Maxinder S. Kan-620

Tegan Maharaj

Asja Fischer

Aaron Courville

Somnath Basu

Roy Chowdhury

Chao Zhao

Tanya Goyal

Junyi Jiacheng Xu

Jessy Li

Ivor Wai-hung Tsang

James T. Kwok

Neil Houlsby

Andrei Giurgiu

Stanisław Jastrzębski

Bruna Morrone

Quentin de Laroussilhe

Mona Gesmundo

Attariyan Sylvain

Gelly

Thomas Wolf

Lysandre Debut

Julien Victor Sanh

Clement Chaumond

Anthony Delangue

Pier-339 Moi

Tim ric Cistac

Rémi Rault

Morgan Louf

Funtow-900 Joe

Sam Davison

Patrick Shleifer

Von Platen

Clara Ma

Yacine Jernite

Julien Plu

Canwen Xu

Opinion summarization research has primarily focused on generating summaries reflecting important opinions from customer reviews without paying much attention to the writing style. In this paper, we propose the stylized opinion summarization task, which aims to generate a summary of customer reviews in the desired (e.g., professional) writing style. To tackle the difficulty in collecting customer and professional review pairs, we develop a non-parallel training framework, Noisy Pairing and Partial Supervision (NAPA), which trains a stylized opinion summarization system from non-parallel customer and professional review sets. We create a benchmark, ProSum, by collecting customer and professional reviews from Yelp and Michelin. Experimental results on ProSum and FewSum demonstrate that our non-parallel training framework consistently improves both automatic and human evaluations, successfully building a stylized opinion summarization model that can generate professionally-written summaries from customer reviews.

2023-01-01

(published)

www.semanticscholar.org

Normalization Layers Are All That Sharpness-Aware Minimization Needs

Maximilian Mueller

Tiffany Joyce Vlaar

David Rolnick

Matthias Hein

Sharpness-aware minimization (SAM) was proposed to reduce sharpness of minima and has been shown to enhance generalization performance in various settings. In this work we show that perturbing only the affine normalization parameters (typically comprising 0.1% of the total parameters) in the adversarial step of SAM can outperform perturbing all of the parameters. This finding generalizes to different SAM variants and both ResNet (Batch Normalization) and Vision Transformer (Layer Normalization) architectures. We consider alternative sparse perturbation approaches and find that these do not achieve similar performance enhancement at such extreme sparsity levels, showing that this behaviour is unique to the normalization layers. Although our findings reaffirm the effectiveness of SAM in improving generalization performance, they cast doubt on whether this is solely caused by reduced sharpness.

openreview.net
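
The idea of restricting the adversarial step of SAM to a subset of parameters, as described in the entry above, can be sketched on a toy problem. This is a hedged illustration, not the authors' implementation: the quadratic loss, the boolean `mask` standing in for "normalization parameters", the choice to normalize the perturbation by the norm of the masked gradient only, and all names and hyperparameters are assumptions.

```python
import math

def grad(w, a):
    """Gradient of the toy loss 0.5 * sum(a_i * w_i**2)."""
    return [a_i * w_i for a_i, w_i in zip(a, w)]

def loss(w, a):
    """Toy quadratic loss used only for illustration."""
    return 0.5 * sum(a_i * w_i ** 2 for a_i, w_i in zip(a, w))

def masked_sam_step(w, a, mask, lr=0.1, rho=0.05):
    """One SAM-style step that perturbs only the coordinates where mask is True."""
    g = grad(w, a)
    # Adversarial (ascent) direction restricted to the masked coordinates.
    g_masked = [g_i if m else 0.0 for g_i, m in zip(g, mask)]
    norm = math.sqrt(sum(g_i ** 2 for g_i in g_masked))
    eps = [rho * g_i / norm if norm > 0.0 else 0.0 for g_i in g_masked]
    # Gradient evaluated at the perturbed point, applied to ALL parameters.
    w_adv = [w_i + e_i for w_i, e_i in zip(w, eps)]
    g_adv = grad(w_adv, a)
    w_new = [w_i - lr * g_i for w_i, g_i in zip(w, g_adv)]
    return w_new, eps

if __name__ == "__main__":
    a = [1.0, 2.0, 3.0, 4.0]
    w = [1.0, 1.0, 1.0, 1.0]
    mask = [False, False, False, True]  # pretend only the last one is a norm parameter
    for _ in range(100):
        w, eps = masked_sam_step(w, a, mask)
    print(loss(w, a), eps)
```

Setting every entry of `mask` to True recovers a standard SAM-style step on this toy loss, which makes the two variants easy to compare side by side.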

A Novel Deep Multi-head Attentive Vulnerable Line Detector

Miles Q. Li

Benjamin Fung

Ashita Diwan

2023-01-01

Procedia Computer Science (published)

doi.org

An Online Newton’s Method for Time-Varying Linear Equality Constraints

Jean-Luc Lupien

Antoine Lesage-Landry

We consider online optimization problems with time-varying linear equality constraints. In this framework, an agent makes sequential decisions using only prior information. At every round, the agent suffers an environment-determined loss and must satisfy time-varying constraints. Both the loss functions and the constraints can be chosen adversarially. We propose the Online Projected Equality-constrained Newton Method (OPEN-M) to tackle this family of problems. We obtain sublinear dynamic regret and constraint violation bounds for OPEN-M under mild conditions. Namely, smoothness of the loss function and boundedness of the inverse Hessian at the optimum are required, but not convexity. Finally, we show OPEN-M outperforms state-of-the-art online constrained optimization algorithms in a numerical network flow application.

2023-01-01

IEEE Control Systems Letters (published)

doi.org

arxiv.org

Optimising Electric Vehicle Charging Station Placement Using Advanced Discrete Choice Models

Steven Lamontagne

Margarida Carvalho

Emma Frejinger

Bernard Gendron

Miguel F. Anjos

Ribal Atallah

Département d'informatique et de recherche opérationnelle

U. Montréal

S. O. Mathematics

U. Edinburgh

Institut de Recherche d'Hydro-Québec

We present a new model for finding the optimal placement of electric vehicle charging stations across a multiperiod time frame so as to maximise electric vehicle adoption. Via the use of stochastic discrete choice models and user classes, this work allows for a granular modelling of user attributes and their preferences in regard to charging station characteristics. We adopt a simulation approach and precompute error terms for each option available to users for a given number of scenarios. This results in a bilevel optimisation model that is, however, intractable for all but the simplest instances. Our major contribution is a reformulation into a maximum covering model, which uses the precomputed error terms to calculate the users covered by each charging station. This allows solutions to be found more efficiently than for the bilevel formulation. The maximum covering formulation remains intractable in some instances, so we propose rolling horizon, greedy, and greedy randomised adaptive search procedure heuristics to obtain good-quality solutions more efficiently. Extensive computational results are provided, and they compare the maximum covering formulation with the current state of the art for both exact solutions and the heuristic methods. History: Accepted by Andrea Lodi, Area Editor for Design & Analysis of Algorithms–Discrete. Funding: This work was supported by Hydro-Québec and the Natural Sciences and Engineering Research Council of Canada [Discovery grant 2017-06054; Collaborative Research and Development Grant CRDPJ 536757–19]. Supplemental Material: The online appendix is available at https://doi.org/10.1287/ijoc.2022.0185.

2023-01-01

INFORMS J. Comput. (published)

doi.org

arxiv.org
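
The greedy heuristic for a maximum covering model, mentioned in the charging-station entry above, follows a well-known generic pattern that can be sketched briefly. This is a simplified single-period illustration and not the authors' formulation: the coverage sets below are made-up placeholders for what the paper derives from precomputed error terms, and the function name and data are hypothetical.

```python
def greedy_max_cover(coverage, budget):
    """Pick up to `budget` stations, each time adding the one that
    covers the most users not yet covered.

    coverage maps a station name to the set of users it would cover.
    Ties are broken by dictionary insertion order.
    """
    chosen, covered = [], set()
    for _ in range(budget):
        candidates = [s for s in coverage if s not in chosen]
        best = max(candidates,
                   key=lambda s: len(coverage[s] - covered),
                   default=None)
        # Stop early when no remaining station adds a new user.
        if best is None or not (coverage[best] - covered):
            break
        chosen.append(best)
        covered |= coverage[best]
    return chosen, covered

if __name__ == "__main__":
    # Hypothetical instance: four candidate stations, seven users.
    coverage = {
        "A": {1, 2, 3},
        "B": {3, 4},
        "C": {4, 5, 6, 7},
        "D": {1},
    }
    print(greedy_max_cover(coverage, budget=2))
```

A greedy choice like this is fast but not guaranteed to be optimal, which is consistent with the abstract presenting it as one of several heuristics alongside the exact maximum covering formulation.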

Optimism and Adaptivity in Policy Optimization

Veronica Chelu

Tom Zahavy

Arthur Guez

Doina Precup

Sebastian Flennerhag

2023-01-01

arXiv.org (preprint)

doi.org

Optimizing Fairness over Time with Homogeneous Workers (Short Paper).

Bart-jan Van Rossum

Rui Chen

Andrea Lodi

2023-01-01

ATMOS (published)

doi.org

PAC-Bayesian Generalization Bounds for Adversarial Generative Models

Sokhna Diarra Mbacke

Florence Clerc

Pascal Germain

We extend PAC-Bayesian theory to generative models and develop generalization bounds for models based on the Wasserstein distance and the total variation distance. Our first result on the Wasserstein distance assumes the instance space is bounded, while our second result takes advantage of dimensionality reduction. Our results naturally apply to Wasserstein GANs and Energy-Based GANs, and our bounds provide new training objectives for these two. Although our work is mainly theoretical, we perform numerical experiments showing non-vacuous generalization bounds for Wasserstein GANs on synthetic datasets.

2023-01-01

ICML (published)

doi.org

openreview.net
