Publications

Leveraging Function Space Aggregation for Federated Learning at Scale

Nikita Dhawan

Nicole Elyse Mitchell

Zachary Charles

Zachary Garrett

The federated learning paradigm has motivated the development of methods for aggregating multiple client updates into a global server model, without sharing client data. Many federated learning algorithms, including the canonical Federated Averaging (FedAvg), take a direct (possibly weighted) average of the client parameter updates, motivated by results in distributed optimization. In this work, we adopt a function space perspective and propose a new algorithm, FedFish, that aggregates local approximations to the functions learned by clients, using an estimate based on their Fisher information. We evaluate FedFish on realistic, large-scale cross-device benchmarks. While the performance of FedAvg can suffer as client models drift further apart, we demonstrate that FedFish is more robust to longer local training. Our evaluation across several settings in image and language benchmarks shows that FedFish outperforms FedAvg as local training epochs increase. Further, FedFish results in global networks that are more amenable to efficient personalization via local fine-tuning on the same or shifted data distributions. For instance, federated pretraining on the C4 dataset, followed by few-shot personalization on Stack Overflow, results in a 7% improvement in next-token prediction by FedFish over FedAvg.
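As a rough illustration of the FedAvg baseline mentioned in this abstract (a weighted average of client parameter updates applied to the server model), the following minimal sketch may help. It is not the authors' code and does not implement FedFish; the function name `fedavg_aggregate`, the dictionary-of-lists parameter layout, and the choice of weights are assumptions made only for this example.

```python
from typing import Dict, List

# Hypothetical parameter layout: parameter name -> flat list of floats.
Params = Dict[str, List[float]]


def fedavg_aggregate(
    global_params: Params,
    client_updates: List[Params],
    client_weights: List[float],
) -> Params:
    """Apply the weighted average of client updates to the global parameters."""
    total = sum(client_weights)
    if total <= 0:
        raise ValueError("client weights must sum to a positive value")

    new_params: Params = {}
    for name, values in global_params.items():
        # Weighted average of the per-client updates for this parameter.
        avg_update = [0.0] * len(values)
        for update, weight in zip(client_updates, client_weights):
            for i, delta in enumerate(update[name]):
                avg_update[i] += (weight / total) * delta
        # Server step: add the averaged update to the current global value.
        new_params[name] = [v + d for v, d in zip(values, avg_update)]
    return new_params
```

In practice the weights are often taken to be the number of examples held by each client, but that choice is an assumption here rather than something stated in the abstract.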

2024-02-11

TMLR (accepted)

doi.org

openreview.net

Metrics reloaded: recommendations for image analysis validation.

Lena Maier-Hein

Annika Reinke

Evangelia Christodoulou

Ben Glocker

Patrick Godau

Fabian Isensee

Jens Kleesiek

Michal Kozubek

Mauricio Reyes

Michael A. Riegler

Manuel Wiesenfarth

Michael Baumgartner

Matthias Eisenmann

Doreen Heckmann-Nötzel

A. Emre Kavur

Tim Rädsch

Minu Dietlinde Tizabi

Laura Acion

Michela Antonelli

Tal Arbel

Spyridon Bakas

Peter Bankhead

Allison Benis

M. Jorge Cardoso

Veronika Cheplygina

Beth A. Cimini

Gary S. Collins

Keyvan Farahani

Bram van Ginneken

Daniel A. Hashimoto

Michael M. Hoffman

Merel Huisman

Pierre Jannin

Charles E. Kahn

Alexandros Karargyris

Alan Karthikesalingam

H. Kenngott

Annette Kopp-Schneider

Anna Kreshuk

Tahsin Kurc

Bennett Landman

Geert Litjens

Amin Madani

Klaus Maier-Hein

Anne L. Martel

Peter Mattson

Erik Meijering

Bjoern Menze

David Moher

Karel G.M. Moons

Henning Müller

Felix Nickel

Brennan Nichyporuk

Jens Petersen

Nasir Rajpoot

Nicola Rieke

Julio Saez-Rodriguez

Clarisa Sánchez Gutiérrez

Shravya Shetty

M. Smeden

Carole H. Sudre

Ronald M. Summers

Abdel Aziz Taha

Sotirios A. Tsaftaris

Ben Van Calster

Gael Varoquaux

Paul F. Jäger

2024-02-11

Nature Methods (published)

doi.org

arxiv.org

Model Collapse Demystified: The Case of Regression

Elvis Dopgima Dohmatob

Yunzhen Feng

Julia Kempe

In the era of proliferation of large language and image generation models, the phenomenon of "model collapse" refers to the situation whereby, as a model is trained recursively on data generated from previous generations of itself over time, its performance degrades until the model eventually becomes completely useless, i.e., the model collapses. In this work, we study this phenomenon in the setting of high-dimensional regression and obtain analytic formulae which quantitatively outline this phenomenon in a broad range of regimes. In the special case of polynomial decaying spectral and source conditions, we obtain modified scaling laws which exhibit new crossover phenomena from fast to slow rates. We also propose a simple strategy based on adaptive regularization to mitigate model collapse. Our theoretical results are validated with experiments.

2024-02-11

arXiv (preprint)

doi.org

arxiv.org

The impact of gender on pediatric surgical access and outcomes in Africa

Sacha Williams

Olivia Serhan

Jenny Wang

Christian Guindi

Elena Guadagno

Maeve Trudeau

Emmanuel Ameh

Kokila Lakhoo

Dan Poenaru

Girls, whose care is often affected by barriers steeped in gender inequity, may be at higher risk of poor surgical outcomes. This study explored the impact of gender on pediatric surgical care in Africa. Differences in access to care and clinical outcomes for boys and girls were examined for pediatric surgical conditions that do not differ by physiological sex. A systematic review of African pediatric surgical studies ensued, followed by a random effects meta-analysis, and risk of bias assessment. Of the 12281 records retrieved, 54 were selected for review. Most studies were retrospective (57.4%), single-site (94.4%), from Egypt, Nigeria, Ghana, or Ethiopia (55.6%), focussed on gastrointestinal conditions (63.0%), published in 2010 or sooner (85.1%), had study durations of 5 years or less (68.5%), and cohorts of less than 200 children (57.4%). Sixty percent reported the outcome of mortality. Meta-analysis odds ratios revealed surgery was performed 3.6 times more often on boys (95% CI: 2.6, 4.9); and mortality was 1.6 times greater for girls (95% CI: 1.3, 2.0). African girls appear to face gender inequities in pediatric surgical care. Findings will be further explored in a mixed-methods study. Gender disparities in global surgical care have been documented in the African adult population. However, gender-specific differentials in surgical access and outcomes have yet to be documented for African pediatric populations. This study provides first-time evidence of gender inequity in pediatric surgical care in Africa.

2024-02-10

Pediatric Surgery International (Print) (published)

doi.org

The Leukemoid Reaction in Severe Alcoholic Hepatitis: A Case Report

Siva Reddy

Sachin Agrawal

Sunil Kumar

Sourya Acharya

2024-02-10

Cureus (published)

doi.org

Deep Learning for Data-Driven Districting-and-Routing

Arthur Ferraz

Cheikh Ahmed

Quentin Cappart

Thibaut Vidal

2024-02-07

arXiv (preprint)

arxiv.org

In-Context Learning Can Re-learn Forbidden Tasks

Despite significant investment into safety training, large language models (LLMs) deployed in the real world still suffer from numerous vulnerabilities. One perspective on LLM safety training is that it algorithmically forbids the model from answering toxic or harmful queries. To assess the effectiveness of safety training, in this work, we study forbidden tasks, i.e., tasks the model is designed to refuse to answer. Specifically, we investigate whether in-context learning (ICL) can be used to re-learn forbidden tasks despite the explicit fine-tuning of the model to refuse them. We first examine a toy example of refusing sentiment classification to demonstrate the problem. Then, we use ICL on a model fine-tuned to refuse to summarise made-up news articles. Finally, we investigate whether ICL can undo safety training, which could represent a major security risk. For the safety task, we look at Vicuna-7B, Starling-7B, and Llama2-7B. We show that the attack works out-of-the-box on Starling-7B and Vicuna-7B but fails on Llama2-7B. Finally, we propose an ICL attack that uses the chat template tokens like a prompt injection attack to achieve a better attack success rate on Vicuna-7B and Starling-7B. Trigger Warning: the appendix contains LLM-generated text with violence, suicide, and misinformation.

2024-02-07

arXiv (preprint)

doi.org

arxiv.org

When is Momentum Extragradient Optimal? A Polynomial-Based Analysis

Junhyung Lyle Kim

Gauthier Gidel

Anastasios Kyrillidis

Fabian Pedregosa

The extragradient method has gained popularity due to its robust convergence properties for differentiable games. Unlike single-objective optimization, game dynamics involve complex interactions reflected by the eigenvalues of the game vector field's Jacobian scattered across the complex plane. This complexity can cause the simple gradient method to diverge, even for bilinear games, while the extragradient method achieves convergence. Building on the recently proven accelerated convergence of the momentum extragradient method for bilinear games (Azizian et al., 2020), we use a polynomial-based analysis to identify three distinct scenarios where this method exhibits further accelerated convergence. These scenarios encompass situations where the eigenvalues reside on the (positive) real line, lie on the real line alongside complex conjugates, or exist solely as complex conjugates. Furthermore, we derive the hyperparameters for each scenario that achieve the fastest convergence rate.
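The contrast this abstract draws between the simple gradient method and the extragradient method on bilinear games can be seen on the smallest possible example, min over x and max over y of f(x, y) = x * y. The sketch below is an illustration only: it uses the plain extragradient step, not the momentum variant analysed in the paper, and the function names, step size, and starting point are assumptions chosen for this example.

```python
import math
from typing import Callable, Tuple

Step = Callable[[float, float, float], Tuple[float, float]]


def gradient_step(x: float, y: float, lr: float) -> Tuple[float, float]:
    """Simultaneous gradient descent (on x) and ascent (on y) for f = x * y."""
    return x - lr * y, y + lr * x


def extragradient_step(x: float, y: float, lr: float) -> Tuple[float, float]:
    """Extragradient: take a look-ahead step, then update with its gradient."""
    x_half, y_half = x - lr * y, y + lr * x
    return x - lr * y_half, y + lr * x_half


def distance_after(step: Step, steps: int = 100, lr: float = 0.1) -> float:
    """Distance to the equilibrium (0, 0) after `steps` iterations from (1, 1)."""
    x, y = 1.0, 1.0
    for _ in range(steps):
        x, y = step(x, y, lr)
    return math.hypot(x, y)


if __name__ == "__main__":
    print("gradient:     ", distance_after(gradient_step))
    print("extragradient:", distance_after(extragradient_step))
```

For this game each gradient step multiplies the squared distance to the equilibrium by 1 + lr^2, so the iterates spiral outward, while each extragradient step multiplies it by 1 - lr^2 + lr^4, which is below 1 for any step size between 0 and 1.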

2024-02-07

TMLR (accepted)

openreview.net

Feature learning as alignment: a structural property of gradient descent in non-linear neural networks

Daniel Beaglehole

Ioannis Mitliagkas

Atish Agarwala

Understanding the mechanisms through which neural networks extract statistics from input-label pairs through feature learning is one of the most important unsolved problems in supervised learning. Prior works demonstrated that the Gram matrices of the weights (the neural feature matrices, NFM) and the average gradient outer products (AGOP) become correlated during training, in a statement known as the neural feature ansatz (NFA). Through the NFA, the authors introduce mapping with the AGOP as a general mechanism for neural feature learning. However, these works do not provide a theoretical explanation for this correlation or its origins. In this work, we further clarify the nature of this correlation, and explain its emergence. We show that this correlation is equivalent to alignment between the left singular structure of the weight matrices and the newly defined pre-activation tangent features at each layer. We further establish that the alignment is driven by the interaction of weight changes induced by SGD with the pre-activation features, and analyze the resulting dynamics analytically at early times in terms of simple statistics of the inputs and labels. We prove the derivative alignment occurs with high probability in specific high dimensional settings. Finally, motivated by the observation that the NFA is driven by this centered correlation, we introduce a simple optimization rule that dramatically increases the NFA correlations at any given layer and improves the quality of features learned.

2024-02-06

arXiv (preprint)

doi.org

openreview.net

AICOM-MP: an AI-based Monkeypox Detector for Resource-Constrained Environments

Tianyi Yang

Tianze Yang

Andrew Liu

Na An

Jie Tang

Shaoshan Liu

Xue Liu

2024-02-05

Connection Science (published)

doi.org

arxiv.org

Polynomial Lawvere Logic

Giorgio Bacci

Radu Mardare

Prakash Panangaden

Gordon D. Plotkin

2024-02-04

arXiv (preprint)

doi.org

arxiv.org

Toward Human-AI Alignment in Large-Scale Multi-Player Games

Sugandha Sharma

Guy Davidson

Khimya Khetarpal

Anssi Kanervisto

Udit Arora

Katja Hofmann

Ida Momennejad

Achieving human-AI alignment in complex multi-agent games is crucial for creating trustworthy AI agents that enhance gameplay. We propose a method to evaluate this alignment using an interpretable task-sets framework, focusing on high-level behavioral tasks instead of low-level policies. Our approach has three components. First, we analyze extensive human gameplay data from Xbox's Bleeding Edge (100K+ games), uncovering behavioral patterns in a complex task space. This task space serves as a basis set for a behavior manifold capturing interpretable axes: fight-flight, explore-exploit, and solo-multi-agent. Second, we train an AI agent to play Bleeding Edge using a Generative Pretrained Causal Transformer and measure its behavior. Third, we project human and AI gameplay to the proposed behavior manifold to compare and contrast. This allows us to interpret differences in policy as higher-level behavioral concepts, e.g., we find that while human players exhibit variability in fight-flight and explore-exploit behavior, AI players tend towards uniformity. Furthermore, AI agents predominantly engage in solo play, while humans often engage in cooperative and competitive multi-agent patterns. These stark differences underscore the need for interpretable evaluation, design, and integration of AI in human-aligned applications. Our study advances the alignment discussion in AI and especially generative AI research, offering a measurable framework for interpretable human-agent alignment in multiplayer gaming.

2024-02-04

arXiv (preprint)

doi.org

arxiv.org
