Alexandre Larouche

A Guide to Robust Generalization: The Impact of Architecture, Pre-training, and Optimization Strategy

Maxime Heuillet

Rishika Bhagwatkar

Jonas Ngnawe

Yann Batiste Pequignot

Ola Ahmad

Deep learning models operating in the image domain are vulnerable to small input perturbations. For years, robustness to such perturbations … (see more)was pursued by training models from scratch (i.e., with random initializations) using specialized loss objectives. Recently, robust fine-tuning has emerged as a more efficient alternative: instead of training from scratch, pretrained models are adapted to maximize predictive performance and robustness. To conduct robust fine-tuning, practitioners design an optimization strategy that includes the model update protocol (e.g., full or partial) and the specialized loss objective. Additional design choices include the architecture type and size, and the pretrained representation. These design choices affect robust generalization, which is the model's ability to maintain performance when exposed to new and unseen perturbations at test time. Understanding how these design choices influence generalization remains an open question with significant practical implications. In response, we present an empirical study spanning 6 datasets, 40 pretrained architectures, 2 specialized losses, and 3 adaptation protocols, yielding 1,440 training configurations and 7,200 robustness measurements across five perturbation types. To our knowledge, this is the most diverse and comprehensive benchmark of robust fine-tuning to date. While attention-based architectures and robust pretrained representations are increasingly popular, we find that convolutional neural networks pretrained in a supervised manner on large datasets often perform best. Our analysis both confirms and challenges prior design assumptions, highlighting promising research directions and offering practical guidance.

2025-09-29

NeurIPS.cc/2025/Workshop/Reliable_ML (published)

doi.org

openreview.net

A Guide to Robust Generalization: The Impact of Architecture, Pre-training, and Optimization Strategy

Maxime Heuillet

Rishika Bhagwatkar

Jonas Ngnawe

Yann Batiste Pequignot

Ola Ahmad

Deep learning models operating in the image domain are vulnerable to small input perturbations. For years, robustness to such perturbations … (see more)was pursued by training models from scratch (i.e., with random initializations) using specialized loss ob- jectives. Recently, robust fine-tuning has emerged as a more efficient alternative: instead of training from scratch, pretrained models are adapted to maximize pre- dictive performance and robustness. To conduct robust fine-tuning, practitioners design an optimization strategy that includes the model update protocol (e.g., full or partial) and the specialized loss objective. Additional design choices include the architecture type and size, and the pretrained representation. These design choices affect robust generalization, which is the model’s ability to maintain performance when exposed to new and unseen perturbations at test time. Understanding how these design choices influence generalization remains an open question with signif- icant practical implications. In response, we present an empirical study spanning 6 datasets, 40 pretrained architectures, 2 specialized losses, and 3 adaptation proto- cols — yielding 1, 440 training configurations and 7, 200 robustness measurements across five perturbation types. To our knowledge, this is the most diverse and comprehensive benchmark of robust fine-tuning to date. While attention-based architectures and robust pretrained representations are increasingly popular, we find that convolutional neural networks pretrained in a supervised manner on large datasets often perform best. Our analysis both confirms and challenges prior design assumptions, highlighting promising research directions and offering practical guidance.

2025-09-29

NeurIPS.cc/2025/Workshop/Reliable_ML (published)

openreview.net

A Guide to Robust Generalization: The Impact of Architecture, Pre-training, and Optimization Strategy

Maxime Heuillet

Rishika Bhagwatkar

Jonas Ngnawe

Yann Batiste Pequignot

Ola Ahmad

Deep learning models operating in the image domain are vulnerable to small input perturbations. For years, robustness to such perturbations … (see more)was pursued by training models from scratch (i.e., with random initializations) using specialized loss objectives. Recently, robust fine-tuning has emerged as a more efficient alternative: instead of training from scratch, pretrained models are adapted to maximize predictive performance and robustness. To conduct robust fine-tuning, practitioners design an optimization strategy that includes the model update protocol (e.g., full or partial) and the specialized loss objective. Additional design choices include the architecture type and size, and the pretrained representation. These design choices affect robust generalization, which is the model's ability to maintain performance when exposed to new and unseen perturbations at test time. Understanding how these design choices influence generalization remains an open question with significant practical implications. In response, we present an empirical study spanning 6 datasets, 40 pretrained architectures, 2 specialized losses, and 3 adaptation protocols, yielding 1,440 training configurations and 7,200 robustness measurements across five perturbation types. To our knowledge, this is the most diverse and comprehensive benchmark of robust fine-tuning to date. While attention-based architectures and robust pretrained representations are increasingly popular, we find that convolutional neural networks pretrained in a supervised manner on large datasets often perform best. Our analysis both confirms and challenges prior design assumptions, highlighting promising research directions and offering practical guidance.

2025-08-12

ArXiv (preprint)

doi.org

arxiv.org

Neural Bandits for Data Mining: Searching for Dangerous Polypharmacy

Alexandre Larouche

Audrey Durand

Richard Khoury

Caroline Sirois

2022-12-10

ArXiv (preprint)

doi.org