Portrait of Marco Pedersoli

Marco Pedersoli

Affiliate Member
Associate Professor, École de technologie supérieure
Research Topics
Representation Learning
Multimodal Learning
Deep Learning
Generalization
Satellite Imagery
Generative Models
Robustness
Weak Supervision
Building Energy Management Systems
Vision and Language
Computer Vision

Biography

I am an Associate Professor at ÉTS Montréal, a member of LIVIA (the Laboratory of Imaging, Vision and Artificial Intelligence), and a member of the International Laboratory on Learning Systems (ILLS). I am also a member of ELLIS, the European network of excellence in AI. Since 2021, I have been co-holder of the Distech Industrial Research Chair on embedded neural networks for the control of connected buildings.

My research focuses on deep learning methods and algorithms, with an emphasis on visual recognition and the automatic interpretation and understanding of images and videos. A primary goal of my work is to advance artificial intelligence while minimizing two critical factors: computational load and the need for human supervision. These reductions are essential for scalable AI, enabling systems that are more efficient, adaptive, and embedded. In my recent work, I have contributed to the development of neural networks for smart buildings, integrating AI-based solutions to improve energy efficiency and comfort in intelligent environments.

Publications

High-Rate Mixout: Revisiting Mixout for Robust Domain Generalization
Masih Aminbeidokhti
Heitor Rapela Medeiros
Srikanth Muralidharan
Eric Granger
Ensembling fine-tuned models initialized from powerful pre-trained weights is a common strategy to improve robustness under distribution shifts, but it comes with substantial computational costs due to the need to train and store multiple models. Dropout offers a lightweight alternative by simulating ensembles through random neuron deactivation; however, when applied to pre-trained models, it tends to over-regularize and disrupt critical representations necessary for generalization. In this work, we investigate Mixout, a stochastic regularization technique that provides an alternative to Dropout for domain generalization. Rather than deactivating neurons, Mixout mitigates overfitting by probabilistically swapping a subset of fine-tuned weights with their pre-trained counterparts during training, thereby maintaining a balance between adaptation and retention of prior knowledge. Our study reveals that achieving strong performance with Mixout on domain generalization benchmarks requires a notably high masking probability of 0.9 for ViTs and 0.8 for ResNets. While this may seem like a simple adjustment, it yields two key advantages for domain generalization: (1) higher masking rates more strongly penalize deviations from the pre-trained parameters, promoting better generalization to unseen domains; and (2) high-rate masking substantially reduces computational overhead, cutting gradient computation by up to 45% and gradient memory usage by up to 90%. Experiments across five domain generalization benchmarks, PACS, VLCS, OfficeHome, TerraIncognita, and DomainNet, using ResNet and ViT architectures, show that our approach, High-rate Mixout, achieves out-of-domain accuracy comparable to ensemble-based methods while significantly reducing training costs.
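As a rough illustration of the weight-swapping idea described in this abstract, the PyTorch sketch below mixes a layer's fine-tuned weights with a frozen pre-trained copy at each training forward pass. The class name MixoutLinear and the way the pre-trained weights are passed in are illustrative assumptions, and the original Mixout formulation additionally rescales the mixed weights, which is omitted here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixoutLinear(nn.Linear):
    """Linear layer that, on each training forward pass, swaps a random
    subset of its fine-tuned weights back to their pre-trained values."""

    def __init__(self, in_features, out_features, pretrained_weight, p=0.9):
        super().__init__(in_features, out_features)
        # Frozen copy of the pre-trained weights used as the swap source.
        self.register_buffer("pretrained_weight", pretrained_weight.clone())
        self.p = p  # masking probability (0.9 for ViTs, 0.8 for ResNets per the abstract)

    def forward(self, x):
        if self.training:
            # Bernoulli(p) mask: masked entries fall back to the pre-trained value,
            # so gradients only reach the weights that remain fine-tuned.
            mask = torch.rand_like(self.weight) < self.p
            weight = torch.where(mask, self.pretrained_weight, self.weight)
        else:
            weight = self.weight
        return F.linear(x, weight, self.bias)

# Usage sketch: wrap a layer whose pre-trained weights are available.
pretrained = torch.randn(4, 16)               # stand-in for pre-trained weights
layer = MixoutLinear(16, 4, pretrained, p=0.9)
out = layer(torch.randn(8, 16))
```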
Revisiting Mixout: An Overlooked Path to Robust Finetuning
Masih Aminbeidokhti
Heitor Rapela Medeiros
Eric Granger
Finetuning vision foundation models often improves in-domain accuracy but comes at the cost of robustness under distribution shift. We revisit Mixout, a stochastic regularizer that intermittently replaces finetuned weights with their pretrained reference, through the lens of a single-run, weight-sharing implicit ensemble. This perspective reveals three key levers that govern robustness: the masking anchor, resampling frequency, and mask sparsity. Guided by this analysis, we introduce GMixout, which (i) replaces the fixed anchor with an exponential moving-average snapshot that adapts during training, and (ii) regulates the masking period via an explicit resampling-frequency hyperparameter. Our sparse-kernel implementation updates only a small fraction of parameters with no inference-time overhead, enabling training on consumer-grade GPUs. In experiments on benchmarks covering covariate shift, corruption, and class imbalance (ImageNet / ImageNet-LT, DomainNet, iWildCam, and CIFAR100-C), GMixout consistently improves in-domain accuracy beyond zero-shot performance while surpassing both Model Soups and strong parameter-efficient finetuning baselines under distribution shift.
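The sketch below only illustrates the three levers named in the abstract (an EMA anchor, a resampling period, and mask sparsity) under assumed update rules; the paper's actual GMixout algorithm and its sparse-kernel implementation may differ, and all names and hyperparameter values here are placeholders.

```python
import torch
import torch.nn as nn

class GMixoutSketch:
    """Hedged sketch of the GMixout levers: an EMA anchor that tracks the
    weights, a mask resampled at a fixed period, and a sparsity level p."""

    def __init__(self, model: nn.Module, p: float = 0.9,
                 ema_decay: float = 0.999, resample_every: int = 100):
        self.model = model
        self.p = p                          # fraction of weights held at the anchor
        self.ema_decay = ema_decay
        self.resample_every = resample_every
        self.step_count = 0
        # The anchor starts as a copy of the (pre-trained) weights.
        self.anchor = {k: v.detach().clone() for k, v in model.named_parameters()}
        self.masks = self._sample_masks()

    def _sample_masks(self):
        return {k: (torch.rand_like(v) < self.p) for k, v in self.anchor.items()}

    @torch.no_grad()
    def step(self):
        """Call after each optimizer step."""
        self.step_count += 1
        for name, param in self.model.named_parameters():
            # Update the moving-average anchor.
            self.anchor[name].mul_(self.ema_decay).add_(param, alpha=1 - self.ema_decay)
            # Pull masked coordinates back to the anchor.
            param.copy_(torch.where(self.masks[name], self.anchor[name], param))
        # Resample the mask at the chosen frequency.
        if self.step_count % self.resample_every == 0:
            self.masks = self._sample_masks()

# Usage sketch
model = nn.Linear(16, 4)
regularizer = GMixoutSketch(model, p=0.9, ema_decay=0.999, resample_every=100)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss = model(torch.randn(8, 16)).pow(2).mean()
loss.backward()
optimizer.step()
regularizer.step()
```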
VLOD-TTA: Test-Time Adaptation of Vision-Language Object Detectors
Atif Belal
Heitor Rapela Medeiros
Eric Granger
Vision-language object detectors (VLODs) such as YOLO-World and Grounding DINO achieve impressive zero-shot recognition by aligning region proposals with text representations. However, their performance often degrades under domain shift. We introduce VLOD-TTA, a test-time adaptation (TTA) framework for VLODs that leverages dense proposal overlap and image-conditioned prompt scores. First, an IoU-weighted entropy objective is proposed that concentrates adaptation on spatially coherent proposal clusters and reduces confirmation bias from isolated boxes. Second, image-conditioned prompt selection is introduced, which ranks prompts by image-level compatibility and fuses the most informative prompts with the detector logits. Our benchmarking across diverse distribution shifts -- including stylized domains, driving scenes, low-light conditions, and common corruptions -- shows the effectiveness of our method on two state-of-the-art VLODs, YOLO-World and Grounding DINO, with consistent improvements over the zero-shot and TTA baselines. Code: https://github.com/imatif17/VLOD-TTA
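A minimal sketch of what an IoU-weighted entropy objective could look like, assuming each proposal is weighted by its average overlap with the other proposals; the exact weighting used in VLOD-TTA may differ, and the function below is illustrative only.

```python
import torch
from torchvision.ops import box_iou

def iou_weighted_entropy(boxes: torch.Tensor, logits: torch.Tensor) -> torch.Tensor:
    """Proposals that overlap many other proposals (spatially coherent
    clusters) contribute more to the adaptation loss than isolated boxes.

    boxes:  (N, 4) proposal boxes in (x1, y1, x2, y2) format
    logits: (N, C) class logits for each proposal
    """
    probs = logits.softmax(dim=-1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1)    # (N,)
    iou = box_iou(boxes, boxes)                                     # (N, N)
    # Weight of each proposal: mean overlap with the other proposals.
    weights = (iou.sum(dim=-1) - 1.0) / max(boxes.shape[0] - 1, 1)  # exclude self-IoU
    weights = weights / weights.sum().clamp_min(1e-8)               # normalize
    return (weights * entropy).sum()

# Usage sketch: minimize this loss at test time w.r.t. adaptable detector parameters.
boxes = torch.tensor([[0., 0., 10., 10.], [1., 1., 11., 11.], [50., 50., 60., 60.]])
logits = torch.randn(3, 5, requires_grad=True)
loss = iou_weighted_entropy(boxes, logits)
loss.backward()
```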
AInstein: Can AI Rediscover Scientific Concepts from First Principles?
Large language models have demonstrated remarkable capabilities across diverse tasks, yet a fundamental question remains: can these models genuinely rediscover complex scientific insights, or do they merely recite memorized information? We present AInstein, a novel framework for evaluating whether language models can derive established scientific concepts from first principles when stripped of domain-specific terminology. Rather than testing the recall of scientific facts, we reformulate landmark discoveries as conceptual puzzles, challenging models to reconstruct the underlying technical solutions independently.
MuSACo: Multimodal Subject-Specific Selection and Adaptation for Expression Recognition with Co-Training
Muhammad Osama Zeeshan
Natacha Gillet
Alessandro Lameiras Koerich
Francois Bremond
Eric Granger
Personalized expression recognition (ER) involves adapting a machine learning model to subject-specific data for improved recognition of expressions with considerable interpersonal variability. Subject-specific ER can benefit significantly from multi-source domain adaptation (MSDA) methods, where each domain corresponds to a specific subject, to improve model accuracy and robustness. Despite promising results, state-of-the-art MSDA approaches often overlook multimodal information or blend sources into a single domain, limiting subject diversity and failing to explicitly capture unique subject-specific characteristics. To address these limitations, we introduce MuSACo, a multi-modal subject-specific selection and adaptation method for ER based on co-training. It leverages complementary information across multiple modalities and multiple source domains for subject-specific adaptation. This makes MuSACo particularly relevant for affective computing applications in digital health, such as patient-specific assessment for stress or pain, where subject-level nuances are crucial. MuSACo selects source subjects relevant to the target and generates pseudo-labels using the dominant modality for class-aware learning, in conjunction with a class-agnostic loss to learn from less confident target samples. Finally, source features from each modality are aligned, while only confident target features are combined. Our experimental results on challenging multimodal ER datasets (BioVid and StressID) show that MuSACo can outperform UDA (blending) and state-of-the-art MSDA methods.
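One plausible reading of the pseudo-labelling split described above, sketched in PyTorch: confident target samples, judged on the dominant modality, receive class-aware supervision, while the remaining samples receive a class-agnostic entropy term. The thresholding scheme and loss choices here are assumptions for illustration, not the paper's actual objectives.

```python
import torch
import torch.nn.functional as F

def pseudo_label_losses(dominant_logits, target_logits, conf_threshold=0.9):
    """Split target samples by confidence of the dominant modality:
    confident ones get class-aware supervision from pseudo-labels,
    the rest get a class-agnostic entropy term."""
    probs = dominant_logits.softmax(dim=-1)
    conf, pseudo_labels = probs.max(dim=-1)
    confident = conf >= conf_threshold

    zero = target_logits.sum() * 0.0  # keeps the graph alive when a subset is empty
    class_aware = (F.cross_entropy(target_logits[confident], pseudo_labels[confident])
                   if confident.any() else zero)

    rest = target_logits[~confident].softmax(dim=-1)
    class_agnostic = (-(rest * rest.clamp_min(1e-8).log()).sum(dim=-1).mean()
                      if (~confident).any() else zero)
    return class_aware, class_agnostic

# Usage sketch
dominant = torch.randn(16, 7)                  # logits from the dominant modality
target = torch.randn(16, 7, requires_grad=True)
ca, cag = pseudo_label_losses(dominant, target)
(ca + cag).backward()
```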
Low-Rank Expert Merging for Multi-Source Domain Adaptation in Person Re-Identification
Taha Mustapha Nehdi
Nairouz Mrabah
Atif Belal
Eric Granger
Adapting person re-identification (reID) models to new target environments remains a challenging problem that is typically addressed using unsupervised domain adaptation (UDA) methods. Recent works show that when labeled data originates from several distinct sources (e.g., datasets and cameras), considering each source separately and applying multi-source domain adaptation (MSDA) typically yields higher accuracy and robustness compared to blending the sources and performing conventional UDA. However, state-of-the-art MSDA methods learn domain-specific backbone models or require access to source domain data during adaptation, resulting in significant growth in training parameters and computational cost. In this paper, a Source-free Adaptive Gated Experts (SAGE-reID) method is introduced for person reID. Our SAGE-reID is a cost-effective, source-free MSDA method that first trains individual source-specific low-rank adapters (LoRA) through source-free UDA. Next, a lightweight gating network is introduced and trained to dynamically assign optimal merging weights for fusion of LoRA experts, enabling effective cross-domain knowledge transfer. While the number of backbone parameters remains constant across source domains, LoRA experts scale linearly but remain negligible in size (≈2% of the backbone), reducing both the memory consumption and risk of overfitting. Extensive experiments conducted on three challenging benchmarks…
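A hedged sketch of gated LoRA-expert merging for a single linear layer: per-source low-rank adapters are fused with weights produced by a small gating network. The layer structure, gating input, and fusion rule below are illustrative assumptions and need not match the actual SAGE-reID design.

```python
import torch
import torch.nn as nn

class GatedLoRAMerge(nn.Module):
    """One frozen backbone layer plus S low-rank experts whose updates are
    merged with weights from a lightweight gating network."""

    def __init__(self, in_dim: int, out_dim: int, num_sources: int, rank: int = 8):
        super().__init__()
        self.base = nn.Linear(in_dim, out_dim)          # frozen backbone layer
        self.base.requires_grad_(False)
        # One low-rank (A, B) pair per source domain.
        self.A = nn.Parameter(torch.randn(num_sources, rank, in_dim) * 0.01)
        self.B = nn.Parameter(torch.zeros(num_sources, out_dim, rank))
        self.gate = nn.Linear(in_dim, num_sources)      # lightweight gating network

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Gating weights from the batch's mean feature (one weight per expert).
        g = self.gate(x.mean(dim=0, keepdim=True)).softmax(dim=-1).squeeze(0)  # (S,)
        # Fuse the experts' low-rank updates: delta_W = sum_s g_s * B_s @ A_s.
        delta_w = torch.einsum("s,sor,sri->oi", g, self.B, self.A)
        return self.base(x) + x @ delta_w.t()

# Usage sketch
layer = GatedLoRAMerge(in_dim=256, out_dim=256, num_sources=3, rank=8)
features = torch.randn(32, 256)
out = layer(features)
```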
Infrared Object Detection with Ultra Small ConvNets: Is ImageNet Pretraining Still Useful?
Srikanth Muralidharan
Heitor Rapela Medeiros
Masih Aminbeidokhti
Eric Granger
Many real-world applications require recognition models that are robust to different operational conditions and modalities, but at the same time run on small embedded devices with limited hardware. While pre-training is known to be very beneficial for the accuracy and robustness of normal-size models, its effect on small models that can be deployed on embedded and edge devices is not clear. In this work, we investigate the effect of ImageNet pretraining on increasingly small backbone architectures (ultra-small models, with…