Publications

Active Attacks: Red-teaming LLMs via Adaptive Environments
We address the challenge of generating diverse attack prompts for large language models (LLMs) that elicit harmful behaviors (e.g., insults, sexual content) and are used for safety fine-tuning. Rather than relying on manual prompt engineering, attacker LLMs can be trained with reinforcement learning (RL) to automatically generate such prompts using only a toxicity classifier as a reward. However, capturing a wide range of harmful behaviors is a significant challenge that requires explicit diversity objectives. Existing diversity-seeking RL methods often collapse to limited modes: once high-reward prompts are found, exploration of new regions is discouraged. Inspired by the active learning paradigm that encourages adaptive exploration, we introduce *Active Attacks*, a novel RL-based red-teaming algorithm that adapts its attacks as the victim evolves. By periodically safety fine-tuning the victim LLM with collected attack prompts, rewards in exploited regions diminish, which forces the attacker to seek unexplored vulnerabilities. This process naturally induces an easy-to-hard exploration curriculum, where the attacker progresses beyond easy modes toward increasingly difficult ones. As a result, Active Attacks uncovers a wide range of local attack modes step by step, and their combination achieves wide coverage of the multi-mode distribution. Active Attacks, a simple plug-and-play module that seamlessly integrates into existing RL objectives, unexpectedly outperformed prior RL-based methods -- including GFlowNets, PPO, and REINFORCE -- by improving cross-attack success rates against GFlowNets, the previous state of the art, from 0.07% to 31.28% (a relative gain greater than 400×).
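A minimal sketch of the alternating attack/defense loop the abstract describes may help. All callables here (train_attacker, safety_finetune, victim_respond, toxicity_score) are hypothetical placeholders for the reader's own attacker-RL trainer, victim fine-tuner, victim model, and toxicity classifier, not the authors' code:

```python
from typing import Callable, List

def active_attacks_loop(
    train_attacker: Callable[[Callable[[str], float], int], List[str]],
    safety_finetune: Callable[[List[str]], None],
    victim_respond: Callable[[str], str],
    toxicity_score: Callable[[str], float],
    n_rounds: int = 10,
    steps_per_round: int = 1000,
) -> List[str]:
    """Alternate attacker RL training with victim safety fine-tuning.

    Each round, the attacker is trained (with any RL objective, e.g.
    GFlowNets or PPO) against the *current* victim, whose toxicity-scored
    responses define the reward. The victim is then safety fine-tuned on
    all prompts collected so far, which lowers rewards in already-exploited
    regions and pushes the attacker toward unexplored vulnerabilities.
    """
    collected: List[str] = []
    for _ in range(n_rounds):
        # Reward for an attack prompt = toxicity of the victim's response.
        def reward(prompt: str) -> float:
            return toxicity_score(victim_respond(prompt))

        collected += train_attacker(reward, steps_per_round)
        # Periodic safety fine-tuning is what makes the environment adaptive.
        safety_finetune(collected)
    return collected
```

Because the defense step only changes the reward landscape, any existing RL red-teaming objective can be dropped into the `train_attacker` slot unmodified, which is the plug-and-play property the abstract claims.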
Continual Pre-training of MoEs: How robust is your router?
Zain Sarwar
Ashwinee Panda
Anirban Das
Shi-Xiong Zhang
Stephen Rawls
Sambit Sahu
Sparsely-activated Mixture of Experts (MoE) transformers are promising architectures for foundation models. Compared to dense transformers that require the same amount of floating-point operations (FLOPs) per forward pass, MoEs benefit from improved sample efficiency at training time and achieve much stronger performance. Many closed-source and open-source frontier language models have thus adopted an MoE architecture. Naturally, practitioners will want to extend the capabilities of these models with large amounts of newly collected data without completely re-training them. Prior work has shown that a simple combination of replay, learning rate re-warming, and re-decaying can enable the continual pre-training (CPT) of dense decoder-only transformers with minimal performance degradation compared to full re-training. In the case of decoder-only MoE transformers, however, it is unclear how the routing algorithm will impact continual pre-training performance: 1) *do the MoE transformer's routers exacerbate forgetting relative to a dense model?*; 2) *do the routers maintain a balanced load on previous distributions after CPT?*; 3) *are the same strategies applied to dense models sufficient to continually pre-train MoE LLMs?* In what follows, we conduct a large-scale study training a 500M parameter dense transformer and four 500M-active/2B-total parameter MoE transformers, following the Switch Transformer architecture and a granular DeepSeek-inspired architecture. Each model is trained for 600B tokens. Our results establish a surprising robustness to distribution shifts for MoEs using both Sinkhorn-Balanced and Z-and-Aux-loss-balanced routing algorithms, even in MoEs continually pre-trained without replay. Moreover, we show that MoE LLMs maintain their sample efficiency (relative to a FLOP-matched dense model) during CPT and that they can match the performance of a fully re-trained MoE at a fraction of the cost.
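The CPT recipe the abstract refers to has two parts: replay (mixing a fraction of the original pre-training data into each CPT batch) and a learning-rate schedule that is re-warmed and re-decayed when the new data arrives. A sketch of the schedule part, with an illustrative function name, warmup shape, and constants rather than the paper's exact settings:

```python
import math

def rewarmed_cosine_lr(step: int, phase_start: int, phase_len: int,
                       warmup_steps: int, max_lr: float, min_lr: float) -> float:
    """LR re-warming + cosine re-decay for one continual pre-training phase.

    At the start of a CPT phase the learning rate is linearly re-warmed
    from min_lr back up to max_lr over `warmup_steps`, then cosine
    re-decayed to min_lr over the remainder of the phase (hypothetical
    shapes; the paper may use different warmup/decay forms).
    """
    t = step - phase_start
    if t < warmup_steps:
        # Linear re-warming from the decayed LR back to the peak.
        return min_lr + (max_lr - min_lr) * t / warmup_steps
    # Cosine re-decay over the rest of the phase.
    progress = min(1.0, (t - warmup_steps) / max(1, phase_len - warmup_steps))
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

The question the paper studies is whether this dense-model recipe transfers to MoEs, where the re-warmed updates also perturb the routers and could unbalance expert load on the old distribution.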
Investigating Faithfulness in Large Audio Language Models
Faithfulness measures whether chain-of-thought (CoT) representations accurately reflect a model's decision process and can be used as reliab… (see more)le explanations. Prior work has shown that CoTs from text-based LLMs are often unfaithful. This question has not been explored for large audio-language models (LALMs), where faithfulness is critical for safety-sensitive applications. Reasoning in LALMs is also more challenging, as models must first extract relevant clues from audio before reasoning over them. In this paper, we investigate the faithfulness of CoTs produced by several LALMs by applying targeted interventions, including paraphrasing, filler token injection, early answering, and introducing mistakes, on two challenging reasoning datasets: SAKURA and MMAR. After going through the aforementioned interventions across several datasets and tasks, our experiments suggest that, LALMs generally produce CoTs that appear to be faithful to their underlying decision processes.
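Of the four interventions, early answering is the simplest to illustrate. Below is a hypothetical harness (the `answer_with_cot` callable and the exact matching criterion are placeholders, not the paper's protocol): the model is forced to answer after each CoT prefix, and if short prefixes already reproduce the full-CoT answer, the later reasoning steps are unlikely to be load-bearing, which is evidence against faithfulness.

```python
from typing import Callable, List

def early_answering_test(
    answer_with_cot: Callable[[str, str], str],  # (question, partial_cot) -> answer
    question: str,
    cot_steps: List[str],
) -> List[bool]:
    """Early-answering probe: force an answer after each CoT prefix.

    Returns, for each prefix length k = 0..len(cot_steps), whether the
    answer given only the first k reasoning steps matches the answer
    given the full chain of thought.
    """
    full_answer = answer_with_cot(question, " ".join(cot_steps))
    return [
        answer_with_cot(question, " ".join(cot_steps[:k])) == full_answer
        for k in range(len(cot_steps) + 1)
    ]
```

The other interventions follow the same template: perturb the CoT (paraphrase it, pad it with filler tokens, or corrupt a step) and check whether the final answer moves with the perturbation.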
TRUST: Test-Time Refinement using Uncertainty-Guided SSM Traverses
Sahar Dastani
Ali Bahri
Gustavo Adolfo Vargas Hakim
Mehrdad Noori
David Osowiechi
Samuel Barbeau
Ismail Ben Ayed
Christian Desrosiers
State Space Models (SSMs) have emerged as efficient alternatives to Vision Transformers (ViTs), with VMamba standing out as a pioneering architecture designed for vision tasks. However, their generalization performance degrades significantly under distribution shifts. To address this limitation, we propose TRUST (Test-Time Refinement using Uncertainty-Guided SSM Traverses), a novel test-time adaptation (TTA) method that leverages diverse traversal permutations to generate multiple causal perspectives of the input image. Model predictions serve as pseudo-labels to guide updates of the Mamba-specific parameters, and the adapted weights are averaged to integrate the learned information across traversal scans. Altogether, TRUST is the first approach that explicitly leverages the unique architectural properties of SSMs for adaptation. Experiments on seven benchmarks show that TRUST consistently improves robustness and outperforms existing TTA methods.
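A minimal sketch of that recipe (adapt once per traversal permutation, then average the adapted weights) is below. It is a hypothetical harness, not the authors' code: `forward(model, x, t)` is assumed to run the SSM with scan order `t` and return logits, the caller is assumed to have set requires_grad=True only on the Mamba-specific parameters, and the uncertainty-guided weighting in TRUST is omitted in favor of a plain average.

```python
import copy
from typing import Callable, List
import torch
import torch.nn.functional as F

def trust_adapt(
    make_model: Callable[[], torch.nn.Module],  # fresh copy of source weights
    forward: Callable[[torch.nn.Module, torch.Tensor, int], torch.Tensor],
    x: torch.Tensor,
    traversals: List[int],
    lr: float = 1e-4,
) -> torch.nn.Module:
    """One pseudo-labeled update per traversal, then weight averaging."""
    states = []
    for t in traversals:
        model = make_model()
        # Only the (caller-selected) Mamba-specific parameters are trainable.
        params = [p for p in model.parameters() if p.requires_grad]
        opt = torch.optim.SGD(params, lr=lr)
        logits = forward(model, x, t)            # one causal view of the image
        pseudo = logits.argmax(dim=-1).detach()  # prediction as pseudo-label
        loss = F.cross_entropy(logits, pseudo)
        opt.zero_grad()
        loss.backward()
        opt.step()
        states.append(copy.deepcopy(model.state_dict()))
    # Integrate the information learned across scans by averaging weights.
    avg = {k: torch.stack([s[k].float() for s in states]).mean(0)
           for k in states[0]}
    model = make_model()
    model.load_state_dict(avg, strict=False)
    return model
```

Averaging in weight space rather than ensembling predictions keeps inference cost identical to the unadapted model, which fits the TTA setting.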
Acute respiratory distress syndrome in patients with cancer: the YELENNA prospective multinational observational cohort study
Peter Schellongowski
Michael Darmon
Philipp Eller
Laveena Munshi
Tobias Liebregts
Victoria Metaxa
Luca Montini
Tobias Lahmer
Fabio S. Taccone
Andry Van de Louw
Martin Balik
Peter Pickkers
Pleun Hemelaar
Hemang Yadav
Andreas Barratt-Due
Thomas Karvunidis
Jordi Riera
Gennaro Martucci
Ignacio Martín-Loeches
Pedro Castro
Nina Buchtele
Virginie Lemiale
Stefan Hatzl
Thomas Staudinger
Elie Azoulay
Purpose: Acute respiratory failure is the leading reason for intensive care unit (ICU) admission among critically ill patients with cancer. We aimed to describe the clinical characteristics, risk factors, and outcomes of patients with cancer and acute respiratory distress syndrome (ARDS) and to evaluate associations of venovenous extracorporeal membrane oxygenation (ECMO) with outcomes in the subgroup with severe ARDS. Methods: We conducted a multinational, prospective, observational cohort study of patients with cancer and ARDS in 13 countries in Europe and North America. The primary endpoint was 90-day mortality. Results: Among 715 included patients, 73.4% had hematologic malignancies and 26.6% solid tumors; 31.2% had undergone hematopoietic stem-cell transplantation (168 allogeneic). ICU, hospital, and 90-day mortality rates were 55.3%, 70.9%, and 73.2%, respectively. By multivariate analysis, independent predictors of higher 90-day mortality were older age, peripheral vascular disease, severe ARDS at inclusion, acute kidney injury, and ICU admission as a time-limited trial (vs. full code). Conversely, lymphoma was associated with lower 90-day mortality. Among the 322 patients (45.7%) with severe ARDS at inclusion, 90-day mortality was 82.2%, with no difference between patients who received ECMO (n = 58, 18%) and those who did not (82.6% vs. 80.7%, P = 0.89). This finding remained unchanged in a double-adjusted overlap- and propensity-weighted Cox mixed-effects model (adjusted hazard ratio, 1.12; 95% confidence interval 0.65–1.94; P = 0.69). Conclusion: Patients with cancer and ARDS, particularly severe forms, experience high 90-day mortality, irrespective of ECMO use. These findings suggest a need for nuanced ICU goals-of-care discussions and raise concerns about the generalizability of ECMO guidelines to this population.
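The ECMO comparison rests on an overlap- and propensity-weighted Cox model. A minimal sketch of that style of analysis, with hypothetical column names ("ecmo" as a 0/1 exposure flag, "time"/"death90" as 90-day follow-up and event) and without the mixed effects used in the study (which lifelines does not fit), might look like:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from lifelines import CoxPHFitter

def overlap_weighted_cox(df: pd.DataFrame, confounders: list) -> CoxPHFitter:
    """Overlap-weighted Cox model for a binary exposure (here: ECMO)."""
    # Propensity of receiving ECMO given measured confounders.
    ps = LogisticRegression(max_iter=1000).fit(
        df[confounders], df["ecmo"]).predict_proba(df[confounders])[:, 1]
    # Overlap weights: 1 - PS for treated patients, PS for controls,
    # which emphasizes patients who plausibly could have gone either way.
    df = df.assign(ow=df["ecmo"] * (1 - ps) + (1 - df["ecmo"]) * ps)
    cph = CoxPHFitter()
    cph.fit(df[["time", "death90", "ecmo", "ow"] + confounders],
            duration_col="time", event_col="death90",
            weights_col="ow", robust=True)
    return cph  # exp(coef) on "ecmo" is the weighted hazard ratio
```

Overlap weighting down-weights patients with near-certain treatment assignment, which is one way to compare ECMO and non-ECMO patients of similar severity in an observational cohort like this one.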