Publications

Mixed Patch Visible-Infrared Modality Agnostic Object Detection

Heitor Rapela Medeiros

David Latortue

Eric Granger

Marco Pedersoli

In real-world scenarios, using multiple modalities like visible (RGB) and infrared (IR) can greatly improve the performance of a predictive task such as object detection (OD). Multimodal learning is a common way to leverage these modalities, where multiple modality-specific encoders and a fusion module are used to improve performance. In this paper, we tackle a different way to employ RGB and IR modalities, where only one modality or the other is observed by a single shared vision encoder. This realistic setting requires a lower memory footprint and is more suitable for applications such as autonomous driving and surveillance, which commonly rely on RGB and IR data. However, when learning a single encoder on multiple modalities, one modality can dominate the other, producing uneven recognition results. This work investigates how to efficiently leverage RGB and IR modalities to train a common transformer-based OD vision encoder while countering the effects of modality imbalance. For this, we introduce a novel training technique to Mix Patches (MiPa) from the two modalities, in conjunction with a patch-wise modality agnostic module, for learning a common representation of both modalities. Our experiments show that MiPa can learn a representation to reach competitive results on traditional RGB/IR benchmarks while only requiring a single modality during inference. Our code is available at: https://github.com/heitorrapela/MiPa.

2025-03-06

2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (published)

A Realistic Protocol for Evaluation of Weakly Supervised Object Localization

Shakeeb Murtaza

Soufiane Belharbi

Marco Pedersoli

Eric Granger

Weakly Supervised Object Localization (WSOL) allows training deep learning models for classification and localization (LOC) using only global class-level labels. The absence of bounding box (bbox) supervision during training raises challenges in the literature for hyper-parameter tuning, model selection, and evaluation. WSOL methods rely on a validation set with bbox annotations for model selection, and a test set with bbox annotations for threshold estimation for producing bboxes from localization maps. This approach, however, is not aligned with the WSOL setting as these annotations are typically unavailable in real-world scenarios. Our initial empirical analysis shows a significant decline in LOC performance when model selection and threshold estimation rely solely on class labels and the image itself, respectively, compared to using manual bbox annotations. This highlights the importance of incorporating bbox labels for optimal model performance. In this paper, a new WSOL evaluation protocol is proposed that provides LOC information without the need for manual bbox annotations. In particular, we generated noisy pseudo-boxes from a pretrained off-the-shelf region proposal method such as Selective Search, CLIP, and RPN for model selection. These bboxes are also employed to estimate the threshold from LOC maps, circumventing the need for test-set bbox annotations. Our experiments with several WSOL methods on ILSVRC and CUB datasets show that using the proposed pseudo-bboxes for validation facilitates the model selection and threshold estimation, with LOC performance comparable to those selected using GT bboxes on the validation set and threshold estimation on the test set. It also outperforms models selected using class-level labels, and then dynamically thresholded based solely on LOC maps.

2025-03-06

2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (published)

SafeArena: Evaluating the Safety of Autonomous Web Agents

Ada Defne Tur

Esin DURMUS

Karolina Stanczak

LLM-based agents are becoming increasingly proficient at solving web-based tasks. With this capability comes a greater risk of misuse for malicious purposes, such as posting misinformation in an online forum or selling illicit substances on a website. To evaluate these risks, we propose SafeArena, the first benchmark to focus on the deliberate misuse of web agents. SafeArena comprises 250 safe and 250 harmful tasks across four websites. We classify the harmful tasks into five harm categories (misinformation, illegal activity, harassment, cybercrime, and social bias), designed to assess realistic misuses of web agents. We evaluate leading LLM-based web agents, including GPT-4o, Claude-3.5 Sonnet, Qwen-2-VL 72B, and Llama-3.2 90B, on our benchmark. To systematically assess their susceptibility to harmful tasks, we introduce the Agent Risk Assessment framework that categorizes agent behavior across four risk levels. We find agents are surprisingly compliant with malicious requests, with GPT-4o and Qwen-2 completing 34.7% and 27.3% of harmful requests, respectively. Our findings highlight the urgent need for safety alignment procedures for web agents. Our benchmark is available here: https://safearena.github.io

2025-03-06

ArXiv (preprint)

Shaping Inductive Bias in Diffusion Models through Frequency-Based Noise Control

Thomas Jiralerspong

Berton Earnshaw

Jason Hartford

Yoshua Bengio

Luca Scimeca

Diffusion Probabilistic Models (DPMs) are powerful generative models that have achieved unparalleled success in a number of generative tasks. In this work, we aim to build inductive biases into the training and sampling of diffusion models to better accommodate the target distribution of the data to model. For topologically structured data, we devise a frequency-based noising operator to purposefully manipulate, and set, these inductive biases. We first show that appropriate manipulations of the noising forward process can lead DPMs to focus on particular aspects of the distribution to learn. We show that different datasets necessitate different inductive biases, and that appropriate frequency-based noise control induces increased generative performance compared to standard diffusion. Finally, we demonstrate the possibility of ignoring information at particular frequencies while learning. We show this in an image corruption and recovery task, where we train a DPM to recover the original target distribution after severe noise corruption.

2025-03-06

ICLR.cc/2025/Workshop/DeLTa (poster)

Laurence Perreault-Levasseur

Solving Bayesian inverse problems with diffusion priors and off-policy RL

Moksh J. Jain

Yoshua Bengio

Glen Berseth

Nikolay Malkin

This paper presents a practical application of Relative Trajectory Balance (RTB), a recently introduced off-policy reinforcement learning (RL) objective that can asymptotically solve Bayesian inverse problems optimally. We extend the original work by using RTB to train conditional diffusion model posteriors from pretrained unconditional priors for challenging linear and non-linear inverse problems in vision and science. We use the objective alongside techniques such as off-policy backtracking exploration to improve training. Importantly, our results show that existing training-free diffusion posterior methods struggle to perform effective posterior inference in latent space due to inherent biases.

2025-03-06

ICLR.cc/2025/Workshop/DeLTa (poster)

Towards personalized healthcare without harm via bias modulation

Frank Ngaha

Patrik Joslin Kenfack

Ulrich Aivodji

Samira Ebrahimi Kahou

Personalized machine learning models have gained significant importance in various domains, including healthcare. However, designing efficient personalized models remains a challenge. Traditional approaches often involve training multiple sub-models for different population sub-groups, which can be costly and does not always guarantee improved performance across all sub-groups. This paper presents a novel approach to improving model performance at the sub-group level by leveraging bias and training a joint model. Our method involves a two-step process: first, we train a model to predict group attributes, and then we use this model to learn data-dependent biases to modulate a second model for diagnosis prediction. Our results demonstrate that this joint architecture achieves consistent performance gains across all sub-groups in the Heart dataset. Furthermore, in the mortality dataset, it improves performance in two of the four sub-groups. A comparison of our method with the traditional decoupled personalization method demonstrated a greater performance gain in the sub-groups with less harm. This approach offers a more effective and scalable solution for personalization of models, which could have a positive impact in healthcare and other areas that require predictive models that take sub-group information into account.

2025-03-06

ICLR.cc/2025/Workshop/SCSL (published)

Towards personalized healthcare without harm via bias modulation

Frank Ngaha

Patrik Joslin Kenfack

Ulrich Aivodji

Samira Ebrahimi Kahou

Clinical prediction models are often personalized to target heterogeneous sub-groups by using demographic attributes such as race and gender to train the model. Traditional personalization approaches involve using demographic attributes in input features or training multiple sub-models for different population subgroups (decoupling model). However, these methods often harm the performance at the subgroup level compared to non-personalized models. This paper presents a novel personalization method to improve model performance at the sub-group level. Our method involves a two-step process: first, we train a model to predict group attributes, and then we use this model to learn data-dependent biases to modulate a second model for diagnosis prediction. Our results demonstrate that this joint architecture achieves consistent performance gains across all sub-groups in the Heart dataset. Furthermore, in the mortality dataset, it improves performance in two of the four sub-groups. A comparison of our method with the traditional decoupled personalization method demonstrated a greater performance gain in the sub-groups with less harm. This approach offers a more effective and scalable solution for personalized models, which could have a positive impact in healthcare and other areas that require predictive models that take sub-group information into account.

2025-03-06

ICLR.cc/2025/Workshop/SCSL (published)

Towards Protein Sequence & Structure Co-Design with Multi-Modal Language Models

Stephen Zhewen Lu

Jiarui Lu

Hongyu Guo

Jian Tang

Proteins perform diverse biological functions, governed by the intricate relationship between their sequence and three-dimensional structure. While protein language models (PLMs) have demonstrated remarkable success in functional annotation and structure prediction, their potential for sequence-structure co-design remains underexplored. This limitation arises from pre-training objectives that favor masked token prediction over generative modeling. In this work, we systematically explore sampling strategies to enhance the generative capabilities of PLMs for co-design. Notably, we introduce a ranked iterative decoding with re-masking scheme, enabling PLMs to generate sequences and structures more effectively. Benchmarking ESM3 across multiple scales, we demonstrate that using PLMs effectively at sampling time for co-design tasks can outperform specialized architectures that lack comparable scaling properties. Our work advances the field of computational protein design by equipping PLMs with robust generative capabilities tailored to sequence-structure interdependence.

2025-03-06

ICLR.cc/2025/Workshop/LMRL (published)

Who is your ideal peer mentor? A qualitative study to identify cancer patient preferences for a digital peer support app

Loes Knaapen

Andrea M. Laizner

Kelly Agnew

Xiao Jian Du

Douaa El Abiad

Luc Galarneau

Susie Judd

James Manalad

Ridhi Mittal

Tristan Williams

Brandon Woolfson

Angele Wen

John Kildea

2025-03-06

Supportive Care in Cancer (published)