Publications

Different Horses for Different Courses: Comparing Bias Mitigation Algorithms in ML

Prakhar Ganesh

Usman Gohar

Lu Cheng

Golnoosh Farnadi

With fairness concerns gaining significant attention in Machine Learning (ML), several bias mitigation techniques have been proposed, often compared against each other to find the best method. These benchmarking efforts tend to use a common setup for evaluation under the assumption that providing a uniform environment ensures a fair comparison. However, bias mitigation techniques are sensitive to hyperparameter choices, random seeds, feature selection, etc., meaning that comparison on just one setting can unfairly favour certain algorithms. In this work, we show significant variance in fairness achieved by several algorithms and the influence of the learning pipeline on fairness scores. We highlight that most bias mitigation techniques can achieve comparable performance, given the freedom to perform hyperparameter optimization, suggesting that the choice of the evaluation parameters, rather than the mitigation technique itself, can sometimes create the perceived superiority of one method over another. We hope our work encourages future research on how various choices in the lifecycle of developing an algorithm impact fairness, and trends that guide the selection of appropriate algorithms.

2025-04-22

Algorithmic Fairness Through the Lens of Metrics and Evaluation @ International Conference on Machine Learning (published)

Distilling semantically aware orders for autoregressive image generation

Rishav Pramanik

Antoine Poupon

Juan A. Rodriguez

Masih Aminbeidokhti

David Vázquez

Christopher Pal

Zhaozheng Yin

Marco Pedersoli

2025-04-22

ArXiv (preprint)

Fair Resource Allocation in Weakly Coupled Markov Decision Processes

Xiaohui Tu

Yossiri Adulyasak

Nima Akbarzadeh

Erick Delage

We consider fair resource allocation in sequential decision-making environments modeled as weakly coupled Markov decision processes, where resource constraints couple the action spaces of

2025-04-22

Proceedings of The 28th International Conference on Artificial Intelligence and Statistics (published)

A flaw in using pre-trained pLLMs in protein-protein interaction inference models

Joseph Szymborski

Amin Emad

With the growing pervasiveness of pre-trained protein large language models (pLLMs), pLLM-based methods are increasingly being put forward for the protein-protein interaction (PPI) inference task. Here, we identify and confirm that existing pre-trained pLLMs are a source of data leakage for the downstream PPI task. We characterize the extent of the data leakage problem by training and comparing small and efficient pLLMs on a dataset that controls for data leakage (“strict”) with one that does not (“non-strict”). While data leakage from pre-trained pLLMs causes measurable inflation of testing scores, we find that this does not necessarily extend to other, non-paired biological tasks such as protein keyword annotation. Further, we find no connection between the context lengths of pLLMs and the performance of pLLM-based PPI inference methods on proteins with sequence lengths that surpass them. Furthermore, we show that pLLM-based and non-pLLM-based models fail to generalize in tasks such as prediction of the human-SARS-CoV-2 PPIs or the effect of point mutations on binding affinities. This study demonstrates the importance of extending existing protocols for the evaluation of pLLM-based models applied to paired biological datasets and identifies areas of weakness of current pLLM models.

2025-04-22

bioRxiv (preprint)

Langevin Soft Actor-Critic: Efficient Exploration Through Uncertainty-Driven Critic Learning

Haque Ishfaq

Guangyuan Wang

Sami Nur Islam

Doina Precup

Existing actor-critic algorithms, which are popular for continuous control reinforcement learning (RL) tasks, suffer from poor sample efficiency due to the lack of a principled exploration mechanism within them. Motivated by the success of Thompson sampling for efficient exploration in RL, we propose a novel model-free RL algorithm, Langevin Soft Actor Critic (LSAC), which prioritizes enhancing critic learning through uncertainty estimation over policy optimization. LSAC employs three key innovations: approximate Thompson sampling through distributional Langevin Monte Carlo (LMC) based

2025-04-22

International Conference on Learning Representations (Accept (Poster))

Learning to Adapt Frozen CLIP for Few-Shot Test-Time Domain Adaptation

Zhixiang Chi

Li Gu

Huan Liu

Ziqiang Wang

Yanan Wu

Yang Wang

Konstantinos N Plataniotis

Few-shot Test-Time Domain Adaptation focuses on adapting a model at test time to a specific domain using only a few unlabeled examples, addressing domain shift. Prior methods leverage CLIP's strong out-of-distribution (OOD) abilities by generating domain-specific prompts to guide its generalized, frozen features. However, since downstream datasets are not explicitly seen by CLIP, solely depending on the feature space knowledge is constrained by CLIP's prior knowledge. Notably, when using a less robust backbone like ViT-B/16, performance significantly drops on challenging real-world benchmarks. Departing from the state-of-the-art of inheriting the intrinsic OOD capability of CLIP, this work introduces learning directly on the input space to complement the dataset-specific knowledge for frozen CLIP. Specifically, an independent side branch is attached in parallel with CLIP and enforced to learn exclusive knowledge via revert attention. To better capture the dataset-specific label semantics for downstream adaptation, we propose to enhance the inter-dispersion among text features via greedy text ensemble and refinement. The text and visual features are then progressively fused in a domain-aware manner by a generated domain prompt to adapt toward a specific domain. Extensive experiments show our method's superiority on 5 large-scale benchmarks (WILDS and DomainNet), notably improving over smaller networks like ViT-B/16 with gains of +5.1 in F1 for iWildCam and +3.1% in WC Acc for FMoW.

2025-04-22

International Conference on Learning Representations (Accept (Poster))

Multilingual Hallucination Gaps

Cléa Chataigner

Afaf Taïk

Golnoosh Farnadi

Large language models (LLMs) are increasingly used as alternatives to traditional search engines given their capacity to generate text that resembles human language. However, this shift is concerning, as LLMs often generate hallucinations (misleading or false information that appears highly credible). In this study, we explore the phenomenon of hallucinations across multiple languages in free-form text generation, focusing on what we call multilingual hallucination gaps. These gaps reflect differences in the frequency of hallucinated answers depending on the prompt and language used. To quantify such hallucinations, we used the FActScore metric and extended its framework to a multilingual setting. We conducted experiments using LLMs from the LLaMA, Qwen, and Aya families, generating biographies in 19 languages and comparing the results to Wikipedia pages. Our results reveal variations in hallucination rates, especially between high- and low-resource languages, raising important questions about LLM multilingual performance and the challenges in evaluating hallucinations in multilingual free-form text generation.

2025-04-22

Proceedings of the Algorithmic Fairness Through the Lens of Metrics and Evaluation (published)

Multi-Modal and Multi-Attribute Generation of Single Cells with CFGen

Alessandro Palma

Till Richter

Hanyi Zhang

Manuel Lubetzki

Alexander Tong

Andrea Dittadi

Fabian J. Theis

Generative modeling of single-cell RNA-seq data is crucial for tasks like trajectory inference, batch effect removal, and simulation of realistic cellular data. However, recent deep generative models simulating synthetic single cells from noise operate on pre-processed continuous gene expression approximations, overlooking the discrete nature of single-cell data, which limits their effectiveness and hinders the incorporation of robust noise models. Additionally, aspects like controllable multi-modal and multi-label generation of cellular data remain underexplored. This work introduces CellFlow for Generation (CFGen), a flow-based conditional generative model that preserves the inherent discreteness of single-cell data. CFGen generates whole-genome multi-modal single-cell data reliably, improving the recovery of crucial biological data characteristics while tackling relevant generative tasks such as rare cell type augmentation and batch correction. We also introduce a novel framework for compositional data generation using Flow Matching. By showcasing CFGen on a diverse set of biological datasets and settings, we provide evidence of its value to the fields of computational biology and deep generative models.

2025-04-22

International Conference on Learning Representations (Accept (Poster))

Performative Prediction on Games and Mechanism Design

Fernando P. Santos

2025-04-22

Proceedings of The 28th International Conference on Artificial Intelligence and Statistics (published)

Planning and Learning in Risk-Aware Restless Multi-Arm Bandits

Nima Akbarzadeh

Yossiri Adulyasak

Erick Delage

2025-04-22

Proceedings of The 28th International Conference on Artificial Intelligence and Statistics (published)

Privacy-Preserving Group Fairness in Cross-Device Federated Learning

Sikha Pentyala

Nicola Neophytou

Anderson Nascimento

Martine De Cock

Golnoosh Farnadi

Group fairness ensures that the outcomes of machine learning (ML) based decision-making systems are not biased towards a certain group of people defined by a sensitive attribute such as gender or ethnicity. Achieving group fairness in Federated Learning (FL) is challenging because mitigating bias inherently requires using the sensitive attribute values of all clients, while FL is aimed precisely at protecting privacy by not giving access to the clients’ data. As we show in this paper, this conflict between fairness and privacy in FL can be resolved by combining FL with Secure Multiparty Computation (MPC) and Differential Privacy (DP). To this end, we propose a privacy-preserving approach to calculate group fairness notions in the cross-device FL setting. Then, we propose two bias mitigation pre-processing and post-processing techniques in cross-device FL under formal privacy guarantees, without requiring the clients to disclose their sensitive attribute values. Empirical evaluations on real-world datasets demonstrate the effectiveness of our solution to train fair and accurate ML models in federated cross-device setups with privacy guarantees to the users.

2025-04-22

Proceedings of the Algorithmic Fairness Through the Lens of Metrics and Evaluation (published)