Abdelrahman Zayed

Should We Attend More or Less? Modulating Attention for Fairness

Abdelrahman Zayed

Goncalo Mordido

Samira Shabanian

Sarath Chandar

2024-07-10

COLM 2024 Conference (accepted)

doi.org

openreview.net

Why Don't Prompt-Based Fairness Metrics Correlate?

Abdelrahman Zayed

Goncalo Mordido

Ioana Baldini

Sarath Chandar

The widespread use of large language models has brought up essential questions about the potential biases these models might learn. This led to the development of several metrics aimed at evaluating and mitigating these biases. In this paper, we first demonstrate that prompt-based fairness metrics exhibit poor agreement, as measured by correlation, raising important questions about the reliability of fairness assessment using prompts. Then, we outline six relevant reasons why such a low correlation is observed across existing metrics. Based on these insights, we propose a method called Correlated Fairness Output (CAIRO) to enhance the correlation between fairness metrics. CAIRO augments the original prompts of a given fairness metric by using several pre-trained language models and then selects the combination of the augmented prompts that achieves the highest correlation across metrics. We show a significant improvement in Pearson correlation from 0.3 and 0.18 to 0.90 and 0.98 across metrics for gender and religion biases, respectively. Our code is available at https://github.com/chandar-lab/CAIRO.

2024-06-09

ArXiv (preprint)

doi.org

arxiv.org
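The selection step described in the CAIRO abstract above (pick the combination of augmented prompts that makes the fairness metrics agree most) can be illustrated with a small sketch. It assumes each fairness metric has already produced one bias score per evaluated model for every candidate prompt set (the original prompts plus augmented versions), and it scores a combination by the mean pairwise Pearson correlation between metrics, found by exhaustive search. The names `select_prompt_sets` and `pearson`, the nested-dictionary data layout, the averaging objective, and the toy scores are hypothetical illustrations, not the authors' implementation, which is in the repository linked above.

```python
from itertools import combinations, product
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # Assumption: a constant score vector carries no signal, so treat it as 0.
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def select_prompt_sets(scores):
    """Pick one prompt set per metric so that the mean pairwise Pearson
    correlation of the metrics' bias scores (across models) is highest.

    Hypothetical layout: {metric: {prompt_set: [score_model_1, score_model_2, ...]}}
    """
    metrics = list(scores)
    best_choice, best_corr = None, float("-inf")
    # Exhaustively try every combination of one prompt set per metric.
    for choice in product(*(scores[m] for m in metrics)):
        vectors = [scores[m][p] for m, p in zip(metrics, choice)]
        pairs = list(combinations(vectors, 2))
        mean_corr = sum(pearson(a, b) for a, b in pairs) / len(pairs)
        if mean_corr > best_corr:
            best_choice, best_corr = dict(zip(metrics, choice)), mean_corr
    return best_choice, best_corr


# Toy, made-up bias scores for three models under two metrics.
toy_scores = {
    "metric_a": {"original": [0.1, 0.5, 0.3], "aug_1": [0.2, 0.4, 0.6]},
    "metric_b": {"original": [0.9, 0.1, 0.4], "aug_1": [0.3, 0.5, 0.7]},
}
choice, corr = select_prompt_sets(toy_scores)
print(choice, round(corr, 3))
# {'metric_a': 'aug_1', 'metric_b': 'aug_1'} 1.0
```

Exhaustive search grows with the product of the number of prompt sets per metric, so with many augmented variants a greedy or sampled search would likely be needed instead; the sketch keeps the brute-force version only because it makes the objective easy to read.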

Fairness-Aware Structured Pruning in Transformers

Abdelrahman Zayed

Goncalo Mordido

Samira Shabanian

Ioana Baldini

Sarath Chandar

The increasing size of large language models (LLMs) has introduced challenges in their training and inference. Removing model components is perceived as a solution to tackle the large model sizes; however, existing pruning methods focus solely on performance, without considering an essential aspect for the responsible use of LLMs: model fairness. It is crucial to address the fairness of LLMs towards diverse groups, such as women, Black people, LGBTQ+, Jewish communities, among others, as these models are being deployed and made available to a wide audience. In this work, we first investigate how attention heads impact fairness and performance in pre-trained transformer-based language models. We then propose a novel method to prune the attention heads that negatively impact fairness while retaining the heads critical for performance, i.e., language modeling capabilities. Our approach is practical in terms of time and resources, as it does not require fine-tuning the final pruned, and fairer, model. Our findings demonstrate a reduction in gender bias by 19%, 19.5%, 39.5%, 34.7%, 23%, and 8% for DistilGPT-2, GPT-2, GPT-Neo of two different sizes, GPT-J, and Llama 2 models, respectively, in comparison to the biased model, with only a slight decrease in performance. WARNING: This work uses language that is offensive in nature.

2023-12-24

ArXiv (preprint)

doi.org

arxiv.org
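As a rough illustration of the selection idea in the abstract above (drop attention heads that hurt fairness, keep heads that matter for language modeling), the following sketch ranks heads from precomputed single-head ablation measurements. It assumes, hypothetically, that masking each head on its own has already been measured for its change in a bias score (lower taken to mean fairer) and its change in perplexity; the `HeadEffect` record, the perplexity threshold, the greedy top-k rule, and the example numbers are illustrative assumptions, not the paper's actual procedure.

```python
from dataclasses import dataclass


@dataclass
class HeadEffect:
    layer: int
    head: int
    # Hypothetical measurements from masking this head alone:
    bias_change: float  # change in a bias score (negative = less biased, by assumption)
    ppl_change: float   # change in perplexity (positive = worse language modeling)


def select_heads_to_prune(effects, max_ppl_increase, max_heads):
    """Return (layer, head) pairs whose removal lowers the bias score,
    skipping heads whose removal raises perplexity beyond a threshold."""
    candidates = [
        e for e in effects
        if e.bias_change < 0 and e.ppl_change <= max_ppl_increase
    ]
    # Prefer the heads whose removal reduces the bias score the most.
    candidates.sort(key=lambda e: e.bias_change)
    return [(e.layer, e.head) for e in candidates[:max_heads]]


# Made-up measurements for a tiny two-layer, two-head model.
effects = [
    HeadEffect(0, 0, bias_change=-0.05, ppl_change=0.1),
    HeadEffect(0, 1, bias_change=0.02, ppl_change=0.0),   # removal increases bias
    HeadEffect(1, 0, bias_change=-0.10, ppl_change=2.5),  # critical for performance
    HeadEffect(1, 1, bias_change=-0.03, ppl_change=0.2),
]
print(select_heads_to_prune(effects, max_ppl_increase=0.5, max_heads=2))
# [(0, 0), (1, 1)]
```

Because each head is scored in isolation, this sketch ignores interactions between heads, and no weights are updated, which mirrors the abstract's point that the pruned model is not fine-tuned. The returned pairs could be grouped into a `{layer: [heads]}` dictionary, the form accepted by Hugging Face Transformers' `PreTrainedModel.prune_heads`.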
