Golnoosh Farnadi

Core Academic Member
Canada CIFAR AI Chair
Assistant Professor, McGill University, School of Computer Science
Adjunct Professor, Université de Montréal, Department of Computer Science and Operations Research
Visiting Researcher, Google
Research Topics
Deep Learning
Generative Models

Biography

Golnoosh Farnadi is an assistant professor at the School of Computer Science, McGill University, and an adjunct professor at Université de Montréal. She is a core academic member at Mila - Quebec Artificial Intelligence Institute and holds a Canada CIFAR AI Chair.

Farnadi founded the EQUAL lab at Mila / McGill University, where she is a principal investigator. The EQUAL lab (EQuity & EQuality Using AI and Learning algorithms) is a cutting-edge research lab dedicated to advancing the fields of algorithmic fairness and responsible AI.

Current Students

PhD - HEC
Postdoctorate - McGill
PhD - McGill
Co-supervisor:
PhD - McGill
Co-supervisor:
Research Master's - McGill
Co-supervisor:
Research Master's - UdeM
Principal supervisor:
Research Intern - McGill
Research Collaborator - UWindsor
PhD - McGill
Co-supervisor:
Research Collaborator - McGill
Alumni Collaborator - UdeM
Research Collaborator - McGill
Research Intern - McGill
Independent Research Visitor - McGill University
Research Intern - McGill
Research Collaborator - McGill
PhD - McGill
Co-supervisor:
Postdoctorate - McGill
PhD - UdeM
Co-supervisor:
Research Master's - McGill

Publications

Hallucination Detox: Sensitivity Dropout (SenD) for Large Language Model Training
A Learning-Based Framework for Fair and Scalable Solution Generation in Kidney Exchange Problems
Embedding Cultural Diversity in Prototype-based Recommender Systems
Popularity bias in recommender systems can increase cultural overrepresentation by favoring norms from dominant cultures and marginalizing underrepresented groups. This issue is critical for platforms offering cultural products, as they influence consumption patterns and human perceptions. In this work, we address popularity bias by identifying demographic biases within prototype-based matrix factorization methods. Using the country of origin as a proxy for cultural identity, we link this demographic attribute to popularity bias by refining the embedding space learning process. First, we propose filtering out irrelevant prototypes to improve representativity. Second, we introduce a regularization technique to enforce a uniform distribution of prototypes within the embedding space. Across four datasets, our results demonstrate a 27% reduction in the average rank of long-tail items and a 2% reduction in the average rank of items from underrepresented countries. Additionally, our model achieves a 2% improvement in HitRatio@10 compared to the state-of-the-art, highlighting that fairness is enhanced without compromising recommendation quality. Moreover, the distribution of prototypes leads to more inclusive explanations by better aligning items with diverse prototypes.
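The abstract does not specify the regularizer, but a minimal sketch of one common way to push prototype embeddings toward a uniform spread on the unit hypersphere (assuming PyTorch and a hypothetical model.prototypes tensor of shape (K, d)) is given below; it illustrates the general idea, not the paper's exact objective.

```python
import torch

def prototype_uniformity_loss(prototypes: torch.Tensor, t: float = 2.0) -> torch.Tensor:
    """Encourage prototypes to spread out uniformly on the unit hypersphere.

    prototypes: (K, d) tensor of prototype embeddings.
    Uses the log-mean-exp of negative scaled pairwise squared distances, a
    standard uniformity objective; minimizing it spreads the prototypes apart.
    """
    p = torch.nn.functional.normalize(prototypes, dim=1)   # project onto the unit sphere
    sq_dists = torch.cdist(p, p).pow(2)                    # (K, K) pairwise squared distances
    K = p.shape[0]
    off_diag = ~torch.eye(K, dtype=torch.bool, device=p.device)
    return torch.logsumexp(-t * sq_dists[off_diag], dim=0) - torch.log(
        torch.tensor(float(K * (K - 1)), device=p.device)
    )

# Hypothetical usage inside a matrix-factorization training loop:
# total_loss = recommendation_loss + lam * prototype_uniformity_loss(model.prototypes)
```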
Different Horses for Different Courses: Comparing Bias Mitigation Algorithms in ML
Prakhar Ganeesh
Usman Gohar
Lu Cheng
With fairness concerns gaining significant attention in Machine Learning (ML), several bias mitigation techniques have been proposed, often compared against each other to find the best method. These benchmarking efforts tend to use a common setup for evaluation under the assumption that providing a uniform environment ensures a fair comparison. However, bias mitigation techniques are sensitive to hyperparameter choices, random seeds, feature selection, etc., meaning that comparison on just one setting can unfairly favour certain algorithms. In this work, we show significant variance in fairness achieved by several algorithms and the influence of the learning pipeline on fairness scores. We highlight that most bias mitigation techniques can achieve comparable performance, given the freedom to perform hyperparameter optimization, suggesting that the choice of evaluation parameters (rather than the mitigation technique itself) can sometimes create the perceived superiority of one method over another. We hope our work encourages future research on how various choices in the lifecycle of developing an algorithm impact fairness, and trends that guide the selection of appropriate algorithms.
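As a rough illustration of the sensitivity the paper highlights, the sketch below (assuming scikit-learn and NumPy, with a logistic regression and a demographic-parity gap standing in for the mitigation algorithms and fairness metrics actually benchmarked) trains the same pipeline under several random seeds and reports the spread of the fairness score.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def demographic_parity_gap(y_pred, group):
    """Absolute difference in positive prediction rates between two groups (coded 0/1)."""
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

def fairness_across_seeds(X, y, group, seeds=range(10)):
    """Train the same model under different random splits/seeds and record the fairness gap."""
    gaps = []
    for seed in seeds:
        X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(
            X, y, group, test_size=0.3, random_state=seed
        )
        model = LogisticRegression(max_iter=1000, random_state=seed).fit(X_tr, y_tr)
        gaps.append(demographic_parity_gap(model.predict(X_te), g_te))
    # The spread across seeds is exactly the kind of variance the paper draws attention to.
    return np.mean(gaps), np.std(gaps)
```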
Beyond the Safety Bundle: Auditing the Helpful and Harmless Dataset
Jonathan Colacco-Carr
Yash More
Jackie Ck Cheung
In an effort to mitigate the harms of large language models (LLMs), learning from human feedback (LHF) has been used to steer LLMs towards outputs that are intended to be both less harmful and more helpful. Despite the widespread adoption of LHF in practice, the quality of this feedback and its effectiveness as a safety mitigation technique remain unclear. This study addresses these issues by auditing the widely-used Helpful and Harmless (HH) dataset by Anthropic. Our work includes: (1) a thorough investigation of the dataset's content through both manual and automated evaluation; (2) experiments demonstrating the dataset's impact on models' safety; and (3) an analysis of the 100 most influential papers citing this dataset. Through our audit, we showcase how conceptualization failures and quality issues identified in the HH dataset can create additional harms by leading to disparate safety behaviors across demographic groups. Our findings highlight the need for more nuanced, context-sensitive approaches to safety mitigation in LLMs.
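For readers who want to inspect the dataset themselves, a small sketch (assuming the Hugging Face datasets library and the public Anthropic/hh-rlhf release, whose examples expose chosen and rejected transcripts) shows the flavour of a crude automated screen; the keyword list is purely hypothetical and far simpler than the manual and automated evaluation performed in the paper.

```python
from datasets import load_dataset

# Load the Helpful and Harmless (HH) preference data released by Anthropic.
# Each example contains a "chosen" and a "rejected" conversation transcript.
hh = load_dataset("Anthropic/hh-rlhf", split="train")

# Toy automated screen: flag examples whose *chosen* response still contains
# terms from a hypothetical keyword list -- a crude stand-in for the content
# evaluation described in the paper.
FLAG_TERMS = ["kill", "steal", "hate"]  # illustrative only

def is_flagged(example):
    text = example["chosen"].lower()
    return any(term in text for term in FLAG_TERMS)

flagged = hh.filter(is_flagged)
print(f"{len(flagged)} / {len(hh)} chosen responses contain flagged terms")
```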
Multilingual Hallucination Gaps in Large Language Models
Cléa Chataigner
Large language models (LLMs) are increasingly used as alternatives to traditional search engines given their capacity to generate text that resembles human language. However, this shift is concerning, as LLMs often generate hallucinations, misleading or false information that appears highly credible. In this study, we explore the phenomenon of hallucinations across multiple languages in freeform text generation, focusing on what we call multilingual hallucination gaps. These gaps reflect differences in the frequency of hallucinated answers depending on the prompt and language used. To quantify such hallucinations, we used the FactScore metric and extended its framework to a multilingual setting. We conducted experiments using LLMs from the LLaMA, Qwen, and Aya families, generating biographies in 19 languages and comparing the results to Wikipedia pages. Our results reveal variations in hallucination rates, especially between high and low resource languages, raising important questions about LLM multilingual performance and the challenges in evaluating hallucinations in multilingual freeform text generation.
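A schematic of the measurement, with hypothetical generate, extract_facts, and is_supported helpers standing in for the LLM calls and the FactScore-style verification against Wikipedia used in the paper:

```python
from collections import defaultdict

LANGUAGES = ["en", "fr", "fa"]                   # subset of the 19 languages studied
ENTITIES = ["Ada Lovelace", "Marie Curie"]       # illustrative entities only

def hallucination_gaps(generate, extract_facts, is_supported):
    """Compare per-language rates of unsupported (hallucinated) atomic facts.

    generate(entity, lang)      -> generated biography text        (hypothetical)
    extract_facts(text)         -> list of atomic factual claims   (hypothetical)
    is_supported(claim, entity) -> bool, checked against a reference such as Wikipedia (hypothetical)
    """
    rates = defaultdict(list)
    for lang in LANGUAGES:
        for entity in ENTITIES:
            facts = extract_facts(generate(entity, lang))
            if not facts:
                continue
            unsupported = sum(not is_supported(f, entity) for f in facts)
            rates[lang].append(unsupported / len(facts))
    # The "gap" is the spread of average hallucination rates across languages.
    return {lang: sum(v) / len(v) for lang, v in rates.items()}
```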
FairLoRA: Unpacking Bias Mitigation in Vision Models with Fairness-Driven Low-Rank Adaptation