Portrait of Golnoosh Farnadi

Golnoosh Farnadi

Core Academic Member
Canada CIFAR AI Chair
Assistant Professor, McGill University, School of Computer Science
Adjunct Professor, Université de Montréal, Department of Computer Science and Operations Research
Visiting Faculty Researcher, Google
Research Topics
Deep Learning
Generative Models

Biography

Golnoosh Farnadi is an assistant professor at the School of Computer Science, McGill University, and an adjunct professor at Université de Montréal. She is a core academic member of Mila – Quebec Artificial Intelligence Institute and holds a Canada CIFAR AI Chair.

Farnadi founded and is a principal investigator of the EQUAL lab at Mila / McGill University. The EQUAL lab (EQuity & EQuality Using AI and Learning algorithms) is a cutting-edge research laboratory dedicated to advancing the fields of algorithmic fairness and responsible AI.

Current Students

PhD - HEC Montréal
Postdoctorate - McGill University
PhD - McGill University
Co-supervisor :
PhD - McGill University
Co-supervisor :
Master's Research - McGill University
Co-supervisor :
Master's Research - Université de Montréal
Principal supervisor :
PhD - McGill University
Research Intern - McGill University
Collaborating researcher - UWindsor
PhD - McGill University
Co-supervisor :
Collaborating researcher - McGill University
Collaborating Alumni - Université de Montréal
Collaborating researcher - McGill University
Research Intern - McGill University
Independent visiting researcher - McGill University university
Research Intern - McGill University
Collaborating researcher - McGill University
PhD - McGill University
Co-supervisor :
Postdoctorate - McGill University
PhD - Université de Montréal
Co-supervisor :
Master's Research - McGill University

Publications

Low-Rank Adaptation Secretly Imitates Differentially Private SGD
As pre-trained language models grow in size, full fine-tuning their parameters on task adaptation data becomes increasingly impractical. To … (see more)address this challenge, some methods for low-rank adaptation of language models have been proposed, e.g. LoRA, which incorporates trainable low-rank decomposition matrices into only some parameters of the pre-trained model, called adapters. This approach significantly reduces the number of trainable parameters compared to fine-tuning all parameters or adapters. In this work, we look at low-rank adaptation method from the lens of data privacy. We show theoretically that the low-rank adaptation used in LoRA is equivalent to fine-tuning adapters with noisy batch gradients - just like what DPSGD algorithm does. We also quantify the variance of the injected noise as a decreasing function of adaptation rank. By establishing a Berry-Esseen type bound on the total variation distance between the injected noise distribution and a Gaussian noise distribution with the same variance, we show that the dynamics of low-rank adaptation is very close to when DPSGD is performed w.r.t the adapters. Following our theoretical findings and approved by our experimental results, we show that low-rank adaptation provides robustness to membership inference attacks w.r.t the fine-tuning data.
Algorithmic Fairness Through the Lens of Metrics and Evaluation (AFME) 2024
Miriam Rateike
Awa Dieng
Jamelle Watson-Daniels
Ferdinando Fioretto
Different Horses for Different Courses: Comparing Bias Mitigation Algorithms in ML
Prakhar Ganeesh
Usman Gohar
Lu Cheng
With fairness concerns gaining significant attention in Machine Learning (ML), several bias mitigation techniques have been proposed, often … (see more)compared against each other to find the best method. These benchmarking efforts tend to use a common setup for evaluation under the assumption that providing a uniform environment ensures a fair comparison. However, bias mitigation techniques are sensitive to hyperparameter choices, random seeds, feature selection, etc., meaning that comparison on just one setting can unfairly favour certain algorithms. In this work, we show significant variance in fairness achieved by several algorithms and the influence of the learning pipeline on fairness scores. We highlight that most bias mitigation techniques can achieve comparable performance, given the freedom to perform hyperparameter optimization, suggesting that the choice of the evaluation parameters-rather than the mitigation technique itself-can sometimes create the perceived superiority of one method over another. We hope our work encourages future research on how various choices in the lifecycle of developing an algorithm impact fairness, and trends that guide the selection of appropriate algorithms.
Multilingual Hallucination Gaps
Cléa Chataigner
Privacy-Preserving Group Fairness in Cross-Device Federated Learning
Anderson Nascimento
Martine De Cock
Group fairness ensures that the outcome of machine learning (ML) based decision making systems are notbiased towards a certain group of peop… (see more)le defined by a sensitive attribute such as gender or ethnicity. Achievinggroup fairness in Federated Learning (FL) is challenging because mitigating bias inherently requires usingthe sensitive attribute values of all clients, while FL is aimed precisely at protecting privacy by not givingaccess to the clients’ data. As we show in this paper, this conflict between fairness and privacy in FL can beresolved by combining FL with Secure Multiparty Computation (MPC) and Differential Privacy (DP). Tothis end, we propose a privacy-preserving approach to calculate group fairness notions in the cross-device FLsetting. Then, we propose two bias mitigation pre-processing and post-processing techniques in cross-deviceFL under formal privacy guarantees, without requiring the clients to disclose their sensitive attribute values.Empirical evaluations on real world datasets demonstrate the effectiveness of our solution to train fair andaccurate ML models in federated cross-device setups with privacy guarantees to the users.
Rethinking Hallucinations: Correctness, Consistency, and Prompt Multiplicity
Large language models (LLMs) are known to "hallucinate" by generating false or misleading outputs. Hallucinations pose various harms, from e… (see more)rosion of trust to widespread misinformation. Existing hallucination evaluation, however, focuses only on "correctness" and often overlooks "consistency", necessary to distinguish and address these harms. To bridge this gap, we introduce _prompt multiplicity_, a framework for quantifying consistency through prompt sensitivity. Our analysis reveals significant multiplicity (over 50% inconsistency in benchmarks like Med-HALT), suggesting that hallucination-related harms have been severely underestimated. Furthermore, we study the role of consistency in hallucination detection and mitigation. We find that: (a) detection techniques capture consistency, not correctness, and (b) mitigation techniques like RAG can introduce additional inconsistencies. By integrating prompt multiplicity into hallucination evaluation, we provide an improved framework of potential harms and uncover critical limitations in current detection and mitigation strategies.
On the Role of Prompt Multiplicity in LLM Hallucination Evaluation
Large language models (LLMs) are known to "hallucinate" by generating false or misleading outputs. Existing hallucination benchmarks often o… (see more)verlook prompt sensitivity, due to stable accuracy scores despite prompt variations. However, such stability can be misleading. In this work, we introduce prompt multiplicity--the multiplicity of individual hallucinations depending on the input prompt--and study its role in LLM hallucination benchmarks. We find severe multiplicity, with even more than 50% of responses changing between correct and incorrect answers simply based on the prompt for certain benchmarks, like Med-HALT. Prompt multiplicity also gives us the lens to distinguish between randomness in generation and consistent factual inaccuracies, providing a more nuanced understanding of LLM hallucinations and their real-world harms. By situating our discussion within existing hallucination taxonomies--supporting their quantification--and exploring its relationship with uncertainty in generation, we highlight how prompt multiplicity fills a critical gap in the literature on LLM hallucinations.
UNLEARNING GEO-CULTURAL STEREOTYPES IN MULTILINGUAL LLMS
As multilingual generative models become more widely used, most safety and fairness evaluation techniques still focus on English-language re… (see more)sources, while overlooking important cross-cultural factors. This limitation raises concerns about fairness and safety, particularly regarding geoculturally situated stereotypes that hinder the models’ global inclusivity. In this work, we present preliminary findings on the impact of stereotype unlearning across languages, specifically in English, French, and Hindi. Using an adapted version of the SeeGULL dataset, we analyze how unlearning stereotypes in one language influences other languages within multilingual large language models. Our study evaluates two model families, Llama-3.1-8B and Aya-Expanse-8B, to assess whether unlearning in one linguistic context transfers across languages, potentially mitigating or exacerbating biases in multilingual settings.
Bridging Causality, Individual Fairness, and Adversarial Robustness in the Absence of Structural Causal Model
Ahmad Reza Ehyaei
Samira Samadi
Despite the essential need for comprehensive considerations in responsible AI, factors such as robustness, fairness, and causality are often… (see more) studied in isolation. Adversarial perturbation, used to identify vulnerabilities in models, and individual fairness, aiming for equitable treatment of similar individuals, despite initial differences, both depend on metrics to generate comparable input data instances. Previous attempts to define such joint metrics often lack general assumptions about data and were unable to reflect counterfactual proximity. To address this, our paper introduces a \emph{causal fair metric} formulated based on causal structures encompassing sensitive attributes and protected causal perturbation. To enhance the practicality of our metric, we propose metric learning as a method for metric estimation and deployment in real-world problems in the absence of structural causal models. We also demonstrate the applications of the causal fair metric in classifiers. Empirical evaluation of real-world and synthetic datasets illustrates the effectiveness of our proposed metric in achieving an accurate classifier with fairness, resilience to adversarial perturbations, and a nuanced understanding of causal relationships.
What Secrets Do Your Manifolds Hold? Understanding the Local Geometry of Generative Models
Ahmed Imtiaz Humayun
Cristina Nader Vasconcelos
Deepak Ramachandran
Candice Schumann
Junfeng He
Katherine A Heller
Deep Generative Models are frequently used to learn continuous representations of complex data distributions using a finite number of sample… (see more)s. For any generative model, including pre-trained foundation models with GAN, Transformer or Diffusion architectures, generation performance can vary significantly based on which part of the learned data manifold is sampled. In this paper we study the post-training local geometry of the learned manifold and its relationship to generation outcomes for models ranging from toy settings to the latent decoder of the near state-of-the-art Stable Diffusion 1.4 Text-to-Image model. Building on the theory of continuous piecewise-linear (CPWL) generators, we characterize the local geometry in terms of three geometric descriptors - scaling (
What Secrets Do Your Manifolds Hold? Understanding the Local Geometry of Generative Models
Ahmed Imtiaz Humayun
Candice Schumann
Cristina Nader Vasconcelos
Deepak Ramachandran
Junfeng He
Katherine Heller
Embedding Cultural Diversity in Prototype-based Recommender Systems
Popularity bias in recommender systems can increase cultural overrepresentation by favoring norms from dominant cultures and marginalizing u… (see more)nderrepresented groups. This issue is critical for platforms offering cultural products, as they influence consumption patterns and human perceptions. In this work, we address popularity bias by identifying demographic biases within prototype-based matrix factorization methods. Using the country of origin as a proxy for cultural identity, we link this demographic attribute to popularity bias by refining the embedding space learning process. First, we propose filtering out irrelevant prototypes to improve representativity. Second, we introduce a regularization technique to enforce a uniform distribution of prototypes within the embedding space. Across four datasets, our results demonstrate a 27\% reduction in the average rank of long-tail items and a 2\% reduction in the average rank of items from underrepresented countries. Additionally, our model achieves a 2\% improvement in HitRatio@10 compared to the state-of-the-art, highlighting that fairness is enhanced without compromising recommendation quality. Moreover, the distribution of prototypes leads to more inclusive explanations by better aligning items with diverse prototypes.