Portrait of Golnoosh Farnadi

Golnoosh Farnadi

Core Academic Member
Canada CIFAR AI Chair
Assistant Professor, McGill University, School of Computer Science
Adjunct Professor, Université de Montréal, Department of Computer Science and Operations Research
Visiting Faculty Researcher, Google
Research Topics
Deep Learning
Generative Models

Biography

Golnoosh Farnadi is an assistant professor at the School of Computer Science, McGill University, and an adjunct professor at Université de Montréal. She is a core academic member of Mila – Quebec Artificial Intelligence Institute and holds a Canada CIFAR AI Chair.

Farnadi founded and is a principal investigator of the EQUAL lab at Mila / McGill University. The EQUAL lab (EQuity & EQuality Using AI and Learning algorithms) is a cutting-edge research laboratory dedicated to advancing the fields of algorithmic fairness and responsible AI.

Current Students

PhD - HEC Montréal
Postdoctorate - McGill University
Research Intern - McGill University
Master's Research - McGill University
Co-supervisor :
Master's Research - Université de Montréal
Principal supervisor :
PhD - McGill University
Co-supervisor :
Research Intern - McGill University
Master's Research - Université de Montréal
Master's Research - Université de Montréal
Master's Research - Polytechnique Montréal
PhD - Université de Montréal
Co-supervisor :
Master's Research - Université de Montréal
Postdoctorate - Université de Montréal
Independent visiting researcher - HEC Montréal

Publications

Different Horses for Different Courses: Comparing Bias Mitigation Algorithms in ML
Prakhar Ganeesh
Usman Gohar
Lu Cheng
With fairness concerns gaining significant attention in Machine Learning (ML), several bias mitigation techniques have been proposed, often … (see more)compared against each other to find the best method. These benchmarking efforts tend to use a common setup for evaluation under the assumption that providing a uniform environment ensures a fair comparison. However, bias mitigation techniques are sensitive to hyperparameter choices, random seeds, feature selection, etc., meaning that comparison on just one setting can unfairly favour certain algorithms. In this work, we show significant variance in fairness achieved by several algorithms and the influence of the learning pipeline on fairness scores. We highlight that most bias mitigation techniques can achieve comparable performance, given the freedom to perform hyperparameter optimization, suggesting that the choice of the evaluation parameters-rather than the mitigation technique itself-can sometimes create the perceived superiority of one method over another. We hope our work encourages future research on how various choices in the lifecycle of developing an algorithm impact fairness, and trends that guide the selection of appropriate algorithms.
Multilingual Hallucination Gaps in Large Language Models
Cl'ea Chataigner
Afaf Taïk
Large language models (LLMs) are increasingly used as alternatives to traditional search engines given their capacity to generate text that … (see more)resembles human language. However, this shift is concerning, as LLMs often generate hallucinations, misleading or false information that appears highly credible. In this study, we explore the phenomenon of hallucinations across multiple languages in freeform text generation, focusing on what we call multilingual hallucination gaps. These gaps reflect differences in the frequency of hallucinated answers depending on the prompt and language used. To quantify such hallucinations, we used the FactScore metric and extended its framework to a multilingual setting. We conducted experiments using LLMs from the LLaMA, Qwen, and Aya families, generating biographies in 19 languages and comparing the results to Wikipedia pages. Our results reveal variations in hallucination rates, especially between high and low resource languages, raising important questions about LLM multilingual performance and the challenges in evaluating hallucinations in multilingual freeform text generation.
Multilingual Hallucination Gaps in Large Language Models
Cl'ea Chataigner
Afaf Taïk
Large language models (LLMs) are increasingly used as alternatives to traditional search engines given their capacity to generate text that … (see more)resembles human language. However, this shift is concerning, as LLMs often generate hallucinations, misleading or false information that appears highly credible. In this study, we explore the phenomenon of hallucinations across multiple languages in freeform text generation, focusing on what we call multilingual hallucination gaps. These gaps reflect differences in the frequency of hallucinated answers depending on the prompt and language used. To quantify such hallucinations, we used the FactScore metric and extended its framework to a multilingual setting. We conducted experiments using LLMs from the LLaMA, Qwen, and Aya families, generating biographies in 19 languages and comparing the results to Wikipedia pages. Our results reveal variations in hallucination rates, especially between high and low resource languages, raising important questions about LLM multilingual performance and the challenges in evaluating hallucinations in multilingual freeform text generation.
Multilingual Hallucination Gaps in Large Language Models
Cl'ea Chataigner
Afaf Taïk
Large language models (LLMs) are increasingly used as alternatives to traditional search engines given their capacity to generate text that … (see more)resembles human language. However, this shift is concerning, as LLMs often generate hallucinations, misleading or false information that appears highly credible. In this study, we explore the phenomenon of hallucinations across multiple languages in freeform text generation, focusing on what we call multilingual hallucination gaps. These gaps reflect differences in the frequency of hallucinated answers depending on the prompt and language used. To quantify such hallucinations, we used the FactScore metric and extended its framework to a multilingual setting. We conducted experiments using LLMs from the LLaMA, Qwen, and Aya families, generating biographies in 19 languages and comparing the results to Wikipedia pages. Our results reveal variations in hallucination rates, especially between high and low resource languages, raising important questions about LLM multilingual performance and the challenges in evaluating hallucinations in multilingual freeform text generation.
FairLoRA: Unpacking Bias Mitigation in Vision Models with Fairness-Driven Low-Rank Adaptation
Rohan Sukumaran
Aarash Feizi
Adriana Romero-Sorian
Hallucination Detox: Sensitive Neuron Dropout (SeND) for Large Language Model Training
Shahrad Mohammadzadeh
Juan David Guerra
As large language models (LLMs) are increasingly deployed across various industries, concerns regarding their reliability, particularly due … (see more)to hallucinations - outputs that are factually inaccurate or irrelevant to user input - have grown. Our research investigates the relationship between the training process and the emergence of hallucinations to address a key gap in existing research that focuses primarily on post hoc detection and mitigation strategies. Using models from the Pythia suite (70M - 12B parameters) and several hallucination detection metrics, we analyze hallucination trends throughout training and explore LLM internal dynamics. We introduce Sensitivity Dropout (SenD), a novel training protocol designed to mitigate hallucinations by reducing variance during training. SenD achieves this by deterministically dropping embedding indices with significant variability, referred to as Sensitive Embedding Indices. In addition, we develop an unsupervised hallucination detection metric, Efficient EigenScore (EES), which approximates the traditional EigenScore at 2x speed. This efficient metric is integrated into our protocol, allowing SenD to be both computationally scalable and effective at reducing hallucinations. Our empirical evaluation demonstrates that our approach improves LLM reliability at test time by up to 40% compared to normal training while also providing an efficient method to improve factual accuracy when adapting LLMs to Wikipedia, Medical, and LegalBench domains.
Hallucination Detox: Sensitivity Dropout (SenD) for Large Language Model Training
Shahrad Mohammadzadeh
Juan David Guerra
Hallucination Detox: Sensitive Neuron Dropout (SeND) for Large Language Model Training
Shahrad Mohammadzadeh
Juan David Guerra
As large language models (LLMs) become increasingly deployed across various industries, concerns regarding their reliability, particularly d… (see more)ue to hallucinations-outputs that are factually inaccurate or irrelevant to user input-have grown. Our research investigates the relationship between the training process and the emergence of hallucinations to address a key gap in existing research that focuses primarily on post hoc detection and mitigation strategies. Using models from the Pythia suite (70M-12B parameters) and several hallucination detection metrics, we analyze hallucination trends throughout training and explore LLM internal dynamics. We introduce SEnsitive Neuron Dropout (SeND), a novel training protocol designed to mitigate hallucinations by reducing variance during training. SeND achieves this by deterministically dropping neurons with significant variability on a dataset, referred to as Sensitive Neurons. In addition, we develop an unsupervised hallucination detection metric, Efficient EigenScore (EES), which approximates the traditional EigenScore in 2x speed. This efficient metric is integrated into our protocol, allowing SeND to be both computationally scalable and effective at reducing hallucinations. Our empirical evaluation demonstrates that our approach improves LLM reliability at test time by up to 40% compared to normal training while also providing an efficient method to improve factual accuracy when adapting LLMs to domains such as Wikipedia and Medical datasets.
On the Implicit Relation Between Low-Rank Adaptation and Differential Privacy
Saber Malekmohammadi
A significant approach in natural language processing involves large-scale pre-training on general domain data followed by adaptation to spe… (see more)cific tasks or domains. As models grow in size, full fine-tuning all parameters becomes increasingly impractical. To address this, some methods for low-rank task adaptation of language models have been proposed, e.g. LoRA and FLoRA. These methods keep the pre-trained model weights fixed and incorporate trainable low-rank decomposition matrices into some layers of the transformer architecture, called adapters. This approach significantly reduces the number of trainable parameters required for downstream tasks compared to full fine-tuning all parameters. In this work, we look at low-rank adaptation from the lens of data privacy. We show theoretically that the low-rank adaptation used in LoRA and FLoRA is equivalent to injecting some random noise into the batch gradients w.r.t the adapter parameters coming from their full fine-tuning, and we quantify the variance of the injected noise. By establishing a Berry-Esseen type bound on the total variation distance between the noise distribution and a Gaussian distribution with the same variance, we show that the dynamics of LoRA and FLoRA are very close to differentially private full fine-tuning the adapters, which suggests that low-rank adaptation implicitly provides privacy w.r.t the fine-tuning data. Finally, using Johnson-Lindenstrauss lemma, we show that when augmented with gradient clipping, low-rank adaptation is almost equivalent to differentially private full fine-tuning adapters with a fixed noise scale.
Wasserstein Distributionally Robust Optimization through the Lens of Structural Causal Models and Individual Fairness
Ahmad Reza Ehyaei
Samira Samadi
Understanding the Local Geometry of Generative Model Manifolds
Ahmed Imtiaz Humayun
Ibtihel Amara
Candice Schumann
Mohammad Havaei
Deep generative models learn continuous representations of complex data manifolds using a finite number of samples during training. For a pr… (see more)e-trained generative model, the common way to evaluate the quality of the manifold representation learned, is by computing global metrics like Fr\'echet Inception Distance using a large number of generated and real samples. However, generative model performance is not uniform across the learned manifold, e.g., for \textit{foundation models} like Stable Diffusion generation performance can vary significantly based on the conditioning or initial noise vector being denoised. In this paper we study the relationship between the \textit{local geometry of the learned manifold} and downstream generation. Based on the theory of continuous piecewise-linear (CPWL) generators, we use three geometric descriptors - scaling (
Trust No Bot: Discovering Personal Disclosures in Human-LLM Conversations in the Wild
Niloofar Mireshghallah
Maria Antoniak
Yash More
Yejin Choi
Measuring personal disclosures made in human-chatbot interactions can provide a better understanding of users' AI literacy and facilitate pr… (see more)ivacy research for large language models (LLMs). We run an extensive, fine-grained analysis on the personal disclosures made by real users to commercial GPT models, investigating the leakage of personally identifiable and sensitive information. To understand the contexts in which users disclose to chatbots, we develop a taxonomy of tasks and sensitive topics, based on qualitative and quantitative analysis of naturally occurring conversations. We discuss these potential privacy harms and observe that: (1) personally identifiable information (PII) appears in unexpected contexts such as in translation or code editing (48% and 16% of the time, respectively) and (2) PII detection alone is insufficient to capture the sensitive topics that are common in human-chatbot interactions, such as detailed sexual preferences or specific drug use habits. We believe that these high disclosure rates are of significant importance for researchers and data curators, and we call for the design of appropriate nudging mechanisms to help users moderate their interactions.