Golnoosh Farnadi

Core Academic Member
Canada CIFAR AI Chair
Assistant Professor, McGill University, School of Computer Science
Adjunct Professor, Université de Montréal, Department of Computer Science and Operations Research
Visiting Faculty Researcher, Google

Research Topics

Deep Learning
Generative Models

Biography

Golnoosh Farnadi is an assistant professor at the School of Computer Science, McGill University, and an adjunct professor at Université de Montréal. She is a core academic member of Mila – Quebec Artificial Intelligence Institute and holds a Canada CIFAR AI Chair.

Farnadi founded and is a principal investigator of the EQUAL lab at Mila / McGill University. The EQUAL lab (EQuity & EQuality Using AI and Learning algorithms) is a research laboratory dedicated to advancing algorithmic fairness and responsible AI.

Current Students

PhD - HEC Montréal
Postdoctorate - McGill University
PhD - McGill University
Co-supervisor :
Master's Research - McGill University
Co-supervisor :
Collaborating researcher
Master's Research - Université de Montréal
Principal supervisor :
Collaborating researcher - UWindsor
PhD - McGill University
Co-supervisor :
Collaborating researcher - McGill University
Collaborating Alumni - Université de Montréal
Collaborating researcher - McGill University
Research Intern - McGill University
Independent visiting researcher - McGill University
Research Intern - McGill University
PhD - McGill University
Co-supervisor :
Postdoctorate - McGill University
PhD - Université de Montréal
Co-supervisor :
Collaborating Alumni - Université de Sherbrooke
Independent visiting researcher - HEC Montréal
Master's Research - McGill University

Publications

Trust No Bot: Discovering Personal Disclosures in Human-LLM Conversations in the Wild
Niloofar Mireshghallah
Maria Antoniak
Yash More
Yejin Choi
Measuring personal disclosures made in human-chatbot interactions can provide a better understanding of users' AI literacy and facilitate privacy research for large language models (LLMs). We run an extensive, fine-grained analysis on the personal disclosures made by real users to commercial GPT models, investigating the leakage of personally identifiable and sensitive information. To understand the contexts in which users disclose to chatbots, we develop a taxonomy of tasks and sensitive topics, based on qualitative and quantitative analysis of naturally occurring conversations. We discuss these potential privacy harms and observe that: (1) personally identifiable information (PII) appears in unexpected contexts such as in translation or code editing (48% and 16% of the time, respectively) and (2) PII detection alone is insufficient to capture the sensitive topics that are common in human-chatbot interactions, such as detailed sexual preferences or specific drug use habits. We believe that these high disclosure rates are of significant importance for researchers and data curators, and we call for the design of appropriate nudging mechanisms to help users moderate their interactions.
Position: Cracking the Code of Cascading Disparity Towards Marginalized Communities
Towards More Realistic Extraction Attacks: An Adversarial Perspective
On The Local Geometry of Deep Generative Manifolds
Ahmed Imtiaz Humayun
Candice Schumann
Mohammad Havaei
In this paper, we study theoretically inspired local geometric descriptors of the data manifolds approximated by pre-trained generative models. The descriptors, local scaling (ψ), local rank (ν), and local complexity (δ), characterize the uncertainty, dimensionality, and smoothness on the learned manifold, using only the network weights and architecture. We investigate and emphasize their critical role in understanding generative models. Our analysis reveals that the local geometry is intricately linked to the quality and diversity of generated outputs. Additionally, we see that the geometric properties are distinct for out-of-distribution (OOD) inputs as well as for prompts memorized by Stable Diffusion, showing the possible application of our proposed descriptors for downstream detection and assessment of pre-trained generative models.
Differentially Private Clustered Federated Learning
Federated learning (FL), which is a decentralized machine learning (ML) approach, often incorporates differential privacy (DP) to provide rigorous data privacy guarantees. Previous works attempted to address high structured data heterogeneity in vanilla FL settings through clustering clients (a.k.a. clustered FL), but these methods remain sensitive and prone to errors, further exacerbated by the DP noise. This vulnerability makes the previous methods inappropriate for differentially private FL (DPFL) settings with structured data heterogeneity. To address this gap, we propose an algorithm for differentially private clustered FL, which is robust to the DP noise in the system and identifies the underlying clients' clusters correctly. To this end, we propose to cluster clients based on both their model updates and training loss values. Furthermore, for clustering clients' model updates at the end of the first round, our proposed approach addresses the server's uncertainties by employing large batch sizes as well as Gaussian Mixture Models (GMM) to reduce the impact of DP and stochastic noise and avoid potential clustering errors. This idea is efficient especially in privacy-sensitive scenarios with more DP noise. We provide theoretical analysis to justify our approach and evaluate it across diverse data distributions and privacy budgets. Our experimental results show its effectiveness in addressing large structured data heterogeneity in DPFL.
Mitigating Disparate Impact of Differential Privacy in Federated Learning through Robust Clustering
Federated Learning (FL) is a decentralized machine learning (ML) approach that keeps data localized and often incorporates Differential Privacy (DP) to enhance privacy guarantees. Similar to previous work on DP in ML, we observed that differentially private federated learning (DPFL) introduces performance disparities, particularly affecting minority groups. Recent work has attempted to address performance fairness in vanilla FL through clustering, but this method remains sensitive and prone to errors, which are further exacerbated by the DP noise in DPFL. To fill this gap, in this paper, we propose a novel clustered DPFL algorithm designed to effectively identify clients' clusters in highly heterogeneous settings while maintaining high accuracy with DP guarantees. To this end, we propose to cluster clients based on both their model updates and training loss values. Our proposed approach also addresses the server's uncertainties in clustering clients' model updates by employing larger batch sizes along with a Gaussian Mixture Model (GMM) to alleviate the impact of noise and potential clustering errors, especially in privacy-sensitive scenarios. We provide theoretical analysis of the effectiveness of our proposed approach. We also extensively evaluate our approach across diverse data distributions and privacy budgets and show its effectiveness in mitigating the disparate impact of DP in FL settings with a small computational cost.
The Cost of Arbitrariness for Individuals: Examining the Legal and Technical Challenges of Model Multiplicity
Model multiplicity, the phenomenon where multiple models achieve similar performance despite different underlying learned functions, introduces arbitrariness in model selection. While this arbitrariness may seem inconsequential in expectation, its impact on individuals can be severe. This paper explores various individual concerns stemming from multiplicity, including the effects of arbitrariness beyond final predictions, disparate arbitrariness for individuals belonging to protected groups, and the challenges associated with the arbitrariness of a single algorithmic system creating a monopoly across various contexts. It provides both an empirical examination of these concerns and a comprehensive analysis from the legal standpoint, addressing how these issues are perceived in the anti-discrimination law in Canada. We conclude the discussion with technical challenges in the current landscape of model multiplicity to meet legal requirements and the legal gap between current law and the implications of arbitrariness in model selection, highlighting relevant future research directions for both disciplines.
Advancing Cultural Inclusivity: Optimizing Embedding Spaces for Balanced Music Recommendations
Armin Moradi
Nicola Neophytou
Fairness Incentives in Response to Unfair Dynamic Pricing
Hadi Nekoei
Janarthanan Rajendran
The use of dynamic pricing by profit-maximizing firms gives rise to demand fairness concerns, measured by discrepancies in consumer groups' demand responses to a given pricing strategy. Notably, dynamic pricing may result in buyer distributions unreflective of those of the underlying population, which can be problematic in markets where fair representation is socially desirable. To address this, policy makers might leverage tools such as taxation and subsidy to adapt policy mechanisms dependent upon their social objective. In this paper, we explore the potential for AI methods to assist such intervention strategies. To this end, we design a basic simulated economy, wherein we introduce a dynamic social planner (SP) to generate corporate taxation schedules geared to incentivizing firms towards adopting fair pricing behaviours, and to use the collected tax budget to subsidize consumption among underrepresented groups. To cover a range of possible policy scenarios, we formulate our social planner's learning problem as a multi-armed bandit, a contextual bandit and finally as a full reinforcement learning (RL) problem, evaluating welfare outcomes from each case. To alleviate the difficulty in retaining meaningful tax rates that apply to less frequently occurring brackets, we introduce FairReplayBuffer, which ensures that our RL agent samples experiences uniformly across a discretized fairness space. We find that, upon deploying a learned tax and redistribution policy, social welfare improves on that of the fairness-agnostic baseline, approaches that of the analytically optimal fairness-aware baseline for the multi-armed and contextual bandit settings, and surpasses it by 13.19% in the full RL setting.
Learning to Build Solutions in Stochastic Matching Problems Using Flows (Student Abstract)