Prakhar Ganesh

Say It Another Way: Auditing LLMs with a User-Grounded Automated Paraphrasing Framework

Cléa Chataigner

Rebecca Ma

Prakhar Ganesh

Afaf Taïk

Elliot Creager

Golnoosh Farnadi

2025-09-22

NeurIPS.cc/2025/Workshop/WiML (published)

openreview.net

Rethinking Hallucinations: Correctness, Consistency, and Prompt Multiplicity

Prakhar Ganesh

Reza Shokri

Golnoosh Farnadi

Large language models (LLMs) are known to "hallucinate" by generating false or misleading outputs. Hallucinations pose various harms, from erosion of trust to widespread misinformation. Existing hallucination evaluation, however, focuses only on "correctness" and often overlooks "consistency", necessary to distinguish and address these harms. To bridge this gap, we introduce _prompt multiplicity_, a framework for quantifying consistency through prompt sensitivity. Our analysis reveals significant multiplicity (over 50% inconsistency in benchmarks like Med-HALT), suggesting that hallucination-related harms have been severely underestimated. Furthermore, we study the role of consistency in hallucination detection and mitigation. We find that: (a) detection techniques capture consistency, not correctness, and (b) mitigation techniques like RAG can introduce additional inconsistencies. By integrating prompt multiplicity into hallucination evaluation, we provide an improved framework of potential harms and uncover critical limitations in current detection and mitigation strategies.

2025-03-05

ICLR.cc/2025/Workshop/BuildingTrust (accepted)

openreview.net

On the Role of Prompt Multiplicity in LLM Hallucination Evaluation

Prakhar Ganesh

Reza Shokri

Golnoosh Farnadi

Large language models (LLMs) are known to "hallucinate" by generating false or misleading outputs. Existing hallucination benchmarks often overlook prompt sensitivity, due to stable accuracy scores despite prompt variations. However, such stability can be misleading. In this work, we introduce prompt multiplicity--the multiplicity of individual hallucinations depending on the input prompt--and study its role in LLM hallucination benchmarks. We find severe multiplicity, with even more than 50% of responses changing between correct and incorrect answers simply based on the prompt for certain benchmarks, like Med-HALT. Prompt multiplicity also gives us the lens to distinguish between randomness in generation and consistent factual inaccuracies, providing a more nuanced understanding of LLM hallucinations and their real-world harms. By situating our discussion within existing hallucination taxonomies--supporting their quantification--and exploring its relationship with uncertainty in generation, we highlight how prompt multiplicity fills a critical gap in the literature on LLM hallucinations.

2025-03-05

ICLR.cc/2025/Workshop/BuildingTrust (accepted)

openreview.net

Towards More Realistic Extraction Attacks: An Adversarial Perspective

Yash More

Prakhar Ganesh

Golnoosh Farnadi

2024-07-02

ArXiv (preprint)

doi.org

arxiv.org

The Cost of Arbitrariness for Individuals: Examining the Legal and Technical Challenges of Model Multiplicity

Prakhar Ganesh

Ihsan Ibrahim Daldaban

Ignacio Cofone

Golnoosh Farnadi

Model multiplicity, the phenomenon where multiple models achieve similar performance despite different underlying learned functions, introduces arbitrariness in model selection. While this arbitrariness may seem inconsequential in expectation, its impact on individuals can be severe. This paper explores various individual concerns stemming from multiplicity, including the effects of arbitrariness beyond final predictions, disparate arbitrariness for individuals belonging to protected groups, and the challenges associated with the arbitrariness of a single algorithmic system creating a monopoly across various contexts. It provides both an empirical examination of these concerns and a comprehensive analysis from the legal standpoint, addressing how these issues are perceived in the anti-discrimination law in Canada. We conclude the discussion with technical challenges in the current landscape of model multiplicity to meet legal requirements and the legal gap between current law and the implications of arbitrariness in model selection, highlighting relevant future research directions for both disciplines.

2024-05-28

ArXiv (preprint)

doi.org

arxiv.org
