Soumya Sharma

PhD - McGill

Principal supervisor

Golnoosh Farnadi

Co-supervisor

Adriana Romero Soriano

Research topics

Deep learning

Creativity

Reasoning

Natural language processing

Website

Google Scholar

GitHub

Publications

IDP-Bench: Benchmarking ability of LLMs to protect personal information in interdependent privacy contexts

Ayana Hussain

Soumya Sharma

Golnoosh Farnadi

Nicholas Vincent

Héber Hwang Arcolezi

Ulrich Aivodji

Large language models (LLMs) are becoming widely deployed as personal AI assistants with access to sensitive user data, making privacy a major challenge for their design and evaluation. Prior work focuses mainly on individual-level risks, overlooking interdependent privacy (IDP), where one person's data may be revealed by others without their knowledge or consent. We address this gap by introducing IDP-Bench: the first LLM benchmark for IDP scenarios, grounded in the Contextual Integrity (CI) framework. We evaluate eight open-source LLMs on their understanding of IDP scenarios across three levels of IDP reasoning using two LLM judges. Results show strong co-ownership recognition (6/8 models exceed 90%) but persistent weaknesses in identifying CI parameters (information attribute, primary subject) and IDP-specific parameters such as secondary subjects, where 7/8 models score below 74%. Models also struggle to judge sharing appropriateness (5/8 scoring below 77%). While the ability to judge the appropriateness of sharing improves with scale, performance tends to decline in smaller models, and prompt sensitivity remains high on IDP-specific questions, highlighting the need for more targeted study of IDP in LLM privacy research. Data and code are available at https://github.com/tisl-lab/Interdependent_Privacy_Bench.

2026-06-05

arXiv (preprint)

doi.org

arxiv.org

Robust Reward Modeling via Causal Rubrics

Pragya Srivastava

Harman Singh

Rahul Madhavan

Gandharv Patil

Sravanti Addepalli

Arun Suggala

Rengarajan Aravamudhan

Soumya Sharma

Anirban Laha

Aravindan Raghuveer

Karthikeyan Shanmugam

Doina Precup

Reward models (RMs) for LLM alignment often exhibit reward hacking, mistaking spurious correlates (e.g., length, format) for causal quality drivers (e.g., factuality, relevance), leading to brittle RMs. We introduce CROME (Causally Robust Reward Modeling), a causally-grounded framework using targeted augmentations to mitigate this. CROME employs: (1) Causal Augmentations, pairs isolating specific causal attribute changes, to enforce sensitivity, and (2) Neutral Augmentations, tie-labeled pairs varying spurious attributes while preserving causal content, to enforce invariance. Crucially, augmentations target LLM-identified causal rubrics, requiring no prior knowledge of spurious factors. CROME significantly outperforms baselines on RewardBench (Avg +5.4%, Safety +13.2%, Reasoning +7.2%) and demonstrates enhanced robustness via improved Best-of-N performance across RewardBench, WildGuardTest, and GSM8k.

2025-06-09

ICML.cc/2025/Workshop/MoFA (poster)

doi.org

openreview.net
