Soumya Sharma

PhD - McGill
Principal supervisor
Co-supervisor
Research topics
Deep learning
Creativity
Reasoning
Natural language processing

Publications

Robust Reward Modeling via Causal Rubrics
Pragya Srivastava
Harman Singh
Rahul Madhavan
Sravanti Addepalli
Arun Suggala
Rengarajan Aravamudhan
Anirban Laha
Aravindan Raghuveer
Karthikeyan Shanmugam
Reward models (RMs) for LLM alignment often exhibit reward hacking, mistaking spurious correlates (e.g., length, format) for causal quality drivers (e.g., factuality, relevance), leading to brittle RMs. We introduce CROME (Causally Robust Reward Modeling), a causally grounded framework using targeted augmentations to mitigate this. CROME employs: (1) Causal Augmentations, pairs isolating specific causal attribute changes, to enforce sensitivity, and (2) Neutral Augmentations, tie-labeled pairs varying spurious attributes while preserving causal content, to enforce invariance. Crucially, augmentations target LLM-identified causal rubrics, requiring no prior knowledge of spurious factors. CROME significantly outperforms baselines on RewardBench (Avg +5.4%, Safety +13.2%, Reasoning +7.2%) and demonstrates enhanced robustness via improved Best-of-N performance across RewardBench, WildGuardTest, and GSM8K.
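
The role of the two augmentation types can be illustrated with a small sketch. It is not the CROME implementation: it assumes a Bradley-Terry style pairwise objective in which a tie-labeled pair is trained toward a preference probability of 0.5, and the names `Pair`, `softplus`, and `pair_loss` are hypothetical, introduced only for this illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Pair:
    """One training pair (hypothetical structure, not from the paper)."""
    reward_a: float  # reward model score for response A
    reward_b: float  # reward model score for response B
    target: float    # 1.0 = A preferred, 0.0 = B preferred, 0.5 = tie


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pair_loss(pair: Pair) -> float:
    """Cross-entropy between the target and the Bradley-Terry probability.

    Assumption: P(A beats B) = sigmoid(reward_a - reward_b). With a 0.5
    target (a tie-labeled pair), the loss is minimized when both rewards
    are equal, which pushes the model to ignore whatever differs between
    the two responses. With a 0.0 or 1.0 target, the loss rewards a large
    gap in the labeled direction.
    """
    gap = pair.reward_a - pair.reward_b
    # -log(sigmoid(gap)) == softplus(-gap); -log(1 - sigmoid(gap)) == softplus(gap)
    return pair.target * softplus(-gap) + (1.0 - pair.target) * softplus(gap)


# Causal pair: B degrades a causal attribute (e.g., factuality), so A is preferred.
causal = Pair(reward_a=2.0, reward_b=0.0, target=1.0)
# Neutral pair: only a spurious attribute (e.g., length) differs, so it is a tie.
neutral = Pair(reward_a=2.0, reward_b=0.0, target=0.5)

print(f"causal pair loss:  {pair_loss(causal):.4f}")
print(f"neutral pair loss: {pair_loss(neutral):.4f}")
```

Under these assumptions, the same reward gap that is desirable on the causal pair is penalized on the neutral pair, whose loss bottoms out at log 2 (about 0.693) only when the two rewards coincide.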