Portrait de Prasanna Parthasarathi n'est pas disponible

Prasanna Parthasarathi

Alumni

Publications

GRPO-$\lambda$: Credit Assignment improves LLM Reasoning
Large language models (LLMs) are increasingly deployed for tasks requiring complex reasoning, prompting significant interest in improving th… (voir plus)eir reasoning abilities through post-training. Especially RL based methods using verifiable reward, like the state-of-the-art GRPO, have shown to tremendously improve reasoning behaviors when applied as post-training methods. However, the lack of an explicit reward or critic model limits GRPO's ability to assign fine-grained credit across token sequences. In this work, we present GRPO-
GRPO-$\lambda$: Credit Assignment improves LLM Reasoning
Large language models (LLMs) are increasingly deployed for tasks requiring complex reasoning, prompting significant interest in improving th… (voir plus)eir reasoning abilities through post-training. Especially RL based methods using verifiable reward, like the state-of-the-art GRPO, have shown to tremendously improve reasoning behaviors when applied as post-training methods. However, the lack of an explicit reward or critic model limits GRPO's ability to assign fine-grained credit across token sequences. In this work, we present GRPO-
GRPO-$\lambda$: Credit Assignment improves LLM Reasoning
Large language models (LLMs) are increasingly deployed for tasks requiring complex reasoning, prompting significant interest in improving th… (voir plus)eir reasoning abilities through post-training. Especially RL based methods using verifiable reward, like the state-of-the-art GRPO, have shown to tremendously improve reasoning behaviors when applied as post-training methods. However, the lack of an explicit reward or critic model limits GRPO's ability to assign fine-grained credit across token sequences. In this work, we present GRPO-
Nested-ReFT: Efficient Reinforcement Learning for Large Language Model Fine-Tuning via Off-Policy Rollouts
Do Robot Snakes Dream like Electric Sheep? Investigating the Effects of Architectural Inductive Biases on Hallucination
Mehdi Rezagholizadeh
Boxing Chen
The growth in prominence of large language models (LLMs) in everyday life can be largely attributed to their generative abilities, yet some … (voir plus)of this is also owed to the risks and costs associated with their use. On one front is their tendency to hallucinate false or misleading information, limiting their reliability. On another is the increasing focus on the computational limitations associated with traditional self-attention based LLMs, which has brought about new alternatives, in particular recurrent models, meant to overcome them. Yet it remains uncommon to consider these two concerns simultaneously. Do changes in architecture exacerbate/alleviate existing concerns about hallucinations? Do they affect how and where they occur? Through an extensive evaluation, we study how these architecture-based inductive biases affect the propensity to hallucinate. While hallucination remains a general phenomenon not limited to specific architectures, the situations in which they occur and the ease with which specific types of hallucinations can be induced can significantly differ based on the model architecture. These findings highlight the need for better understanding both these problems in conjunction with each other, as well as consider how to design more universal techniques for handling hallucinations.
Do Robot Snakes Dream like Electric Sheep? Investigating the Effects of Architectural Inductive Biases on Hallucination
Mehdi Rezagholizadeh
Boxing Chen
The growth in prominence of large language models (LLMs) in everyday life can be largely attributed to their generative abilities, yet some … (voir plus)of this is also owed to the risks and costs associated with their use. On one front is their tendency to \textit{hallucinate} false or misleading information, limiting their reliability. On another is the increasing focus on the computational limitations associated with traditional self-attention based LLMs, which has brought about new alternatives, in particular recurrent models, meant to overcome them. Yet it remains uncommon to consider these two concerns simultaneously. Do changes in architecture exacerbate/alleviate existing concerns about hallucinations? Do they affect how and where they occur? Through an extensive evaluation, we study how these architecture-based inductive biases affect the propensity to hallucinate. While hallucination remains a general phenomenon not limited to specific architectures, the situations in which they occur and the ease with which specific types of hallucinations can be induced can significantly differ based on the model architecture. These findings highlight the need for better understanding both these problems in conjunction with each other, as well as consider how to design more universal techniques for handling hallucinations.
Language Model-In-The-Loop: Data Optimal Approach to Recommend Actions in Text Games
Large Language Models (LLMs) have demonstrated superior performance in language understanding benchmarks. A recent use case for LLMs involve… (voir plus)s training decision-making agents over textual information. The existing approach leverages LLM's linguistic priors for action candidate recommendations in text games, i.e., to operate without environment-provided actions. However, adapting LLMs to specific games/tasks requires a massive amount of annotated human gameplay. Moreover, in the existing approach, the language model was kept frozen during an agent's training process, which limits learning from in-game knowledge about the world. Hence, we explore strategies to adapt the language model for candidate recommendation with in-game transition in an online learning fashion to mitigate reliance on human-annotated gameplays, which are costly to acquire. In this paper, we propose in-game transition selection methods to adapt the LLM in the loop, reducing the dependency on using human-annotated gameplays while improving performance and convergence. Our method demonstrates a 53% relative improvement in average game score over the previous state-of-the-art model, achieving more than twice the convergence rate in a full-annotated dataset setting. Furthermore, even with only 10% of human annotation, we surpassed the 100\% state-of-the-art performance benchmark.
Towards Practical Tool Usage for Continually Learning LLMs
Large language models (LLMs) show an innate skill for solving language based tasks. But insights have suggested an inability to adjust for i… (voir plus)nformation or task-solving skills becoming outdated, as their knowledge, stored directly within their parameters, remains static in time. Tool use helps by offloading work to systems that the LLM can access through an interface, but LLMs that use them still must adapt to nonstationary environments for prolonged use, as new tools can emerge and existing tools can change. Nevertheless, tools require less specialized knowledge, therefore we hypothesize they are better suited for continual learning (CL) as they rely less on parametric memory for solving tasks and instead focus on learning when to apply pre-defined tools. To verify this, we develop a synthetic benchmark and follow this by aggregating existing NLP tasks to form a more realistic testing scenario. While we demonstrate scaling model size is not a solution, regardless of tool usage, continual learning techniques can enable tool LLMs to both adapt faster while forgetting less, highlighting their potential as continual learners.
Context-Aware Assistant Selection for Improved Inference Acceleration with Large Language Models
Do Large Language Models Know How Much They Know?
Do Large Language Models Know How Much They Know?
Large Language Models (LLMs) have emerged as highly capable systems and are increasingly being integrated into various uses. Nevertheless, t… (voir plus)he rapid advancement in their deployment trails a comprehensive understanding of their internal mechanisms, as well as a delineation of their capabilities and limitations. A desired characteristic of an intelligent system is its ability to recognize the scope of its own knowledge. To investigate whether LLMs embody this attribute, we develop a benchmark that challenges these models to enumerate all information they possess on specific topics. This benchmark assesses whether the models recall excessive, insufficient, or the precise amount of required information, thereby indicating their awareness of how much they know about the given topic. Our findings reveal that the emergence of this property varies across different architectures and manifests at diverse rates. However, with sufficient scaling, all tested models are ultimately capable of performing this task. The insights gained from this research advance our understanding of LLMs, shedding light on their operational capabilities and contributing to the ongoing exploration of their intricate dynamics.
Measuring the Knowledge Acquisition-Utilization Gap in Pretrained Language Models