Eva Portelance

"On the goals of linguistic theory": Revisiting Chomskyan theories in the era of AI

Masoud Jasbi

Theoretical linguistics seeks to explain what human language is, and why. Linguists and cognitive scientists have proposed different theoret… (see more)ical models of what language is, as well as cognitive factors that shape it, and allow humans to 'produce', 'understand', and 'acquire' natural languages. However, humans may no longer be the only ones learning to 'generate', 'parse', and 'learn' natural language: artificial intelligence (AI) models such as large language models are proving to have impressive linguistic capabilities. Many are thus questioning what role, if any, such models should play in helping theoretical linguistics reach its ultimate research goals? In this paper, we propose to answer this question, by reiterating the tenets of generative linguistics, a leading school of thought in the field, and by considering how AI models as theories of language relate to each of these important concepts. Specifically, we consider three foundational principles, finding roots in the early works of Noam Chomsky: (1) levels of theoretical adequacy; (2) procedures for linguistic theory development; (3) language learnability and Universal Grammar. In our discussions of each principle, we give special attention to two types of AI models: neural language models and neural grammar induction models. We will argue that such models, in particular neural grammar induction models, do have a role to play, but that this role is largely modulated by the stance one takes regarding each of these three guiding principles.

2024-11-15

ArXiv (preprint)

"On the goals of linguistic theory": Revisiting Chomskyan theories in the era of AI

Masoud Jasbi

Theoretical linguistics seeks to explain what human language is, and why. Linguists and cognitive scientists have proposed different theoret… (see more)ical models of what language is, as well as cognitive factors that shape it, and allow humans to 'produce', 'understand', and 'acquire' natural languages. However, humans may no longer be the only ones learning to 'generate', 'parse', and 'learn' natural language: artificial intelligence (AI) models such as large language models are proving to have impressive linguistic capabilities. Many are thus questioning what role, if any, such models should play in helping theoretical linguistics reach its ultimate research goals? In this paper, we propose to answer this question, by reiterating the tenets of generative linguistics, a leading school of thought in the field, and by considering how AI models as theories of language relate to each of these important concepts. Specifically, we consider three foundational principles, finding roots in the early works of Noam Chomsky: (1) levels of theoretical adequacy; (2) procedures for linguistic theory development; (3) language learnability and Universal Grammar. In our discussions of each principle, we give special attention to two types of AI models: neural language models and neural grammar induction models. We will argue that such models, in particular neural grammar induction models, do have a role to play, but that this role is largely modulated by the stance one takes regarding each of these three guiding principles.

2024-11-15

ArXiv (preprint)

The Roles of Neural Networks in Language Acquisition

Masoud Jasbi

How can modern neural networks like language models be useful to the field of language acquisition, and more broadly cognitive science, if t… (see more)hey are not a priori designed to be cognitive models? As developments towards natural language understanding and generation have improved leaps and bounds, with models like GPT‐4, the question of how they can inform our understanding of human language acquisition has re‐emerged. As such, it is critical to examine how in practice linking hypotheses between models and human learners can be safely established. To address these questions, we propose a model taxonomy, including four modelling approaches, each having differing goals, from exploratory hypothesis generation to hypothesis differentiation and testing. We show how the goals of these approaches align with the overarching goals of science and linguistics by connecting our taxonomy to the realist versus instrumentalist approaches in philosophy of science. We survey recent work having adopted each of our modelling approaches and address the importance of computational modelling in language acquisition studies.

2024-10-24

Language and Linguistics Compass (published)

The Roles of Neural Networks in Language Acquisition

Masoud Jasbi

How can modern neural networks like language models be useful to the field of language acquisition, and more broadly cognitive science, if t… (see more)hey are not a priori designed to be cognitive models? As developments towards natural language understanding and generation have improved leaps and bounds, with models like GPT‐4, the question of how they can inform our understanding of human language acquisition has re‐emerged. As such, it is critical to examine how in practice linking hypotheses between models and human learners can be safely established. To address these questions, we propose a model taxonomy, including four modelling approaches, each having differing goals, from exploratory hypothesis generation to hypothesis differentiation and testing. We show how the goals of these approaches align with the overarching goals of science and linguistics by connecting our taxonomy to the realist versus instrumentalist approaches in philosophy of science. We survey recent work having adopted each of our modelling approaches and address the importance of computational modelling in language acquisition studies.

2024-10-24

Language and Linguistics Compass (published)

VinePPO: Accurate Credit Assignment in RL for LLM Mathematical Reasoning

Amirhossein Kazemnejad

Milad Aghajohari

Large language models (LLMs) are increasingly required to solve complex reasoning tasks, like mathematical problems, that involve multiple r… (see more)easoning steps before feedback is received. Effectively identifying and prioritizing key steps by accurately assigning credit to these intermediate steps is essential for enhancing model performance. Proximal Policy Optimization (PPO), a state-of-the-art reinforcement learning algorithm for finetuning LLMs, addresses the credit assignment problem by employing value networks to predict the expected cumulative rewards of intermediate states. In this work, we identify significant limitations with this value estimation method. To address this, we propose \methodname that leverages the flexibility of language environments to compute unbiased Monte Carlo-based estimates of the intermediate values. VinePPO consistently outperforms standard PPO, doing so more efficiently and with lower divergence from the reference model. Our findings underscore the critical importance of accurate credit assignment in LLM post-training and present a simple, yet effective solution.

2024-10-09

NeurIPS.cc/2024/Workshop/MATH-AI (accepted)

openreview.net

VinePPO: Accurate Credit Assignment in RL for LLM Mathematical Reasoning

Amirhossein Kazemnejad

Milad Aghajohari

Large language models (LLMs) are increasingly required to solve complex reasoning tasks, like mathematical problems, that involve multiple r… (see more)easoning steps before feedback is received. Effectively identifying and prioritizing key steps by accurately assigning credit to these intermediate steps is essential for enhancing model performance. Proximal Policy Optimization (PPO), a state-of-the-art reinforcement learning algorithm for finetuning LLMs, addresses the credit assignment problem by employing value networks to predict the expected cumulative rewards of intermediate states. In this work, we identify significant limitations with this value estimation method. To address this, we propose \methodname that leverages the flexibility of language environments to compute unbiased Monte Carlo-based estimates of the intermediate values. VinePPO consistently outperforms standard PPO, doing so more efficiently and with lower divergence from the reference model. Our findings underscore the critical importance of accurate credit assignment in LLM post-training and present a simple, yet effective solution.

2024-10-09

NeurIPS.cc/2024/Workshop/MATH-AI (accepted)

openreview.net

VinePPO: Refining Credit Assignment in RL Training of LLMs

Amirhossein Kazemnejad

Milad Aghajohari

Large language models (LLMs) are increasingly applied to complex reasoning tasks that require executing several complex steps before receivi… (see more)ng any reward. Properly assigning credit to these steps is essential for enhancing model performance. Proximal Policy Optimization (PPO), a common reinforcement learning (RL) algorithm used for LLM finetuning, employs value networks to tackle credit assignment. However, recent approaches achieve strong results without it, raising questions about the efficacy of value networks in practice. In this work, we systematically evaluate the efficacy of value networks and reveal their significant shortcomings in reasoning-heavy LLM tasks, showing that they often produce poor estimate of expected return and barely outperform a random baseline when comparing alternative steps. This motivates our key question: Can improved credit assignment enhance RL training for LLMs? To address this, we propose VinePPO, a straightforward approach that leverages the flexibility of language environments to compute unbiased Monte Carlo-based estimates. Our method consistently outperforms PPO and other baselines across MATH and GSM8K datasets in less wall-clock time (up to 3.0x). Crucially, it achieves higher test accuracy for a given training accuracy, capturing more generalization signal per sample. These results emphasize the importance of accurate credit assignment in RL training of LLM.

2024-10-02

ArXiv (preprint)

VinePPO: Refining Credit Assignment in RL Training of LLMs

Amirhossein Kazemnejad

Milad Aghajohari

Large language models (LLMs) are increasingly applied to complex reasoning tasks that require executing several complex steps before receivi… (see more)ng any reward. Properly assigning credit to these steps is essential for enhancing model performance. Proximal Policy Optimization (PPO), a common reinforcement learning (RL) algorithm used for LLM finetuning, employs value networks to tackle credit assignment. However, recent approaches achieve strong results without it, raising questions about the efficacy of value networks in practice. In this work, we systematically evaluate the efficacy of value networks and reveal their significant shortcomings in reasoning-heavy LLM tasks, showing that they often produce poor estimate of expected return and barely outperform a random baseline when comparing alternative steps. This motivates our key question: Can improved credit assignment enhance RL training for LLMs? To address this, we propose VinePPO, a straightforward approach that leverages the flexibility of language environments to compute unbiased Monte Carlo-based estimates. Our method consistently outperforms PPO and other baselines across MATH and GSM8K datasets in less wall-clock time (up to 3.0x). Crucially, it achieves higher test accuracy for a given training accuracy, capturing more generalization signal per sample. These results emphasize the importance of accurate credit assignment in RL training of LLM.

2024-10-02

ArXiv (preprint)

VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment

Amirhossein Kazemnejad

Milad Aghajohari

Large language models (LLMs) are increasingly applied to complex reasoning tasks that require executing several complex steps before receivi… (see more)ng any reward. Properly assigning credit to these steps is essential for enhancing model performance. Proximal Policy Optimization (PPO), a state-of-the-art reinforcement learning (RL) algorithm used for LLM finetuning, employs value networks to tackle credit assignment. However, value networks face challenges in predicting the expected cumulative rewards accurately in complex reasoning tasks, often leading to high-variance updates and suboptimal performance. In this work, we systematically evaluate the efficacy of value networks and reveal their significant shortcomings in reasoning-heavy LLM tasks, showing that they barely outperform a random baseline when comparing alternative steps. To address this, we propose VinePPO, a straightforward approach that leverages the flexibility of language environments to compute unbiased Monte Carlo-based estimates, bypassing the need for large value networks. Our method consistently outperforms PPO and other RL-free baselines across MATH and GSM8K datasets with fewer gradient updates (up to 9x), less wall-clock time (up to 3.0x). These results emphasize the importance of accurate credit assignment in RL finetuning of LLM and demonstrate VinePPO's potential as a superior alternative.

2024-10-02

ArXiv (preprint)

VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment

Amirhossein Kazemnejad

Milad Aghajohari

Large language models (LLMs) are increasingly applied to complex reasoning tasks that require executing several complex steps before receivi… (see more)ng any reward. Properly assigning credit to these steps is essential for enhancing model performance. Proximal Policy Optimization (PPO), a state-of-the-art reinforcement learning (RL) algorithm used for LLM finetuning, employs value networks to tackle credit assignment. However, value networks face challenges in predicting the expected cumulative rewards accurately in complex reasoning tasks, often leading to high-variance updates and suboptimal performance. In this work, we systematically evaluate the efficacy of value networks and reveal their significant shortcomings in reasoning-heavy LLM tasks, showing that they barely outperform a random baseline when comparing alternative steps. To address this, we propose VinePPO, a straightforward approach that leverages the flexibility of language environments to compute unbiased Monte Carlo-based estimates, bypassing the need for large value networks. Our method consistently outperforms PPO and other RL-free baselines across MATH and GSM8K datasets with fewer gradient updates (up to 9x), less wall-clock time (up to 3.0x). These results emphasize the importance of accurate credit assignment in RL finetuning of LLM and demonstrate VinePPO's potential as a superior alternative.

2024-10-02

ArXiv (preprint)