
Siva Reddy

Core Academic Member
Canada CIFAR AI Chair
Assistant Professor, McGill University, School of Computer Science and Department of Linguistics
Research Topics
Deep Learning
Natural Language Processing
Reasoning
Representation Learning

Biography

Siva Reddy is an assistant professor at the School of Computer Science and in the Department of Linguistics at McGill University. He completed a postdoc with the Stanford NLP Group in September 2019.

Reddy’s research goal is to equip machines with natural language understanding abilities in order to facilitate applications such as question answering and conversational systems. His expertise includes building symbolic (linguistic and induced) and deep learning models for language.

Current Students

PhD - McGill University
Master's Research - McGill University
PhD - McGill University
Collaborating researcher - McGill University
Postdoctorate - University of Edinburgh
Collaborating researcher
Research Intern - McGill University
Independent visiting researcher
Co-supervisor:
Master's Research - McGill University
Co-supervisor:
Collaborating researcher
PhD - McGill University
Co-supervisor:
Collaborating researcher - INSA Lyon, France
PhD - McGill University
Principal supervisor:
PhD - McGill University
Co-supervisor:
PhD - McGill University
PhD - McGill University
Co-supervisor:
Master's Research - McGill University
Co-supervisor:
PhD - McGill University
Master's Research - McGill University
PhD - McGill University
Postdoctorate - McGill University
Master's Research - McGill University
PhD - McGill University
Principal supervisor:
Collaborating researcher
Research Intern - McGill University
Collaborating Alumni
Collaborating Alumni - McGill University
Collaborating researcher
Co-supervisor:
Research Intern - McGill University
Collaborating Alumni - McGill University
Research Intern - McGill University

Publications

VinePPO: Accurate Credit Assignment in RL for LLM Mathematical Reasoning
Large language models (LLMs) are increasingly required to solve complex reasoning tasks, like mathematical problems, that involve multiple reasoning steps before feedback is received. Effectively identifying and prioritizing key steps by accurately assigning credit to these intermediate steps is essential for enhancing model performance. Proximal Policy Optimization (PPO), a state-of-the-art reinforcement learning algorithm for finetuning LLMs, addresses the credit assignment problem by employing value networks to predict the expected cumulative rewards of intermediate states. In this work, we identify significant limitations of this value estimation method. To address them, we propose VinePPO, which leverages the flexibility of language environments to compute unbiased Monte Carlo-based estimates of the intermediate values. VinePPO consistently outperforms standard PPO, doing so more efficiently and with lower divergence from the reference model. Our findings underscore the critical importance of accurate credit assignment in LLM post-training and present a simple yet effective solution.
VinePPO: Refining Credit Assignment in RL Training of LLMs
Large language models (LLMs) are increasingly applied to complex reasoning tasks that require executing several complex steps before receiving any reward. Properly assigning credit to these steps is essential for enhancing model performance. Proximal Policy Optimization (PPO), a common reinforcement learning (RL) algorithm used for LLM finetuning, employs value networks to tackle credit assignment. However, recent approaches achieve strong results without them, raising questions about the efficacy of value networks in practice. In this work, we systematically evaluate the efficacy of value networks and reveal their significant shortcomings in reasoning-heavy LLM tasks, showing that they often produce poor estimates of expected return and barely outperform a random baseline when comparing alternative steps. This motivates our key question: can improved credit assignment enhance RL training for LLMs? To address this, we propose VinePPO, a straightforward approach that leverages the flexibility of language environments to compute unbiased Monte Carlo-based estimates. Our method consistently outperforms PPO and other baselines on the MATH and GSM8K datasets in up to 3.0x less wall-clock time. Crucially, it achieves higher test accuracy for a given training accuracy, capturing more generalization signal per sample. These results emphasize the importance of accurate credit assignment in RL training of LLMs.
Learning Action and Reasoning-Centric Image Editing from Videos and Simulation
Dheeraj Vattikonda
Varun Jampani
Christopher Pal
Are self-explanations from Large Language Models faithful?
LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders
Large decoder-only language models (LLMs) are the state-of-the-art models on most of today's NLP tasks and benchmarks. Yet, the community is only slowly adopting these models for text embedding tasks, which require rich contextualized representations. In this work, we introduce LLM2Vec, a simple unsupervised approach that can transform any decoder-only LLM into a strong text encoder. LLM2Vec consists of three simple steps: 1) enabling bidirectional attention, 2) masked next token prediction, and 3) unsupervised contrastive learning. We demonstrate the effectiveness of LLM2Vec by applying it to 4 popular LLMs ranging from 1.3B to 8B parameters and evaluate the transformed models on English word- and sequence-level tasks. We outperform encoder-only models by a large margin on word-level tasks and reach a new unsupervised state-of-the-art performance on the Massive Text Embeddings Benchmark (MTEB). Moreover, when combining LLM2Vec with supervised contrastive learning, we achieve state-of-the-art performance on MTEB among models that train only on publicly available data (as of May 24, 2024). Our strong empirical results and extensive analysis demonstrate that LLMs can be effectively transformed into universal text encoders in a parameter-efficient manner without the need for expensive adaptation or synthetic GPT-4 generated data.
Learning Action and Reasoning-Centric Image Editing from Videos and Simulations
Dheeraj Vattikonda
Varun Jampani
Christopher Pal
Evaluating In-Context Learning of Libraries for Code Generation
Interpretability Needs a New Paradigm
Himabindu Lakkaraju
A. Chandar
Faithfulness Measurable Masked Language Models
A common approach to explaining NLP models is to use importance measures that express which tokens are important for a prediction. Unfortunately, such explanations are often wrong despite being persuasive. Therefore, it is essential to measure their faithfulness. One such metric rests on the idea that if tokens are truly important, then masking them should result in worse model performance. However, token masking introduces out-of-distribution issues, and existing solutions that address this are computationally expensive and employ proxy models. Furthermore, other metrics are very limited in scope. This work proposes an inherently faithfulness measurable model that addresses these challenges. This is achieved using a novel fine-tuning method that incorporates masking, such that masked tokens become in-distribution by design. This differs from existing approaches, which are completely model-agnostic but are inapplicable in practice. We demonstrate the generality of our approach by applying it to 16 different datasets and validate it using statistical in-distribution tests. The faithfulness is then measured with 9 different importance measures. Because masking is in-distribution, importance measures that themselves use masking become consistently more faithful. Additionally, because the model makes faithfulness cheap to measure, we can optimize explanations towards maximal faithfulness; thus, our model becomes indirectly inherently explainable.
WebLINX: Real-World Website Navigation with Multi-Turn Dialogue
We propose the problem of conversational web navigation, where a digital agent controls a web browser and follows user instructions to solve real-world tasks in a multi-turn dialogue fashion. To support this problem, we introduce WEBLINX, a large-scale benchmark of 100K interactions across 2300 expert demonstrations of conversational web navigation. Our benchmark covers a broad range of patterns on over 150 real-world websites and can be used to train and evaluate agents in diverse scenarios. Due to the volume of information present, Large Language Models (LLMs) cannot process entire web pages in real time. To solve this bottleneck, we design a retrieval-inspired model that efficiently prunes HTML pages by ranking relevant elements. We use the selected elements, along with screenshots and action history, to assess a variety of models for their ability to replicate human behavior when navigating the web. Our experiments range from small text-only models to proprietary multimodal LLMs. We find that smaller finetuned decoders surpass not only the best zero-shot LLMs (including GPT-4V) but also larger finetuned multimodal models that were explicitly pretrained on screenshots. However, all finetuned models struggle to generalize to unseen websites. Our findings highlight the need for large multimodal models that can generalize to novel settings. Our code, data and models are available for research: https://mcgill-nlp.github.io/weblinx
Universal Adversarial Triggers Are Not Universal
A Compositional Typed Semantics for Universal Dependencies
Timothy John O'Donnell