Portrait of Siva Reddy

Siva Reddy

Core Academic Member
Canada CIFAR AI Chair
Assistant Professor, McGill University, School of Computer Science and Department of Linguistics
Research Topics
Deep Learning
Natural Language Processing
Reasoning
Representation Learning

Biography

Siva Reddy is an assistant professor at the School of Computer Science and in the Department of Linguistics at McGill University. He completed a postdoc with the Stanford NLP Group in September 2019.

Reddy’s research goal is to enable machines with natural language understanding abilities in order to facilitate applications like question answering and conversational systems. His expertise includes building symbolic (linguistic and induced) and deep learning models for language.

Current Students

PhD - McGill University
Master's Research - McGill University
PhD - McGill University
Collaborating researcher
PhD - McGill University
Master's Research - McGill University
PhD - McGill University
Principal supervisor :
PhD - McGill University
Research Intern - UNIVERSITÄT DES SAARLANDES
PhD - McGill University
PhD - McGill University
Co-supervisor :
PhD - Polytechnique Montréal
Principal supervisor :
PhD - McGill University
Postdoctorate - McGill University
PhD - McGill University
Principal supervisor :
Research Intern - McGill University
Postdoctorate - McGill University
Research Intern - McGill University
Collaborating researcher - Cambridge University
Research Intern - McGill University

Publications

VinePPO: Accurate Credit Assignment in RL for LLM Mathematical Reasoning
Amirhossein Kazemnejad
Milad Aghajohari
Eva Portelance
Large language models (LLMs) are increasingly required to solve complex reasoning tasks, like mathematical problems, that involve multiple r… (see more)easoning steps before feedback is received. Effectively identifying and prioritizing key steps by accurately assigning credit to these intermediate steps is essential for enhancing model performance. Proximal Policy Optimization (PPO), a state-of-the-art reinforcement learning algorithm for finetuning LLMs, addresses the credit assignment problem by employing value networks to predict the expected cumulative rewards of intermediate states. In this work, we identify significant limitations with this value estimation method. To address this, we propose \methodname that leverages the flexibility of language environments to compute unbiased Monte Carlo-based estimates of the intermediate values. VinePPO consistently outperforms standard PPO, doing so more efficiently and with lower divergence from the reference model. Our findings underscore the critical importance of accurate credit assignment in LLM post-training and present a simple, yet effective solution.
VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment
Amirhossein Kazemnejad
Milad Aghajohari
Eva Portelance
Learning Action and Reasoning-Centric Image Editing from Videos and Simulation
Benno Krojer
Dheeraj Vattikonda
Luis Lara
Varun Jampani
Eva Portelance
Are self-explanations from Large Language Models faithful?
Andreas Madsen
LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders
Parishad BehnamGhader
Vaibhav Adlakha
Marius Mosbach
Learning Action and Reasoning-Centric Image Editing from Videos and Simulations
Benno Krojer
Dheeraj Vattikonda
Luis Lara
Varun Jampani
Eva Portelance
An image editing model should be able to perform diverse edits, ranging from object replacement, changing attributes or style, to performing… (see more) actions or movement, which require many forms of reasoning. Current general instruction-guided editing models have significant shortcomings with action and reasoning-centric edits. Object, attribute or stylistic changes can be learned from visually static datasets. On the other hand, high-quality data for action and reasoning-centric edits is scarce and has to come from entirely different sources that cover e.g. physical dynamics, temporality and spatial reasoning. To this end, we meticulously curate the AURORA Dataset (Action-Reasoning-Object-Attribute), a collection of high-quality training data, human-annotated and curated from videos and simulation engines. We focus on a key aspect of quality training data: triplets (source image, prompt, target image) contain a single meaningful visual change described by the prompt, i.e., truly minimal changes between source and target images. To demonstrate the value of our dataset, we evaluate an AURORA-finetuned model on a new expert-curated benchmark (AURORA-Bench) covering 8 diverse editing tasks. Our model significantly outperforms previous editing models as judged by human raters. For automatic evaluations, we find important flaws in previous metrics and caution their use for semantically hard editing tasks. Instead, we propose a new automatic metric that focuses on discriminative understanding. We hope that our efforts : (1) curating a quality training dataset and an evaluation benchmark, (2) developing critical evaluations, and (3) releasing a state-of-the-art model, will fuel further progress on general image editing.
Reframing linguistic bootstrapping as joint inference using visually-grounded grammar induction models
Eva Portelance
Timothy John O'donnell
Semantic and syntactic bootstrapping posit that children use their prior knowledge of one linguistic domain, say syntactic relations, to hel… (see more)p later acquire another, such as the meanings of new words. Empirical results supporting both theories may tempt us to believe that these are different learning strategies, where one may precede the other. Here, we argue that they are instead both contingent on a more general learning strategy for language acquisition: joint learning. Using a series of neural visually-grounded grammar induction models, we demonstrate that both syntactic and semantic bootstrapping effects are strongest when syntax and semantics are learnt simultaneously. Joint learning results in better grammar induction, realistic lexical category learning, and better interpretations of novel sentence and verb meanings. Joint learning makes language acquisition easier for learners by mutually constraining the hypotheses spaces for both syntax and semantics. Studying the dynamics of joint inference over many input sources and modalities represents an important new direction for language modeling and learning research in both cognitive sciences and AI, as it may help us explain how language can be acquired in more constrained learning settings.
Evaluating In-Context Learning of Libraries for Code Generation
Arkil Patel
Pradeep Dasigi
Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering
Vaibhav Adlakha
Parishad BehnamGhader
Xing Han Lu
Nicholas Meade
Retriever-augmented instruction-following models are attractive alternatives to fine-tuned approaches for information-seeking tasks such as … (see more)question answering (QA). By simply prepending retrieved documents in its input along with an instruction, these models can be adapted to various information domains and tasks without additional fine-tuning. While the model responses tend to be natural and fluent, the additional verbosity makes traditional QA evaluation metrics such as exact match (EM) and F1 unreliable for accurately quantifying model performance. In this work, we investigate the performance of instruction-following models across three information-seeking QA tasks. We use both automatic and human evaluation to evaluate these models along two dimensions: 1) how well they satisfy the user's information need (correctness), and 2) whether they produce a response based on the provided knowledge (faithfulness). Guided by human evaluation and analysis, we highlight the shortcomings of traditional metrics for both correctness and faithfulness. We then propose simple token-overlap based and model-based metrics that reflect the true performance of these models. Our analysis reveals that instruction-following models are competitive, and sometimes even outperform fine-tuned models for correctness. However, these models struggle to stick to the provided knowledge and often hallucinate in their responses. We hope our work encourages a more holistic evaluation of instruction-following models for QA. Our code and data is available at https://github.com/McGill-NLP/instruct-qa
Interpretability Needs a New Paradigm
Andreas Madsen
Himabindu Lakkaraju
Faithfulness Measurable Masked Language Models
Andreas Madsen
WebLINX: Real-World Website Navigation with Multi-Turn Dialogue
Xing Han Lu
Zdeněk Kasner