Siva Reddy

Biography

Siva Reddy is an assistant professor at the School of Computer Science and in the Department of Linguistics at McGill University. He completed a postdoc with the Stanford NLP Group in September 2019.

Reddy’s research goal is to equip machines with natural language understanding abilities in order to facilitate applications like question answering and conversational systems. His expertise includes building symbolic (linguistic and induced) and deep learning models for language.

Current Students

Vaibhav Adlakha

PhD - McGill University

Parishad BehnamGhader

Master's Research - McGill University

PhD - McGill University

Collaborating researcher

Gaurav Kamath

PhD - McGill University

Aditi Khandelwal

PhD - McGill University


PhD - McGill University

Co-supervisor: Timothy O'Donnell

Aravind Krishnan

Collaborating Alumni - Universität des Saarlandes

Benno Krojer

PhD - McGill University

Zichao Li

PhD - McGill University

Co-supervisor: Jackie Cheung

Xing Han Lu

PhD - McGill University

Research Intern - McGill University

PhD - McGill University

Postdoctorate - McGill University

Oh Oh

Collaborating researcher

Arkil Patel

PhD - McGill University


Collaborating researcher

Karolina Ewa Stańczak

Collaborating Alumni - McGill University

Ada Tur

Research Intern - McGill University

Collaborating Alumni - McGill University

Blog Posts

How Do We Explain AI and Ensure the Explanation Is True? Faithfulness Measurable Models Tell You How

October 1, 2024

Andreas Madsen

Siva Reddy

Sarath Chandar

Publications

Are self-explanations from Large Language Models faithful?

Andreas Madsen

Sarath Chandar

2024-08-01

Findings of the Association for Computational Linguistics: ACL 2024 (published)

Benchmarking Vision Language Models for Cultural Understanding

Shravan Nayak

Kanishk Jain

Rabiul Awal

Sjoerd van Steenkiste

Lisa Anne Hendricks

Karolina Stanczak

Aishwarya Agrawal

Foundation models and vision-language pre-training have notably advanced Vision Language Models (VLMs), enabling multimodal processing of visual and linguistic data. However, their performance has typically been assessed on general scene understanding (recognizing objects, attributes, and actions) rather than cultural comprehension. This study introduces CulturalVQA, a visual question-answering benchmark aimed at assessing VLMs’ geo-diverse cultural understanding. We curate a diverse collection of 2,378 image-question pairs with 1-5 answers per question, representing cultures from 11 countries across 5 continents. The questions probe understanding of various facets of culture such as clothing, food, drinks, rituals, and traditions. Benchmarking VLMs on CulturalVQA, including GPT-4V and Gemini, reveals disparities in their level of cultural understanding across regions, with strong capabilities for North America but significantly weaker capabilities for Africa. We also observe disparities across cultural facets, with higher performance on clothing, rituals, and traditions than on food and drink. These disparities help us identify areas where VLMs lack cultural understanding and demonstrate the potential of CulturalVQA as a comprehensive evaluation set for gauging VLM progress in understanding diverse cultures.

2024-07-15

ArXiv (preprint)

LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders

Parishad BehnamGhader

Vaibhav Adlakha

Marius Mosbach

Dzmitry Bahdanau

Nicolas Chapados

2024-07-10

Conference on Language Modeling (COLM 2024) (accepted)

Learning Action and Reasoning-Centric Image Editing from Videos and Simulations

Benno Krojer

Dheeraj Vattikonda

Luis Lara

Varun Jampani

Eva Portelance

Chris Pal

An image editing model should be able to perform diverse edits, ranging from object replacement and changes to attributes or style, to performing actions or movement, which require many forms of reasoning. Current general instruction-guided editing models have significant shortcomings with action- and reasoning-centric edits. Object, attribute, or stylistic changes can be learned from visually static datasets. On the other hand, high-quality data for action- and reasoning-centric edits is scarce and has to come from entirely different sources that cover, e.g., physical dynamics, temporality, and spatial reasoning. To this end, we meticulously curate the AURORA Dataset (Action-Reasoning-Object-Attribute), a collection of high-quality training data, human-annotated and curated from videos and simulation engines. We focus on a key aspect of quality training data: triplets (source image, prompt, target image) contain a single meaningful visual change described by the prompt, i.e., truly minimal changes between source and target images. To demonstrate the value of our dataset, we evaluate an AURORA-finetuned model on a new expert-curated benchmark (AURORA-Bench) covering 8 diverse editing tasks. Our model significantly outperforms previous editing models as judged by human raters. For automatic evaluations, we find important flaws in previous metrics and caution against their use for semantically hard editing tasks. Instead, we propose a new automatic metric that focuses on discriminative understanding. We hope that our efforts, (1) curating a quality training dataset and an evaluation benchmark, (2) developing critical evaluations, and (3) releasing a state-of-the-art model, will fuel further progress on general image editing.

2024-07-03

ArXiv (preprint)

Reframing linguistic bootstrapping as joint inference using visually-grounded grammar induction models

Eva Portelance

Timothy John O'Donnell

Semantic and syntactic bootstrapping posit that children use their prior knowledge of one linguistic domain, say syntactic relations, to help later acquire another, such as the meanings of new words. Empirical results supporting both theories may tempt us to believe that these are different learning strategies, where one may precede the other. Here, we argue that they are instead both contingent on a more general learning strategy for language acquisition: joint learning. Using a series of neural visually-grounded grammar induction models, we demonstrate that both syntactic and semantic bootstrapping effects are strongest when syntax and semantics are learnt simultaneously. Joint learning results in better grammar induction, realistic lexical category learning, and better interpretations of novel sentence and verb meanings. Joint learning makes language acquisition easier for learners by mutually constraining the hypothesis spaces for both syntax and semantics. Studying the dynamics of joint inference over many input sources and modalities represents an important new direction for language modeling and learning research in both cognitive science and AI, as it may help us explain how language can be acquired in more constrained learning settings.

2024-06-17

ArXiv (preprint)

Evaluating In-Context Learning of Libraries for Code Generation

Arkil Patel

Dzmitry Bahdanau

Pradeep Dasigi

2024-06-01

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) (published)

Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering

Vaibhav Adlakha

Parishad BehnamGhader

Xing Han Lu

Nicholas Meade

Retriever-augmented instruction-following models are attractive alternatives to fine-tuned approaches for information-seeking tasks such as question answering (QA). By simply prepending retrieved documents to their input along with an instruction, these models can be adapted to various information domains and tasks without additional fine-tuning. While the model responses tend to be natural and fluent, the additional verbosity makes traditional QA evaluation metrics such as exact match (EM) and F1 unreliable for accurately quantifying model performance. In this work, we investigate the performance of instruction-following models across three information-seeking QA tasks. We use both automatic and human evaluation to evaluate these models along two dimensions: 1) how well they satisfy the user's information need (correctness), and 2) whether they produce a response based on the provided knowledge (faithfulness). Guided by human evaluation and analysis, we highlight the shortcomings of traditional metrics for both correctness and faithfulness. We then propose simple token-overlap based and model-based metrics that reflect the true performance of these models. Our analysis reveals that instruction-following models are competitive with, and sometimes even outperform, fine-tuned models for correctness. However, these models struggle to stick to the provided knowledge and often hallucinate in their responses. We hope our work encourages a more holistic evaluation of instruction-following models for QA. Our code and data are available at https://github.com/McGill-NLP/instruct-qa

2024-05-16

Transactions of the Association for Computational Linguistics (published)
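The abstract of the paper above notes that verbose but correct answers make exact match (EM) and F1 unreliable. The short Python sketch below shows why, and it adds one simple token-overlap alternative, token recall. The normalization step and the function names (`normalize`, `exact_match`, `token_f1`, `token_recall`) are hypothetical and for illustration only. They are not taken from the paper or from the instruct-qa repository, whose metric definitions may differ.

```python
from collections import Counter
import string


def normalize(text: str) -> list[str]:
    """Lowercase, drop ASCII punctuation, and split on whitespace.

    Simplified normalization assumed for illustration only; not
    necessarily the one used in the paper.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return text.split()


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def _overlap(pred_tokens: list[str], gold_tokens: list[str]) -> int:
    # Multiset intersection: each token counts at most as many times
    # as it appears in both the prediction and the gold answer.
    return sum((Counter(pred_tokens) & Counter(gold_tokens)).values())


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and token recall."""
    pred, ref = normalize(prediction), normalize(gold)
    common = _overlap(pred, ref)
    if common == 0:
        return 0.0
    precision = common / len(pred)  # penalized by every extra token
    recall = common / len(ref)
    return 2 * precision * recall / (precision + recall)


def token_recall(prediction: str, gold: str) -> float:
    """Fraction of gold-answer tokens that appear in the prediction.

    Hypothetical token-overlap metric: extra (verbose) tokens in the
    prediction do not lower this score.
    """
    pred, ref = normalize(prediction), normalize(gold)
    if not ref:
        return 0.0
    return _overlap(pred, ref) / len(ref)


if __name__ == "__main__":
    gold = "Paris"
    verbose = "The capital of France is Paris."
    print(exact_match(verbose, gold))          # 0.0
    print(round(token_f1(verbose, gold), 3))   # 0.286
    print(token_recall(verbose, gold))         # 1.0
```

In this example the answer is correct but verbose. It scores 0 on EM and about 0.29 on F1, while token recall gives it full credit. The trade-off is that recall alone does not penalize padding a response with extra, possibly incorrect, tokens.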

Interpretability Needs a New Paradigm

Andreas Madsen

Himabindu Lakkaraju

Sarath Chandar

2024-05-08

ArXiv (preprint)