Siva Reddy

Biography

Siva Reddy is an assistant professor at the School of Computer Science and in the Department of Linguistics at McGill University. He completed a postdoc with the Stanford NLP Group in September 2019.

Reddy’s research goal is to give machines natural language understanding abilities in order to facilitate applications such as question answering and conversational systems. His expertise includes building symbolic (linguistic and induced) and deep learning models for language.

Current Students

Vaibhav Adlakha

PhD - McGill University

Parishad BehnamGhader

Master's Research - McGill University

PhD - McGill University

Verna Dankers

Collaborating researcher - University of Edinburgh

Collaborating researcher

Gaurav Kamath

PhD - McGill University

Aditi Khandelwal

PhD - McGill University

Principal supervisor:

PhD - McGill University

Co-supervisor:

Timothy O'Donnell

Aravind Krishnan

Collaborating Alumni - Universität des Saarlandes

Benno Krojer

PhD - McGill University

Zichao Li

PhD - McGill University

Co-supervisor:

Jackie Cheung

Xing Han Lu

PhD - McGill University

Research Intern - McGill University

PhD - McGill University

Postdoctorate - McGill University

Oh Oh

Collaborating researcher

Arkil Patel

PhD - McGill University

Principal supervisor:

Collaborating Alumni

Karolina Ewa Stańczak

Collaborating Alumni - McGill University

Ada Tur

Research Intern - McGill University

Collaborating researcher

Collaborating Alumni - McGill University

Blog Posts

How Do We Explain AI and Ensure the Explanation Is True? Faithfulness Measurable Models Tell You How

October 1, 2024

Andreas Madsen

Siva Reddy

Sarath Chandar

Publications

Scope Ambiguities in Large Language Models

Gaurav Kamath

Sebastian Schuster

Sowmya Vajjala

Sentences containing multiple semantic operators with overlapping scope often create ambiguities in interpretation, known as scope ambiguities. These ambiguities offer rich insights into the interaction between semantic structure and world knowledge in language processing. Despite this, there has been little research into how modern large language models treat them. In this paper, we investigate how different versions of certain autoregressive language models (GPT-2, GPT-3/3.5, Llama 2, and GPT-4) treat scope-ambiguous sentences, and compare this with human judgments. We introduce novel datasets that contain a joint total of almost 1,000 unique scope-ambiguous sentences, which contain interactions between a range of semantic operators and are annotated for human judgments. Using these datasets, we find evidence that several models (i) are sensitive to the meaning ambiguity in these sentences, in a way that patterns well with human judgments, and (ii) can successfully identify human-preferred readings at a high level of accuracy (over 90% in some cases).

2024-04-05

ArXiv (preprint)

WebLINX: Real-World Website Navigation with Multi-Turn Dialogue

Xing Han Lu

Zdeněk Kasner

We propose the problem of conversational web navigation, where a digital agent controls a web browser and follows user instructions to solve real-world tasks in a multi-turn dialogue fashion. To support this problem, we introduce WebLINX, a large-scale benchmark of 100K interactions across 2300 expert demonstrations of conversational web navigation. Our benchmark covers a broad range of patterns on over 150 real-world websites and can be used to train and evaluate agents in diverse scenarios. Because of the sheer amount of information on web pages, Large Language Models (LLMs) cannot process entire pages in real time. To address this bottleneck, we design a retrieval-inspired model that efficiently prunes HTML pages by ranking relevant elements. We use the selected elements, along with screenshots and action history, to assess a variety of models for their ability to replicate human behavior when navigating the web. Our experiments range from small text-only models to proprietary multimodal LLMs. We find that smaller finetuned decoders surpass not only the best zero-shot LLMs (including GPT-4V) but also larger finetuned multimodal models that were explicitly pretrained on screenshots. However, all finetuned models struggle to generalize to unseen websites. Our findings highlight the need for large multimodal models that can generalize to novel settings. Our code, data and models are available for research: https://mcgill-nlp.github.io/weblinx

2024-03-11

ICLR.cc/2024/Workshop/LLMAgents (poster)

openreview.net

A Compositional Typed Semantics for Universal Dependencies

Laurestine Bradford

Timothy John O'Donnell

2024-03-02

ArXiv (preprint)

When does word order matter and when doesn't it?

Xuanda Chen

Timothy John O'Donnell

Language models (LMs) may appear insensitive to word order changes in natural language understanding (NLU) tasks. In this paper, we propose that linguistic redundancy can explain this phenomenon, whereby word order and other linguistic cues such as case markers provide overlapping and thus redundant information. Our hypothesis is that models exhibit insensitivity to word order when the order provides redundant information, and that the degree of insensitivity varies across tasks. We quantify how informative word order is using mutual information (MI) between unscrambled and scrambled sentences. Our results show that the less informative word order is, the more consistent the model's predictions are between unscrambled and scrambled sentences. We also find that the effect varies across tasks: for some tasks, like SST-2, LMs' predictions are almost always consistent with the original ones even as the pointwise MI (PMI) changes, while for others, like RTE, consistency is near random when the PMI is lower, i.e., when word order is really important.

2024-02-29

ArXiv (preprint)

The Leukemoid Reaction in Severe Alcoholic Hepatitis: A Case Report

Sachin Agrawal

Sunil Kumar

Sourya Acharya

2024-02-11

Cureus (published)

WebLINX: Real-World Website Navigation with Multi-Turn Dialogue

Xing Han Lu

Zdeněk Kasner

We propose the problem of conversational web navigation, where a digital agent controls a web browser and follows user instructions to solve real-world tasks in a multi-turn dialogue fashion. To support this problem, we introduce WebLINX, a large-scale benchmark of 100K interactions across 2300 expert demonstrations of conversational web navigation. Our benchmark covers a broad range of patterns on over 150 real-world websites and can be used to train and evaluate agents in diverse scenarios. Because of the sheer amount of information on web pages, Large Language Models (LLMs) cannot process entire pages in real time. To address this bottleneck, we design a retrieval-inspired model that efficiently prunes HTML pages by ranking relevant elements. We use the selected elements, along with screenshots and action history, to assess a variety of models for their ability to replicate human behavior when navigating the web. Our experiments range from small text-only models to proprietary multimodal LLMs. We find that smaller finetuned decoders surpass not only the best zero-shot LLMs (including GPT-4V) but also larger finetuned multimodal models that were explicitly pretrained on screenshots. However, all finetuned models struggle to generalize to unseen websites. Our findings highlight the need for large multimodal models that can generalize to novel settings. Our code, data and models are available for research: https://mcgill-nlp.github.io/weblinx

2024-02-08

ArXiv (preprint)

Data science opportunities of large language models for neuroscience and biomedicine

Danilo Bzdok

Andrew Thieme

Oleksiy Levkovskyy

Paul Wren

Thomas Ray