Siva Reddy

Biography

Siva Reddy is an assistant professor at the School of Computer Science and in the Department of Linguistics at McGill University. He completed a postdoc with the Stanford NLP Group in September 2019.

Reddy’s research goal is to enable machines with natural language understanding abilities in order to facilitate applications like question answering and conversational systems. His expertise includes building symbolic (linguistic and induced) and deep learning models for language.

Current Students

Vaibhav Adlakha

PhD - McGill University

Master's Research - McGill University

PhD - McGill University

Matteo Boglioni

Collaborating researcher - McGill University

Joyce Chai

Independent visiting researcher

Co-supervisor :

Yoshua Bengio

Verna Dankers

Postdoctorate - University of Edinburgh

Jiaqi Deng

Collaborating researcher

Charbel El Feghali

Research Intern - McGill University

Desmond Elliott

Independent visiting researcher

Co-supervisor :

Yoshua Bengio

Jay Gala

Master's Research - McGill University

Co-supervisor :

Collaborating researcher

Collaborating Alumni

PhD - McGill University

Co-supervisor :

Timothy O'Donnell

Imene Kerboua

Collaborating researcher - INSA Lyon, France

PhD - McGill University

Principal supervisor :

Golnoosh Farnadi

Austin Kraft

PhD - McGill University

Co-supervisor :

Timothy O'Donnell

Aravind Krishnan

Collaborating Alumni - UNIVERSITÄT DES SAARLANDES

Benno Krojer

PhD - McGill University

Zichao Li

PhD - McGill University

Co-supervisor :

Jackie Cheung

Fengyuan Liu

Master's Research - McGill University

Co-supervisor :

Dzmitry Bahdanau

Xing Han Lu

PhD - McGill University

Master's Research - McGill University

Nicholas Meade

PhD - McGill University

Postdoctorate - McGill University

Abhik Roychoudhury Roychoudhury

Arkil Patel

PhD - McGill University

Principal supervisor :

Collaborating researcher - N/A

Independent visiting researcher

Co-supervisor :

Collaborating Alumni

Karolina Ewa Stańczak

Collaborating Alumni - McGill University

Ivan Titov

Independent visiting researcher

Co-supervisor :

Yoshua Bengio

How Do We Explain AI and Ensure the Explanation Is True? Faithfulness Measurable Models Tell You How

Ada Tur

Research Intern - McGill University

PhD - McGill University

Collaborating Alumni - McGill University

Blog Posts

October 1, 2024

Andrea Madsen

Siva Reddy

Sarath Chandar

Read the article

Publications

LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders

2024-04-09

ArXiv (preprint)

LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders

Large decoder-only language models (LLMs) are the state-of-the-art models on most of today's NLP tasks and benchmarks. Yet, the community is… (see more) only slowly adopting these models for text embedding tasks, which require rich contextualized representations. In this work, we introduce LLM2Vec, a simple unsupervised approach that can transform any decoder-only LLM into a strong text encoder. LLM2Vec consists of three simple steps: 1) enabling bidirectional attention, 2) masked next token prediction, and 3) unsupervised contrastive learning. We demonstrate the effectiveness of LLM2Vec by applying it to 4 popular LLMs ranging from 1.3B to 8B parameters and evaluate the transformed models on English word- and sequence-level tasks. We outperform encoder-only models by a large margin on word-level tasks and reach a new unsupervised state-of-the-art performance on the Massive Text Embeddings Benchmark (MTEB). Moreover, when combining LLM2Vec with supervised contrastive learning, we achieve state-of-the-art performance on MTEB among models that train only on publicly available data (as of May 24, 2024). Our strong empirical results and extensive analysis demonstrate that LLMs can be effectively transformed into universal text encoders in a parameter-efficient manner without the need for expensive adaptation or synthetic GPT-4 generated data.

2024-04-09

ArXiv (preprint)

LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders

Large decoder-only language models (LLMs) are the state-of-the-art models on most of today's NLP tasks and benchmarks. Yet, the community is… (see more) only slowly adopting these models for text embedding tasks, which require rich contextualized representations. In this work, we introduce LLM2Vec, a simple unsupervised approach that can transform any decoder-only LLM into a strong text encoder. LLM2Vec consists of three simple steps: 1) enabling bidirectional attention, 2) masked next token prediction, and 3) unsupervised contrastive learning. We demonstrate the effectiveness of LLM2Vec by applying it to 3 popular LLMs ranging from 1.3B to 7B parameters and evaluate the transformed models on English word- and sequence-level tasks. We outperform encoder-only models by a large margin on word-level tasks and reach a new unsupervised state-of-the-art performance on the Massive Text Embeddings Benchmark (MTEB). Moreover, when combining LLM2Vec with supervised contrastive learning, we achieve state-of-the-art performance on MTEB among models that train only on publicly available data. Our strong empirical results and extensive analysis demonstrate that LLMs can be effectively transformed into universal text encoders in a parameter-efficient manner without the need for expensive adaptation or synthetic GPT-4 generated data.

2024-04-09

ArXiv (preprint)

LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders

2024-04-09

ArXiv (preprint)

LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders

2024-04-09

ArXiv (preprint)

LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders

2024-04-09

ArXiv (preprint)

LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders

2024-04-09

ArXiv (preprint)

Scope Ambiguities in Large Language Models

Gaurav Kamath

Sebastian Schuster

Sowmya Vajjala

Abstract Sentences containing multiple semantic operators with overlapping scope often create ambiguities in interpretation, known as scope … (see more)ambiguities. These ambiguities offer rich insights into the interaction between semantic structure and world knowledge in language processing. Despite this, there has been little research into how modern large language models treat them. In this paper, we investigate how different versions of certain autoregressive language models—GPT-2, GPT-3/3.5, Llama 2, and GPT-4—treat scope ambiguous sentences, and compare this with human judgments. We introduce novel datasets that contain a joint total of almost 1,000 unique scope-ambiguous sentences, containing interactions between a range of semantic operators, and annotated for human judgments. Using these datasets, we find evidence that several models (i) are sensitive to the meaning ambiguity in these sentences, in a way that patterns well with human judgments, and (ii) can successfully identify human-preferred readings at a high level of accuracy (over 90% in some cases).1

2024-04-05

ArXiv (preprint)

Scope Ambiguities in Large Language Models

Gaurav Kamath

Sebastian Schuster

Sowmya Vajjala

2024-04-05

ArXiv (preprint)

WebLINX: Real-World Website Navigation with Multi-Turn Dialogue

Xing Han Lu

Zdeněk Kasner

We propose the problem of conversational web navigation, where a digital agent controls a web browser and follows user instructions to solve… (see more) real-world tasks in a multi-turn dialogue fashion. To support this problem, we introduce WebLINX - a large-scale benchmark of 100K interactions across 2300 expert demonstrations of conversational web navigation. Our benchmark covers a broad range of patterns on over 150 real-world websites and can be used to train and evaluate agents in diverse scenarios. Due to the magnitude of information present, Large Language Models (LLMs) cannot process entire web pages in real-time. To solve this bottleneck, we design a retrieval-inspired model that efficiently prunes HTML pages by ranking relevant elements. We use the selected elements, along with screenshots and action history, to assess a variety of models for their ability to replicate human behavior when navigating the web. Our experiments span from small text-only to proprietary multimodal LLMs. We find that smaller finetuned decoders surpass the best zero-shot LLMs (including GPT-4V), but also larger finetuned multimodal models which were explicitly pretrained on screenshots. However, all finetuned models struggle to generalize to unseen websites. Our findings highlight the need for large multimodal models that can generalize to novel settings. Our code, data and models are available for research: https://mcgill-nlp.github.io/weblinx.

2024-03-11

ICLR.cc/2024/Workshop/LLMAgents (poster)

openreview.net

WebLINX: Real-World Website Navigation with Multi-Turn Dialogue

Xing Han Lu

Zdeněk Kasner

We propose the problem of conversational web navigation, where a digital agent controls a web browser and follows user instructions to solve… (see more) real-world tasks in a multi-turn dialogue fashion. To support this problem, we introduce WEBLINX - a large-scale benchmark of 100K interactions across 2300 expert demonstrations of conversational web navigation. Our benchmark covers a broad range of patterns on over 150 real-world websites and can be used to train and evaluate agents in diverse scenarios. Due to the magnitude of information present, Large Language Models (LLMs) cannot process entire web pages in real-time. To solve this bottleneck, we design a retrieval-inspired model that efficiently prunes HTML pages by ranking relevant elements. We use the selected elements, along with screenshots and action history, to assess a variety of models for their ability to replicate human behavior when navigating the web. Our experiments span from small text-only to proprietary multimodal LLMs. We find that smaller finetuned decoders surpass the best zero-shot LLMs (including GPT-4V), but also larger finetuned multimodal models which were explicitly pretrained on screenshots. However, all finetuned models struggle to generalize to unseen websites. Our findings highlight the need for large multimodal models that can generalize to novel settings. Our code, data and models are available for research: https://mcgill-nlp.github.io/weblinx

2024-03-11

ICLR.cc/2024/Workshop/LLMAgents (poster)

openreview.net

A Compositional Typed Semantics for Universal Dependencies

Laurestine Bradford

Timothy John O'donnell

2024-03-02

ArXiv (preprint)