Portrait de Siva Reddy

Siva Reddy

Membre académique principal
Chaire en IA Canada-CIFAR
Professeur adjoint, McGill University, École d'informatique et Département de linguistique
Sujets de recherche
Apprentissage de représentations
Apprentissage profond
Raisonnement
Traitement du langage naturel

Biographie

Siva Reddy est professeur adjoint en informatique et linguistique à l’Université McGill. Ses travaux portent sur les algorithmes qui permettent aux ordinateurs de comprendre et de traiter les langues humaines. Il a fait ses études postdoctorales avec le Stanford NLP Group. Son expertise inclut la construction de symboliques linguistiques et induites et de modèles d’apprentissage profond pour le langage.

Étudiants actuels

Doctorat - McGill
Maîtrise recherche - McGill
Collaborateur·rice de recherche
Doctorat - McGill
Maîtrise recherche - McGill
Doctorat - McGill
Superviseur⋅e principal⋅e :
Doctorat - McGill
Stagiaire de recherche - UNIVERSITÄT DES SAARLANDES
Doctorat - McGill
Co-superviseur⋅e :
Doctorat - Polytechnique
Superviseur⋅e principal⋅e :
Postdoctorat - McGill
Doctorat - McGill
Superviseur⋅e principal⋅e :
Stagiaire de recherche - McGill
Stagiaire de recherche - McGill
Collaborateur·rice de recherche - Cambridge University
Stagiaire de recherche - McGill

Publications

Universal Adversarial Triggers Are Not Universal
Nicholas Meade
Arkil Patel
LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders
Parishad BehnamGhader
Vaibhav Adlakha
Marius Mosbach
LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders
Parishad BehnamGhader
Vaibhav Adlakha
Marius Mosbach
Large decoder-only language models (LLMs) are the state-of-the-art models on most of today's NLP tasks and benchmarks. Yet, the community is… (voir plus) only slowly adopting these models for text embedding tasks, which require rich contextualized representations. In this work, we introduce LLM2Vec, a simple unsupervised approach that can transform any decoder-only LLM into a strong text encoder. LLM2Vec consists of three simple steps: 1) enabling bidirectional attention, 2) masked next token prediction, and 3) unsupervised contrastive learning. We demonstrate the effectiveness of LLM2Vec by applying it to 3 popular LLMs ranging from 1.3B to 7B parameters and evaluate the transformed models on English word- and sequence-level tasks. We outperform encoder-only models by a large margin on word-level tasks and reach a new unsupervised state-of-the-art performance on the Massive Text Embeddings Benchmark (MTEB). Moreover, when combining LLM2Vec with supervised contrastive learning, we achieve state-of-the-art performance on MTEB among models that train only on publicly available data. Our strong empirical results and extensive analysis demonstrate that LLMs can be effectively transformed into universal text encoders in a parameter-efficient manner without the need for expensive adaptation or synthetic GPT-4 generated data.
WebLINX: Real-World Website Navigation with Multi-Turn Dialogue
Xing Han Lu
Zdeněk Kasner
We propose the problem of conversational web navigation, where a digital agent controls a web browser and follows user instructions to solve… (voir plus) real-world tasks in a multi-turn dialogue fashion. To support this problem, we introduce WebLINX - a large-scale benchmark of 100K interactions across 2300 expert demonstrations of conversational web navigation. Our benchmark covers a broad range of patterns on over 150 real-world websites and can be used to train and evaluate agents in diverse scenarios. Due to the magnitude of information present, Large Language Models (LLMs) cannot process entire web pages in real-time. To solve this bottleneck, we design a retrieval-inspired model that efficiently prunes HTML pages by ranking relevant elements. We use the selected elements, along with screenshots and action history, to assess a variety of models for their ability to replicate human behavior when navigating the web. Our experiments span from small text-only to proprietary multimodal LLMs. We find that smaller finetuned decoders surpass the best zero-shot LLMs (including GPT-4V), but also larger finetuned multimodal models which were explicitly pretrained on screenshots. However, all finetuned models struggle to generalize to unseen websites. Our findings highlight the need for large multimodal models that can generalize to novel settings. Our code, data and models are available for research: https://mcgill-nlp.github.io/weblinx.
WebLINX: Real-World Website Navigation with Multi-Turn Dialogue
Xing Han Lu
Zdeněk Kasner
We propose the problem of conversational web navigation, where a digital agent controls a web browser and follows user instructions to solve… (voir plus) real-world tasks in a multi-turn dialogue fashion. To support this problem, we introduce WEBLINX - a large-scale benchmark of 100K interactions across 2300 expert demonstrations of conversational web navigation. Our benchmark covers a broad range of patterns on over 150 real-world websites and can be used to train and evaluate agents in diverse scenarios. Due to the magnitude of information present, Large Language Models (LLMs) cannot process entire web pages in real-time. To solve this bottleneck, we design a retrieval-inspired model that efficiently prunes HTML pages by ranking relevant elements. We use the selected elements, along with screenshots and action history, to assess a variety of models for their ability to replicate human behavior when navigating the web. Our experiments span from small text-only to proprietary multimodal LLMs. We find that smaller finetuned decoders surpass the best zero-shot LLMs (including GPT-4V), but also larger finetuned multimodal models which were explicitly pretrained on screenshots. However, all finetuned models struggle to generalize to unseen websites. Our findings highlight the need for large multimodal models that can generalize to novel settings. Our code, data and models are available for research: https://mcgill-nlp.github.io/weblinx
A Compositional Typed Semantics for Universal Dependencies
Laurestine Bradford
Timothy John O'donnell
When does word order matter and when doesn't it?
Xuanda Chen
Timothy John O'donnell
Language models (LMs) may appear insensitive to word order changes in natural language understanding (NLU) tasks. In this paper, we propose … (voir plus)that linguistic redundancy can explain this phenomenon, whereby word order and other linguistic cues such as case markers provide overlapping and thus redundant information. Our hypothesis is that models exhibit insensitivity to word order when the order provides redundant information, and the degree of insensitivity varies across tasks. We quantify how informative word order is using mutual information (MI) between unscrambled and scrambled sentences. Our results show the effect that the less informative word order is, the more consistent the model's predictions are between unscrambled and scrambled sentences. We also find that the effect varies across tasks: for some tasks, like SST-2, LMs' prediction is almost always consistent with the original one even if the Pointwise-MI (PMI) changes, while for others, like RTE, the consistency is near random when the PMI gets lower, i.e., word order is really important.
The Leukemoid Reaction in Severe Alcoholic Hepatitis: A Case Report
Sachin Agrawal
Sunil Kumar
Sourya Acharya
Data science opportunities of large language models for neuroscience and biomedicine
Andrew Thieme
Oleksiy Levkovskyy
Paul Wren
Thomas Ray
Benchmarking Vision Language Models for Cultural Understanding
Shravan Nayak
Kanishk Jain
Rabiul Awal
Sjoerd van Steenkiste
Lisa Anne Hendricks
Karolina Sta'nczak
Foundation models and vision-language pre-training have notably advanced Vision Language Models (VLMs), enabling multimodal processing of vi… (voir plus)sual and linguistic data. However, their performance has been typically assessed on general scene understanding - recognizing objects, attributes, and actions - rather than cultural comprehension. This study introduces CulturalVQA, a visual question-answering benchmark aimed at assessing VLM's geo-diverse cultural understanding. We curate a collection of 2,378 image-question pairs with 1-5 answers per question representing cultures from 11 countries across 5 continents. The questions probe understanding of various facets of culture such as clothing, food, drinks, rituals, and traditions. Benchmarking VLMs on CulturalVQA, including GPT-4V and Gemini, reveals disparity in their level of cultural understanding across regions, with strong cultural understanding capabilities for North America while significantly lower performance for Africa. We observe disparity in their performance across cultural facets too, with clothing, rituals, and traditions seeing higher performances than food and drink. These disparities help us identify areas where VLMs lack cultural understanding and demonstrate the potential of CulturalVQA as a comprehensive evaluation set for gauging VLM progress in understanding diverse cultures.
Scope Ambiguities in Large Language Models
Gaurav Kamath
Sebastian Schuster
Sowmya Vajjala
StarCoder: may the source be with you!
Raymond Li
Loubna Ben allal
Yangtian Zi
Niklas Muennighoff
Denis Kocetkov
Chenghao Mou
Marc Marone
Christopher Akiki
Jia LI
Jenny Chim
Qian Liu
Evgenii Zheltonozhskii
Terry Yue Zhuo
Thomas Wang
Olivier Dehaene
Mishig Davaadorj
Joel Lamy-Poirier
Joao Monteiro
Oleh Shliazhko
Nicolas Gontier … (voir 49 de plus)
Nicholas Meade
Armel Zebaze
Ming-Ho Yee
Logesh Kumar Umapathi
Jian Zhu
Ben Lipkin
Muhtasham Oblokulov
Zhiruo Wang
Rudra Murthy
Jason T Stillerman
Siva Sankalp Patel
Dmitry Abulkhanov
Marco Zocca
Manan Dey
Zhihan Zhang
N. Fahmy
Urvashi Bhattacharyya
Wenhao Yu
Swayam Singh
Sasha Luccioni
Paulo Villegas
Jan Ebert
M. Kunakov
Fedor Zhdanov
Manuel Romero
Tony Lee
Nadav Timor
Jennifer Ding
Claire S Schlesinger
Hailey Schoelkopf
Jana Ebert
Tri Dao
Mayank Mishra
Alex Gu
Jennifer Robinson
Sean Hughes
Carolyn Jane Anderson
Brendan Dolan-Gavitt
Danish Contractor
Daniel Fried
Yacine Jernite
Carlos Muñoz Ferrandis
Sean M. Hughes
Thomas Wolf
Arjun Guha
Leandro Von Werra
Harm de Vries
The BigCode community, an open-scientific collaboration working on the responsible development of Large Language Models for Code (Code LLMs)… (voir plus), introduces StarCoder and StarCoderBase: 15.5B parameter models with 8K context length, infilling capabilities and fast large-batch inference enabled by multi-query attention. StarCoderBase is trained on 1 trillion tokens sourced from The Stack, a large collection of permissively licensed GitHub repositories with inspection tools and an opt-out process. We fine-tuned StarCoderBase on 35B Python tokens, resulting in the creation of StarCoder. We perform the most comprehensive evaluation of Code LLMs to date and show that StarCoderBase outperforms every open Code LLM that supports multiple programming languages and matches or outperforms the OpenAI code-cushman-001 model. Furthermore, StarCoder outperforms every model that is fine-tuned on Python and still retains its performance on other programming languages. We take several important steps towards a safe open-access model release, including an improved PII redaction pipeline and a novel attribution tracing tool, and make the StarCoder models publicly available under a more commercially viable version of the Open Responsible AI Model license.