
Siva Reddy

Core Academic Member
Canada CIFAR AI Chair
Assistant Professor, McGill University, School of Computer Science and Department of Linguistics
Research Topics
Deep Learning
Natural Language Processing
Reasoning
Representation Learning

Biography

Siva Reddy is an assistant professor at the School of Computer Science and in the Department of Linguistics at McGill University. He completed a postdoc with the Stanford NLP Group in September 2019.

Reddy’s research goal is to equip machines with natural language understanding abilities in order to facilitate applications such as question answering and conversational systems. His expertise includes building symbolic (linguistic and induced) and deep learning models for language.

Current Students

PhD - McGill University
Master's Research - McGill University
PhD - McGill University
Collaborating researcher
PhD - McGill University
PhD - McGill University
PhD - McGill University
Collaborating Alumni - Universität des Saarlandes
PhD - McGill University
PhD - McGill University
PhD - Polytechnique Montréal
Research Intern - McGill University
PhD - McGill University
Postdoctorate - McGill University
PhD - McGill University
Collaborating researcher
Research Intern - McGill University
Collaborating Alumni - McGill University
Research Intern - McGill University
Collaborating researcher - Cambridge University
Collaborating Alumni - McGill University

Publications

Data science opportunities of large language models for neuroscience and biomedicine
Andrew Thieme
Oleksiy Levkovskyy
Paul Wren
Thomas Ray
Benchmarking Vision Language Models for Cultural Understanding
Shravan Nayak
Kanishk Jain
Rabiul Awal
Sjoerd van Steenkiste
Lisa Anne Hendricks
Karolina Stańczak
Foundation models and vision-language pre-training have notably advanced Vision Language Models (VLMs), enabling multimodal processing of visual and linguistic data. However, their performance has typically been assessed on general scene understanding - recognizing objects, attributes, and actions - rather than cultural comprehension. This study introduces CulturalVQA, a visual question-answering benchmark aimed at assessing VLMs' geo-diverse cultural understanding. We curate a collection of 2,378 image-question pairs with 1-5 answers per question, representing cultures from 11 countries across 5 continents. The questions probe understanding of various facets of culture such as clothing, food, drinks, rituals, and traditions. Benchmarking VLMs on CulturalVQA, including GPT-4V and Gemini, reveals disparities in their level of cultural understanding across regions: capabilities are strong for North America but significantly lower for Africa. We also observe disparities in performance across cultural facets, with clothing, rituals, and traditions seeing higher performance than food and drink. These disparities help us identify areas where VLMs lack cultural understanding and demonstrate the potential of CulturalVQA as a comprehensive evaluation set for gauging VLM progress in understanding diverse cultures.
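As a rough sketch of how such a benchmark is consumed, the loop below scores a model's predictions against the 1-5 human answers collected per question. The dataset fields and the `vlm_answer` callable are hypothetical stand-ins, not CulturalVQA's actual interface.

```python
from typing import Callable, List

def exact_match(prediction: str, answers: List[str]) -> float:
    """Credit a prediction that matches any of the 1-5 human answers."""
    normalized = prediction.strip().lower()
    return float(any(normalized == a.strip().lower() for a in answers))

def evaluate(dataset, vlm_answer: Callable[[object, str], str]) -> float:
    """Average exact-match accuracy of a VLM over image-question pairs."""
    scores = [
        exact_match(vlm_answer(ex["image"], ex["question"]), ex["answers"])
        for ex in dataset
    ]
    return sum(scores) / len(scores)
```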
Evaluating In-Context Learning of Libraries for Code Generation
Arkil Patel
Pradeep Dasigi
Contemporary Large Language Models (LLMs) exhibit a high degree of code generation and comprehension capability. A particularly promising area is their ability to interpret code modules from unfamiliar libraries for solving user-instructed tasks. Recent work has shown that large proprietary LLMs can learn novel library usage in-context from demonstrations. These results raise several open questions: whether demonstrations of library usage are required, whether smaller (and more open) models also possess such capabilities, etc. In this work, we take a broader approach by systematically evaluating a diverse array of LLMs across three scenarios reflecting varying levels of domain specialization to understand their abilities and limitations in generating code based on libraries defined in-context. Our results show that even smaller open-source LLMs like Llama-2 and StarCoder demonstrate an adept understanding of novel code libraries based on specifications presented in-context. Our findings further reveal that LLMs exhibit a surprisingly high proficiency in learning novel library modules even when provided with just natural language descriptions or raw code implementations of the functions, which are often cheaper to obtain than demonstrations. Overall, our results pave the way for harnessing LLMs in more adaptable and dynamic coding environments.
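To make the evaluated setting concrete: a library is defined in-context (as demonstrations, natural-language descriptions, or raw code), and the model must generate code that uses it. The sketch below is a minimal illustration under assumed names; the prompt wording, the toy `rotate` specification, and the commented-out `llm.generate` call are not the paper's exact protocol.

```python
# A library "specification" placed in-context: here just a raw signature
# and docstring, one of the cheaper alternatives to full demonstrations.
LIBRARY_SPEC = '''\
def rotate(image, degrees):
    """Rotate `image` counter-clockwise by `degrees` and return the result."""
'''

TASK = "Rotate `img` by 90 degrees and store the result in `out`."

prompt = (
    "You are given the following library:\n\n"
    f"{LIBRARY_SPEC}\n"
    f"Task: {TASK}\n"
    "Answer with code only.\n"
)

# completion = llm.generate(prompt)  # any code LLM; `llm` is hypothetical
print(prompt)
```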
Scope Ambiguities in Large Language Models
Gaurav Kamath
Sebastian Schuster
Sowmya Vajjala
StarCoder: may the source be with you!
Raymond Li
Loubna Ben allal
Yangtian Zi
Niklas Muennighoff
Denis Kocetkov
Chenghao Mou
Marc Marone
Christopher Akiki
Jia LI
Jenny Chim
Qian Liu
Evgenii Zheltonozhskii
Terry Yue Zhuo
Thomas Wang
Olivier Dehaene
Mishig Davaadorj
Joel Lamy-Poirier
Joao Monteiro
Oleh Shliazhko
Nicolas Gontier
Nicholas Meade
Armel Zebaze
Ming-Ho Yee
Logesh Kumar Umapathi
Jian Zhu
Ben Lipkin
Muhtasham Oblokulov
Zhiruo Wang
Rudra Murthy
Jason T Stillerman
Siva Sankalp Patel
Dmitry Abulkhanov
Marco Zocca
Manan Dey
Zhihan Zhang
N. Fahmy
Urvashi Bhattacharyya
Wenhao Yu
Swayam Singh
Sasha Luccioni
Paulo Villegas
M. Kunakov
Jan Ebert
Fedor Zhdanov
Manuel Romero
Tony Lee
Nadav Timor
Jennifer Ding
Claire S Schlesinger
Hailey Schoelkopf
Jana Ebert
Tri Dao
Mayank Mishra
Alex Gu
Jennifer Robinson
Sean Hughes
Carolyn Jane Anderson
Brendan Dolan-Gavitt
Danish Contractor
Daniel Fried
Yacine Jernite
Carlos Muñoz Ferrandis
Sean M. Hughes
Thomas Wolf
Arjun Guha
Leandro Von Werra
Harm de Vries
The BigCode community, an open-scientific collaboration working on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder and StarCoderBase: 15.5B parameter models with 8K context length, infilling capabilities and fast large-batch inference enabled by multi-query attention. StarCoderBase is trained on 1 trillion tokens sourced from The Stack, a large collection of permissively licensed GitHub repositories with inspection tools and an opt-out process. We fine-tuned StarCoderBase on 35B Python tokens, resulting in the creation of StarCoder. We perform the most comprehensive evaluation of Code LLMs to date and show that StarCoderBase outperforms every open Code LLM that supports multiple programming languages and matches or outperforms the OpenAI code-cushman-001 model. Furthermore, StarCoder outperforms every model that is fine-tuned on Python and still retains its performance on other programming languages. We take several important steps towards a safe open-access model release, including an improved PII redaction pipeline and a novel attribution tracing tool, and make the StarCoder models publicly available under a more commercially viable version of the Open Responsible AI Model license.
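The released checkpoints can be loaded through Hugging Face Transformers; the snippet below is a minimal usage sketch (the generation settings are arbitrary, and downloading the checkpoint may require accepting the model license on the Hub).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")

# Plain left-to-right completion.
inputs = tokenizer("def fibonacci(n):", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0]))

# Infilling uses fill-in-the-middle sentinel tokens.
fim_prompt = "<fim_prefix>def add(a, b):\n    <fim_suffix>\n<fim_middle>"
```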
Can Retriever-Augmented Language Models Reason? The Blame Game Between the Retriever and the Language Model
Parishad BehnamGhader
Santiago Miret
Augmenting pretrained language models with retrievers to select the supporting documents has shown promise in effectively solving common NLP problems, including language modeling and question answering, in an interpretable way. In this paper, we first study the strengths and weaknesses of different retriever-augmented language models (REALM, …)
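The retrieve-then-read pipeline under study can be summarized in a few lines: a retriever ranks documents against the query embedding, and the top documents are prepended to the question before the language model answers. The sketch below is a simplified stand-in for systems like REALM; `lm` is a hypothetical text-in, text-out callable.

```python
import numpy as np

def retrieve(query_vec, doc_vecs, docs, k=2):
    """Return the k documents whose embeddings score highest by inner product."""
    scores = doc_vecs @ query_vec          # (n_docs,)
    top = np.argsort(-scores)[:k]
    return [docs[i] for i in top]

def answer(question, query_vec, doc_vecs, docs, lm):
    """Prepend retrieved statements to the question and let the LM answer."""
    context = "\n".join(retrieve(query_vec, doc_vecs, docs))
    return lm(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
```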
Using In-Context Learning to Improve Dialogue Safety
Nicholas Meade
Spandana Gella
Devamanyu Hazarika
Prakhar Gupta
Di Jin
Yang Liu
Dilek Hakkani-Tur
Are Diffusion Models Vision-And-Language Reasoners?
Benno Krojer
Elinor Poole-Dayan
Vikram Voleti
Text-conditioned image generation models have recently shown immense qualitative success using denoising diffusion processes. However, unlike discriminative vision-and-language models, it is a non-trivial task to subject these diffusion-based generative models to automatic fine-grained quantitative evaluation of high-level phenomena such as compositionality. Towards this goal, we perform two innovations. First, we transform diffusion-based models (in our case, Stable Diffusion) for any image-text matching (ITM) task using a novel method called DiffusionITM. Second, we introduce the Generative-Discriminative Evaluation Benchmark (GDBench), with 7 complex vision-and-language tasks, bias evaluation and detailed analysis. We find that Stable Diffusion + DiffusionITM is competitive on many tasks and outperforms CLIP on compositional tasks like CLEVR and Winoground. We further boost its compositional performance with a transfer setup by fine-tuning on MS-COCO while retaining generative capabilities. We also measure the stereotypical bias in diffusion models, and find that Stable Diffusion 2.1 is, for the most part, less biased than Stable Diffusion 1.5. Overall, our results point in an exciting direction bringing discriminative and generative model evaluation closer. We will release code and benchmark setup soon.
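The core idea behind turning a diffusion model into an image-text matcher can be sketched as follows: condition the denoiser on each candidate caption and prefer the caption that yields the lowest noise-prediction error. This is a simplified sketch in the spirit of DiffusionITM, written against a diffusers-style UNet and scheduler; it is not the paper's exact scoring procedure.

```python
import torch

def itm_score(unet, scheduler, latents, text_emb, n_samples=8):
    """Negative mean noise-prediction error of an image latent given a caption;
    higher means the caption matches the image better."""
    errors = []
    for _ in range(n_samples):
        t = torch.randint(0, scheduler.config.num_train_timesteps, (1,))
        noise = torch.randn_like(latents)
        noisy = scheduler.add_noise(latents, noise, t)
        pred = unet(noisy, t, encoder_hidden_states=text_emb).sample
        errors.append(torch.mean((pred - noise) ** 2).item())
    return -sum(errors) / len(errors)

# Usage: score one image latent against several caption embeddings and pick
# the argmax, e.g. best = max(captions, key=lambda c: itm_score(..., c)).
```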
The Impact of Positional Encoding on Length Generalization in Transformers
Amirhossein Kazemnejad
Inkit Padhi
Karthikeyan Natesan Ramamurthy
Payel Das
Length generalization, the ability to generalize from small training context sizes to larger ones, is a critical challenge in the development of Transformer-based language models. Positional encoding (PE) has been identified as a major factor influencing length generalization, but the exact impact of different PE schemes on extrapolation in downstream tasks remains unclear. In this paper, we conduct a systematic empirical study comparing the length generalization performance of decoder-only Transformers with five different position encoding approaches: Absolute Position Embedding (APE), T5's Relative PE, ALiBi, and Rotary, in addition to Transformers without positional encoding (NoPE). Our evaluation encompasses a battery of reasoning and mathematical tasks. Our findings reveal that the most commonly used positional encoding methods, such as ALiBi, Rotary, and APE, are not well suited for length generalization in downstream tasks. More importantly, NoPE outperforms other explicit positional encoding methods while requiring no additional computation. We theoretically demonstrate that NoPE can represent both absolute and relative PEs, but when trained with SGD, it mostly resembles T5's relative PE attention patterns. Finally, we find that scratchpad is not always helpful for solving length generalization, and its format highly impacts the model's performance. Overall, our work suggests that explicit position embeddings are not essential for decoder-only Transformers to generalize well to longer sequences.
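To make the compared schemes concrete, the sketch below contrasts ALiBi, which adds a fixed per-head linear distance penalty to the attention logits, with NoPE, which injects no positional signal at all. The power-of-two slope recipe follows the common ALiBi formulation; everything here is an illustrative assumption rather than the paper's implementation.

```python
import torch

def alibi_bias(seq_len: int, n_heads: int) -> torch.Tensor:
    """Per-head bias -slope * (i - j), added to causal attention logits."""
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)])
    pos = torch.arange(seq_len)
    dist = (pos[:, None] - pos[None, :]).clamp(min=0)  # causal distances i - j
    return -slopes[:, None, None] * dist               # (n_heads, len, len)

logits = torch.randn(4, 16, 16)             # (heads, queries, keys)
alibi_logits = logits + alibi_bias(16, 4)   # ALiBi: biased logits
nope_logits = logits                        # NoPE: logits used unchanged
```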