Siva Reddy

Biography

Siva Reddy is an assistant professor at the School of Computer Science and in the Department of Linguistics at McGill University. He completed a postdoc with the Stanford NLP Group in September 2019.

Reddy’s research goal is to equip machines with natural language understanding abilities in order to facilitate applications such as question answering and conversational systems. His expertise includes building symbolic (linguistic and induced) and deep learning models for language.

Current Students

Vaibhav Adlakha

PhD - McGill University

Parishad BehnamGhader

Master's Research - McGill University

PhD - McGill University

Matteo Boglioni

Collaborating researcher - McGill University

Verna Dankers

Postdoctorate - University of Edinburgh

Jiaqi Deng

Collaborating researcher

Charbel El Feghali

Research Intern - McGill University

Desmond Elliott

Independent visiting researcher

Co-supervisor:

Yoshua Bengio

Jay Gala

Master's Research - McGill University

Co-supervisor:

Collaborating researcher

Collaborating Alumni

PhD - McGill University

Co-supervisor:

Timothy O'Donnell

Imene Kerboua

Collaborating researcher - INSA Lyon, France

PhD - McGill University

Principal supervisor:

Golnoosh Farnadi

Austin Kraft

PhD - McGill University

Co-supervisor:

Timothy O'Donnell

Benno Krojer

PhD - McGill University

Zichao Li

PhD - McGill University

Co-supervisor:

Jackie Cheung

Fengyuan Liu

Master's Research - McGill University

Co-supervisor:

Dzmitry Bahdanau

Xing Han Lu

PhD - McGill University

Master's Research - McGill University

Nicholas Meade

PhD - McGill University

Postdoctorate - McGill University

Marzia Nouri

Master's Research - McGill University

Arkil Patel

PhD - McGill University

Principal supervisor:

Collaborating researcher

Ben Saine

Research Intern - McGill University

Collaborating Alumni

Karolina Ewa Stańczak

Collaborating Alumni - McGill University

Ivan Titov

Collaborating researcher

Co-supervisor:

Yoshua Bengio

Ada Tur

Research Intern - McGill University

PhD - McGill University

Collaborating Alumni - McGill University

Donghao Zeng

Research Intern - McGill University

How Do We Explain AI and Ensure the Explanation Is True? Faithfulness Measurable Models Tell You How

Blog Posts

October 1, 2024

Andrea Madsen

Siva Reddy

Sarath Chandar

Publications

On the Origin of Hallucinations in Conversational Models: Is it the Datasets or the Models?

Nouha Dziri

Sivan Milton

Mo Yu

Osmar R Zaiane

Knowledge-grounded conversational models are known to suffer from producing factually invalid statements, a phenomenon commonly called hallucination. In this work, we investigate the underlying causes of this phenomenon: is hallucination due to the training data, or to the models? We conduct a comprehensive human study on both existing knowledge-grounded conversational benchmarks and several state-of-the-art models. Our study reveals that the standard benchmarks consist of more than 60% hallucinated responses, leading to models that not only hallucinate but even amplify hallucinations. Our findings raise important questions about the quality of existing datasets and of the models trained on them. We make our annotations publicly available for future research.

2022-04-17

ArXiv (preprint)

TopiOCQA: Open-domain Conversational Question Answering with Topic Switching

Vaibhav Adlakha

Shehzaad Dhuliawala

Kaheer Suleman

Harm de Vries

2022-04-13

Transactions of the Association for Computational Linguistics (published)

Using Interactive Feedback to Improve the Accuracy and Explainability of Question Answering Systems Post-Deployment

Most research on question answering focuses on the pre-deployment stage, i.e., building an accurate model for deployment. In this paper, we ask the question: Can we improve QA systems further post-deployment based on user interactions? We focus on two kinds of improvements: 1) improving the QA system’s performance itself, and 2) providing the model with the ability to explain the correctness or incorrectness of an answer. We collect a retrieval-based QA dataset, FeedbackQA, which contains interactive feedback from users. We collect this dataset by deploying a base QA system to crowdworkers who then engage with the system and provide feedback on the quality of its answers. The feedback contains both structured ratings and unstructured natural language explanations. We train a neural model with this feedback data that can generate explanations and re-score answer candidates. We show that feedback data improves the accuracy not only of the deployed QA system but also of other, stronger non-deployed systems. The generated explanations also help users make informed decisions about the correctness of answers.

2022-04-06

ArXiv (preprint)

Image Retrieval from Contextual Descriptions

Vibhav Vineet

Edoardo Ponti

The ability to integrate context, including perceptual and temporal cues, plays a pivotal role in grounding the meaning of a linguistic utterance. In order to measure to what extent current vision-and-language models master this ability, we devise a new multimodal challenge, Image Retrieval from Contextual Descriptions (ImageCoDe). In particular, models are tasked with retrieving the correct image from a set of 10 minimally contrastive candidates based on a contextual description. As such, each description contains only the details that help distinguish between images. Because of this, descriptions tend to be complex in terms of syntax and discourse and require drawing pragmatic inferences. Images are sourced from both static pictures and video frames. We benchmark several state-of-the-art models, including both cross-encoders such as ViLBERT and bi-encoders such as CLIP, on ImageCoDe. Our results reveal that these models dramatically lag behind human performance: the best variant achieves an accuracy of 20.9 on video frames and 59.4 on static pictures, compared with 90.8 in humans. Furthermore, we experiment with new model variants that are better equipped to incorporate visual and temporal context into their representations, which achieve modest gains. Our hope is that ImageCoDe will foster progress in grounded language understanding by encouraging models to focus on fine-grained visual differences.

2022-03-29

ArXiv (preprint)

Combining Modular Skills in Multitask Learning

Edoardo Ponti

Alessandro Sordoni

2022-02-28

ArXiv (preprint)

IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages

Emanuele Bugliarello

Fangyu Liu

Jonas Pfeiffer

Desmond Elliott

Edoardo Ponti

Ivan Vulić

Reliable evaluation benchmarks designed for replicability and comprehensiveness have driven progress in machine learning. Due to the lack of a multilingual benchmark, however, vision-and-language research has mostly focused on English language tasks. To fill this gap, we introduce the Image-Grounded Language Understanding Evaluation (IGLUE) benchmark. IGLUE brings together visual question answering, cross-modal retrieval, grounded reasoning, and grounded entailment tasks across 20 diverse languages, both by aggregating pre-existing datasets and by creating new ones. Our benchmark enables the evaluation of multilingual multimodal models for transfer learning, not only in a zero-shot setting, but also in newly defined few-shot learning setups. Based on the evaluation of the available state-of-the-art models, we find that translate-test transfer is superior to zero-shot transfer and that few-shot learning is hard to harness for many tasks. Moreover, downstream performance is partially explained by the amount of available unlabelled textual data for pretraining, and only weakly by the typological distance of target-source languages. We hope to encourage future research efforts in this area by releasing the benchmark to the community.

2022-01-01

ICML (published)

proceedings.mlr.press

The Curious Case of Absolute Position Embeddings

Koustuv Sinha

Amirhossein Kazemnejad

Joelle Pineau

Dieuwke Hupkes

Adina Williams

2022-01-01

EMNLP (Findings) (published)

End-to-End Training of Multi-Document Reader and Retriever for Open-Domain Question Answering

Devendra Singh Sachan

William L. Hamilton

Chris Dyer

Dani Yogatama

We present an end-to-end differentiable training method for retrieval-augmented open-domain question answering systems that combine information from multiple retrieved documents when generating answers. We model retrieval decisions as latent variables over sets of relevant documents. Since marginalizing over sets of retrieved documents is computationally hard, we approximate this using an expectation-maximization algorithm. We iteratively estimate the value of our latent variable (the set of relevant documents for a given question) and then use this estimate to update the retriever and reader parameters. We hypothesize that such end-to-end training allows training signals to flow to the reader and then to the retriever better than stage-wise training. This results in a retriever that is able to select more relevant documents for a question and a reader that is trained on more accurate documents to generate an answer. Experiments on three benchmark datasets demonstrate that our proposed method outperforms all existing approaches of comparable size by 2-3% absolute exact match points, achieving new state-of-the-art results. Our results also demonstrate the feasibility of learning to retrieve to improve answer generation without explicit supervision of retrieval decisions.

openreview.net

Back-Training excels Self-Training at Unsupervised Domain Adaptation of Question Generation and Passage Retrieval

Devang Kulshreshtha

Robert Belfer

Iulian V. Serban

In this work, we introduce back-training, an alternative to self-training for unsupervised domain adaptation (UDA). While self-training generates synthetic training data where natural inputs are aligned with noisy outputs, back-training results in natural outputs aligned with noisy inputs. This significantly reduces the gap between the target domain and the synthetic data distribution, and reduces model overfitting to the source domain. We run UDA experiments on question generation and passage retrieval from the Natural Questions domain to machine learning and biomedical domains. We find that back-training vastly outperforms self-training, with a mean improvement of 7.8 BLEU-4 points on generation and 17.6% top-20 retrieval accuracy across both domains. We further propose consistency filters to remove low-quality synthetic data before training. We also release a new domain-adaptation dataset, MLQuestions, containing 35K unaligned questions, 50K unaligned passages, and 3K aligned question-passage pairs.

2021-11-01

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (published)

Visually Grounded Reasoning across Languages and Cultures

Fangyu Liu

Emanuele Bugliarello

Edoardo Ponti

Nigel Collier

Desmond Elliott

The design of widespread vision-and-language datasets and pre-trained encoders directly adopts, or draws inspiration from, the concepts and images of ImageNet. While one can hardly overestimate how much this benchmark contributed to progress in computer vision, it is mostly derived from lexical databases and image queries in English, resulting in source material with a North American or Western European bias. Therefore, we devise a new protocol to construct an ImageNet-style hierarchy representative of more languages and cultures. In particular, we let the selection of both concepts and images be entirely driven by native speakers, rather than scraping them automatically. Specifically, we focus on a typologically diverse set of languages, namely, Indonesian, Mandarin Chinese, Swahili, Tamil, and Turkish. On top of the concepts and images obtained through this new protocol, we create a multilingual dataset for Multicultural Reasoning over Vision and Language (MaRVL) by eliciting statements from native speaker annotators about pairs of images. The task consists of discriminating whether each grounded statement is true or false. We establish a series of baselines using state-of-the-art models and find that their cross-lingual transfer performance lags dramatically behind supervised performance in English. These results invite us to reassess the robustness and accuracy of current state-of-the-art models beyond a narrow domain, but also open up new exciting challenges for the development of truly multilingual and multicultural systems.

2021-11-01

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (published)

openreview.net

An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models

Nicholas Meade

Elinor Poole-Dayan

Recent work has shown that pre-trained language models capture social biases from the large amounts of text they are trained on. This has attracted attention to developing techniques that mitigate such biases. In this work, we perform an empirical survey of five recently proposed bias mitigation techniques: Counterfactual Data Augmentation (CDA), Dropout, Iterative Nullspace Projection, Self-Debias, and SentenceDebias. We quantify the effectiveness of each technique using three intrinsic bias benchmarks while also measuring the impact of these techniques on a model’s language modeling ability, as well as its performance on downstream NLU tasks. We experimentally find that: (1) Self-Debias is the strongest debiasing technique, obtaining improved scores on all bias benchmarks; (2) current debiasing techniques perform less consistently when mitigating non-gender biases; and (3) improvements on bias benchmarks such as StereoSet and CrowS-Pairs by using debiasing strategies are often accompanied by a decrease in language modeling ability, making it difficult to determine whether the bias mitigation was effective.

2021-10-16

ArXiv (preprint)

The Power of Prompt Tuning for Low-Resource Semantic Parsing

Nathan Schucher

Harm de Vries

Prompt tuning has recently emerged as an effective method for adapting pre-trained language models to a number of language understanding and generation tasks. In this paper, we investigate prompt tuning for semantic parsing, the task of mapping natural language utterances onto formal meaning representations. On the low-resource splits of Overnight and TOPv2, we find that a prompt-tuned T5-xl significantly outperforms its fine-tuned counterpart, as well as strong GPT-3 and BART baselines. We also conduct ablation studies across different model scales and target representations, finding that, with increasing model scale, prompt-tuned T5 models improve at generating target representations that are far from the pre-training distribution.

2021-10-16

ArXiv (preprint)