Portrait of Siva Reddy

Siva Reddy

Core Academic Member
Canada CIFAR AI Chair
Assistant Professor, McGill University, School of Computer Science and Department of Linguistics
Research Topics
Deep Learning
Natural Language Processing
Reasoning
Representation Learning

Biography

Siva Reddy is an assistant professor at the School of Computer Science and in the Department of Linguistics at McGill University. He completed a postdoc with the Stanford NLP Group in September 2019.

Reddy’s research goal is to enable machines with natural language understanding abilities in order to facilitate applications like question answering and conversational systems. His expertise includes building symbolic (linguistic and induced) and deep learning models for language.

Current Students

PhD - McGill University
Master's Research - McGill University
PhD - McGill University
Collaborating researcher
PhD - McGill University
Master's Research - McGill University
PhD - McGill University
Principal supervisor :
PhD - McGill University
Research Intern - UNIVERSITÄT DES SAARLANDES
PhD - McGill University
PhD - McGill University
Co-supervisor :
PhD - Polytechnique Montréal
Principal supervisor :
PhD - McGill University
Postdoctorate - McGill University
PhD - McGill University
Principal supervisor :
Research Intern - McGill University
Postdoctorate - McGill University
Research Intern - McGill University
Collaborating researcher - Cambridge University
Research Intern - McGill University

Publications

Evaluating the Faithfulness of Importance Measures in NLP by Recursively Masking Allegedly Important Tokens and Retraining
Andreas Madsen
Nicholas Meade
Vaibhav Adlakha
To explain NLP models a popular approach is to use importance measures, such as attention, which inform input tokens are important for makin… (see more)g a prediction. However, an open question is how well these explanations accurately reflect a model's logic, a property called faithfulness. To answer this question, we propose Recursive ROAR, a new faithfulness metric. This works by recursively masking allegedly important tokens and then retraining the model. The principle is that this should result in worse model performance compared to masking random tokens. The result is a performance curve given a masking-ratio. Furthermore, we propose a summarizing metric using relative area-between-curves (RACU), which allows for easy comparison across papers, models, and tasks. We evaluate 4 different importance measures on 8 different datasets, using both LSTM-attention models and RoBERTa models. We find that the faithfulness of importance measures is both model-dependent and task-dependent. This conclusion contradicts previous evaluations in both computer vision and faithfulness of attention literature.
Compositional Generalization in Dependency Parsing
Compositionality— the ability to combine familiar units like words into novel phrases and sentences— has been the focus of intense inter… (see more)est in artificial intelligence in recent years. To test compositional generalization in semantic parsing, Keysers et al. (2020) introduced Compositional Freebase Queries (CFQ). This dataset maximizes the similarity between the test and train distributions over primitive units, like words, while maximizing the compound divergence: the dissimilarity between test and train distributions over larger structures, like phrases. Dependency parsing, however, lacks a compositional generalization benchmark. In this work, we introduce a gold-standard set of dependency parses for CFQ, and use this to analyze the behaviour of a state-of-the art dependency parser (Qi et al., 2020) on the CFQ dataset. We find that increasing compound divergence degrades dependency parsing performance, although not as dramatically as semantic parsing performance. Additionally, we find the performance of the dependency parser does not uniformly degrade relative to compound divergence, and the parser performs differently on different splits with the same compound divergence. We explore a number of hypotheses for what causes the non-uniform degradation in dependency parsing performance, and identify a number of syntactic structures that drive the dependency parser’s lower performance on the most challenging splits.
Minimax and Neyman–Pearson Meta-Learning for Outlier Languages
Edoardo Ponti
Rahul Aralikatte
Disha Shrivastava
Anders Sogaard
Model-agnostic meta-learning (MAML) has been recently put forth as a strategy to learn resource-poor languages in a sample-efficient fashion… (see more). Nevertheless, the properties of these languages are often not well represented by those available during training. Hence, we argue that the i.i.d. assumption ingrained in MAML makes it ill-suited for cross-lingual NLP. In fact, under a decision-theoretic framework, MAML can be interpreted as minimising the expected risk across training languages (with a uniform prior), which is known as Bayes criterion. To increase its robustness to outlier languages, we create two variants of MAML based on alternative criteria: Minimax MAML reduces the maximum risk across languages, while Neyman–Pearson MAML constrains the risk in each language to a maximum threshold. Both criteria constitute fully differentiable two-player games. In light of this, we propose a new adaptive optimiser solving for a local approximation to their Nash equilibrium. We evaluate both model variants on two popular NLP tasks, part-of-speech tagging and question answering. We report gains for their average and minimum performance across low-resource languages in zeroand few-shot settings, compared to joint multisource transfer and vanilla MAML. The code for our experiments is available at https:// github.com/rahular/robust-maml.
StereoSet: Measuring stereotypical bias in pretrained language models
Moin Nadeem
Anna Bethke
A stereotype is an over-generalized belief about a particular group of people, e.g., Asians are good at math or African Americans are athlet… (see more)ic. Such beliefs (biases) are known to hurt target groups. Since pretrained language models are trained on large real-world data, they are known to capture stereotypical biases. It is important to quantify to what extent these biases are present in them. Although this is a rapidly growing area of research, existing literature lacks in two important aspects: 1) they mainly evaluate bias of pretrained language models on a small set of artificial sentences, even though these models are trained on natural data 2) current evaluations focus on measuring bias without considering the language modeling ability of a model, which could lead to misleading trust on a model even if it is a poor language model. We address both these problems. We present StereoSet, a large-scale natural English dataset to measure stereotypical biases in four domains: gender, profession, race, and religion. We contrast both stereotypical bias and language modeling ability of popular models like BERT, GPT-2, RoBERTa, and XLnet. We show that these models exhibit strong stereotypical biases. Our data and code are available at https://stereoset.mit.edu.
Modelling Latent Translations for Cross-Lingual Transfer
Edoardo Ponti
Julia Kreutzer
Ivan Vulic
Explicitly Modeling Syntax in Language Models with Incremental Parsing and a Dynamic Oracle
Syntax is fundamental to our thinking about language. Failing to capture the structure of input language could lead to generalization proble… (see more)ms and over-parametrization. In the present work, we propose a new syntax-aware language model: Syntactic Ordered Memory (SOM). The model explicitly models the structure with an incremental parser and maintains the conditional probability setting of a standard language model (left-to-right). To train the incremental parser and avoid exposure bias, we also propose a novel dynamic oracle, so that SOM is more robust to wrong parsing decisions. Experiments show that SOM can achieve strong results in language modeling, incremental parsing, and syntactic generalization tests while using fewer parameters than other models.
Understanding by Understanding Not: Modeling Negation in Language Models
Negation is a core construction in natural language. Despite being very successful on many tasks, state-of-the-art pre-trained language mode… (see more)ls often handle negation incorrectly. To improve language models in this regard, we propose to augment the language modeling objective with an unlikelihood objective that is based on negated generic sentences from a raw text corpus. By training BERT with the resulting combined objective we reduce the mean top 1 error rate to 4% on the negated LAMA dataset. We also see some improvements on the negated NLI benchmarks.
Abg-CoQA: Clarifying Ambiguity in Conversational Question Answering
Meiqi Guo
Mingda Zhang
Malihe Alikhani
Effective communication is about the dissemination of properly worded meaningful ideas/messages that are comprehensible to both sen… (see more)der and receiver and which ultimately can attract the desired response or feedback. For machines to engage in a conversation, it is therefore essential to enable them to clarify ambiguity and achieve a common ground. We introduce Abg-CoQA, a novel dataset for clarifying ambiguity in Conversational Question Answering systems. Our dataset contains 9k questions with answers where 1k questions are ambiguous, obtained from 4k text passages from five diverse domains. For ambiguous questions, a clarification conversational turn is collected. We evaluate strong language generation models and conversational question answering models on Abg-CoQA. The best-performing system achieves a BLEU-1 score of 12.9% on generating clarification question, which is 27.9 points behind human performance (40.8%); and a F1 score of 40.1% on question answering after clarification, which is 35.1 points behind human performance (75.2%), indicating there is ample room for improvement.
MeDAL: Medical Abbreviation Disambiguation Dataset for Natural Language Understanding Pretraining
Zhi Wen
Xing Han Lu
Explicitly Modeling Syntax in Language Model improves Generalization
Syntax is fundamental to our thinking about language. Although neural networks are very successful in many tasks, they do not explicitly mod… (see more)el syntactic structure. Failing to capture the structure of inputs could lead to generalization problems and over-parametrization. In the present work, we propose a new syntax-aware language model: Syntactic Ordered Memory (SOM). The model explicitly models the structure with a one-step look-ahead parser and maintains the conditional probability setting of the standard language model. Experiments show that SOM can achieve strong results in language modeling and syntactic generalization tests, while using fewer parameters then other models.
Words Aren’t Enough, Their Order Matters: On the Robustness of Grounding Visual Referring Expressions
Arjun Reddy Akula
Spandana Gella
Yaser Al-Onaizan
Song-Chun Zhu
Visual referring expression recognition is a challenging task that requires natural language understanding in the context of an image. We cr… (see more)itically examine RefCOCOg, a standard benchmark for this task, using a human study and show that 83.7% of test instances do not require reasoning on linguistic structure, i.e., words are enough to identify the target object, the word order doesn’t matter. To measure the true progress of existing models, we split the test set into two sets, one which requires reasoning on linguistic structure and the other which doesn’t. Additionally, we create an out-of-distribution dataset Ref-Adv by asking crowdworkers to perturb in-domain examples such that the target object changes. Using these datasets, we empirically show that existing methods fail to exploit linguistic structure and are 12% to 23% lower in performance than the established progress for this task. We also propose two methods, one based on contrastive learning and the other based on multi-task learning, to increase the robustness of ViLBERT, the current state-of-the-art model for this task. Our datasets are publicly available at https://github.com/aws/aws-refcocog-adv.
Measuring Systematic Generalization in Neural Proof Generation with Transformers
Nicolas Gontier
Koustuv Sinha
We are interested in understanding how well Transformer language models (TLMs) can perform reasoning tasks when trained on knowledge encoded… (see more) in the form of natural language. We investigate their systematic generalization abilities on a logical reasoning task in natural language, which involves reasoning over relationships between entities grounded in first-order logical proofs. Specifically, we perform soft theorem-proving by leveraging TLMs to generate natural language proofs. We test the generated proofs for logical consistency, along with the accuracy of the final inference. We observe length-generalization issues when evaluated on longer-than-trained sequences. However, we observe TLMs improve their generalization performance after being exposed to longer, exhaustive proofs. In addition, we discover that TLMs are able to generalize better using backward-chaining proofs compared to their forward-chaining counterparts, while they find it easier to generate forward chaining proofs. We observe that models that are not trained to generate proofs are better at generalizing to problems based on longer proofs. This suggests that Transformers have efficient internal reasoning strategies that are harder to interpret. These results highlight the systematic generalization behavior of TLMs in the context of logical reasoning, and we believe this work motivates deeper inspection of their underlying reasoning strategies.