Portrait of Alessandro Sordoni

Alessandro Sordoni

Core Industry Member
Adjunct professor, Université de Montréal, Department of Computer Science and Operations Research
Research Scientist, Microsoft Research Montréal
Research Topics
Large Language Models (LLM)
Natural Language Processing
Reasoning

Biography

I am a principal researcher at Microsoft Research Montréal.

For my PhD at Université de Montréal under the direction of Jian-Yun Nie, I investigated how to effectively represent documents and queries for information retrieval.

Recently, I have been motivated to study the efficiency of learning and systematic generalization in current large deep learning models. My interests span the fields of unsupervised learning and few-shot learning, especially in NLP.

Current Students

Collaborating Alumni - University of Copenhagen

Publications

Does Pre-training Induce Systematic Inference? How Masked Language Models Acquire Commonsense Knowledge
Transformer models pre-trained with a masked-language-modeling objective (e.g., BERT) encode commonsense knowledge as evidenced by behaviora… (see more)l probes; however, the extent to which this knowledge is acquired by systematic inference over the semantics of the pre-training corpora is an open question. To answer this question, we selectively inject verbalized knowledge into the pre-training minibatches of BERT and evaluate how well the model generalizes to supported inferences after pre-training on the injected knowledge. We find generalization does not improve over the course of pre-training BERT from scratch, suggesting that commonsense knowledge is acquired from surface-level, co-occurrence patterns rather than induced, systematic reasoning.
Explicitly Modeling Syntax in Language Models with Incremental Parsing and a Dynamic Oracle
Syntax is fundamental to our thinking about language. Failing to capture the structure of input language could lead to generalization proble… (see more)ms and over-parametrization. In the present work, we propose a new syntax-aware language model: Syntactic Ordered Memory (SOM). The model explicitly models the structure with an incremental parser and maintains the conditional probability setting of a standard language model (left-to-right). To train the incremental parser and avoid exposure bias, we also propose a novel dynamic oracle, so that SOM is more robust to wrong parsing decisions. Experiments show that SOM can achieve strong results in language modeling, incremental parsing, and syntactic generalization tests while using fewer parameters than other models.
Understanding by Understanding Not: Modeling Negation in Language Models
Negation is a core construction in natural language. Despite being very successful on many tasks, state-of-the-art pre-trained language mode… (see more)ls often handle negation incorrectly. To improve language models in this regard, we propose to augment the language modeling objective with an unlikelihood objective that is based on negated generic sentences from a raw text corpus. By training BERT with the resulting combined objective we reduce the mean top 1 error rate to 4% on the negated LAMA dataset. We also see some improvements on the negated NLI benchmarks.
What Makes Machine Reading Comprehension Questions Difficult? Investigating Variation in Passage Sources and Question Types
Susan Bartlett
Grzegorz Kondrak
Max Bartolo
Alastair Roberts
Johannes Welbl
Steven Bird
Ewan Klein
Edward Loper
Samuel R. Bowman
George Dahl. 2021
What
Chao Pang
Junyuan Shang
Jiaxiang Liu
Xuyi Chen
Yanbin Zhao
Yuxiang Lu
Weixin Liu
Zhi-901 hua Wu
Weibao Gong … (see 21 more)
Jianzhong Liang
Zhizhou Shang
Peng Sun
Ouyang Xuan
Dianhai
Hao Tian
Hua Wu
Haifeng Wang
Adam Trischler
Tong Wang
Xingdi Yuan
Justin Har-908
Philip Bachman
Adina Williams
Nikita Nangia
Zhilin Yang
Peng Qi
Saizheng Zhang
ing. In
For a natural language understanding bench-001 mark to be useful in research, it has to con-002 sist of examples that are diverse and diffi… (see more)-003 cult enough to discriminate among current and 004 near-future state-of-the-art systems. However, 005 we do not yet know how best to select pas-006 sages to collect a variety of challenging exam-007 ples. In this study, we crowdsource multiple-008 choice reading comprehension questions for 009 passages taken from seven qualitatively dis-010 tinct sources, analyzing what attributes of pas-011 sages contribute to the difficulty and question 012 types of the collected examples. To our sur-013 prise, we find that passage source, length, and 014 readability measures do not significantly affect 015 question difficulty. Through our manual anno-016 tation of seven reasoning types, we observe 017 several trends between passage sources and 018 reasoning types, e.g., logical reasoning is more 019 often required in questions written for techni-020 cal passages. These results suggest that when 021 creating a new benchmark dataset, selecting a 022 diverse set of passages can help ensure a di-023 verse range of question types, but that passage 024 difficulty need not be a priority. 025
Recursive Top-Down Production for Sentence Generation with Latent Trees
Explicitly Modeling Syntax in Language Model improves Generalization
Syntax is fundamental to our thinking about language. Although neural networks are very successful in many tasks, they do not explicitly mod… (see more)el syntactic structure. Failing to capture the structure of inputs could lead to generalization problems and over-parametrization. In the present work, we propose a new syntax-aware language model: Syntactic Ordered Memory (SOM). The model explicitly models the structure with a one-step look-ahead parser and maintains the conditional probability setting of the standard language model. Experiments show that SOM can achieve strong results in language modeling and syntactic generalization tests, while using fewer parameters then other models.
Ordered Memory
Yikang Shen
Shawn Tan
Seyedarian Hosseini
Zhouhan Lin
Ordered Memory
Yikang Shen
Shawn Tan
Seyedarian Hosseini
Zhouhan Lin
Brief Report: Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks
Yikeng Shen
Shawn Tan
An Empirical Study of Example Forgetting during Deep Neural Network Learning
Mariya Toneva*
Remi Tachet des Combes
Adam Trischler
Inspired by the phenomenon of catastrophic forgetting, we investigate the learning dynamics of neural networks as they train on single class… (see more)ification tasks. Our goal is to understand whether a related phenomenon occurs when data does not undergo a clear distributional shift. We define a “forgetting event” to have occurred when an individual training example transitions from being classified correctly to incorrectly over the course of learning. Across several benchmark data sets, we find that: (i) certain examples are forgotten with high frequency, and some not at all; (ii) a data set’s (un)forgettable examples generalize across neural architectures; and (iii) based on forgetting dynamics, a significant fraction of examples can be omitted from the training data set while still maintaining state-of-the-art generalization performance.
Ordered Memory
Yikang Shen
Shawn Tan
Arian Hosseini
Zhouhan Lin
Stack-augmented recurrent neural networks (RNNs) have been of interest to the deep learning community for some time. However, the difficult… (see more)y of training memory models remains a problem obstructing the widespread use of such models. In this paper, we propose the Ordered Memory architecture. Inspired by Ordered Neurons (Shen et al., 2018), we introduce a new attention-based mechanism and use its cumulative probability to control the writing and erasing operation of the memory. We also introduce a new Gated Recursive Cell to compose lower-level representations into higher-level representation. We demonstrate that our model achieves strong performance on the logical inference task (Bowman et al., 2015) and the ListOps (Nangia and Bowman, 2018) task. We can also interpret the model to retrieve the induced tree structure, and find that these induced structures align with the ground truth. Finally, we evaluate our model on the Stanford Sentiment Treebank tasks (Socher et al., 2013), and find that it performs comparatively with the state-of-the-art methods in the literature.
Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks
Yikang Shen
Shawn Tan
Natural language is hierarchically structured: smaller units (e.g., phrases) are nested within larger units (e.g., clauses). When a larger c… (see more)onstituent ends, all of the smaller constituents that are nested within it must also be closed. While the standard LSTM architecture allows different neurons to track information at different time scales, it does not have an explicit bias towards modeling a hierarchy of constituents. This paper proposes to add such an inductive bias by ordering the neurons; a vector of master input and forget gates ensures that when a given neuron is updated, all the neurons that follow it in the ordering are also updated. Our novel recurrent architecture, ordered neurons LSTM (ON-LSTM), achieves good performance on four different tasks: language modeling, unsupervised parsing, targeted syntactic evaluation, and logical inference.