Alessandro Sordoni

Core Industry Member

Associate Professor, Université de Montréal, Department of Computer Science and Operations Research

Research Scientist, Microsoft Research Montréal

Research Topics

Large Language Models (LLMs)

Reasoning

Natural Language Processing

Biography

I am a principal researcher at Microsoft Research Montréal. I obtained my PhD from the Université de Montréal under the supervision of Jian-Yun Nie, studying how to represent documents and queries effectively for information retrieval. I am currently interested in learning efficiency and systematic generalization in today's large deep learning models. My interests also extend to unsupervised learning and few-shot learning, particularly for natural language.

Current Students

Zhan Su

Alumni collaborator - University of Copenhagen

Publications

Ordered Memory

Seyedarian Hosseini

2019-10-29

arXiv (preprint)

arxiv.org

Brief Report: Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks

Yikang Shen

Shawn Tan

Alessandro Sordoni

Aaron Courville

An Empirical Study of Example Forgetting during Deep Neural Network Learning

Mariya Toneva*

Alessandro Sordoni

Remi Tachet des Combes

Adam Trischler

Yoshua Bengio

Geoff Gordon

Inspired by the phenomenon of catastrophic forgetting, we investigate the learning dynamics of neural networks as they train on single classification tasks. Our goal is to understand whether a related phenomenon occurs when data does not undergo a clear distributional shift. We define a "forgetting event" to have occurred when an individual training example transitions from being classified correctly to incorrectly over the course of learning. Across several benchmark data sets, we find that: (i) certain examples are forgotten with high frequency, and some not at all; (ii) a data set's (un)forgettable examples generalize across neural architectures; and (iii) based on forgetting dynamics, a significant fraction of examples can be omitted from the training data set while still maintaining state-of-the-art generalization performance.

2019-01-01

ICLR.cc/2019/Conference (poster)
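The forgetting-event definition in this abstract can be made concrete with a small sketch. It assumes a hypothetical boolean array `correct` of shape (evaluation steps, examples) recording whether each training example was classified correctly at each step; the function names are illustrative and not taken from the paper's code. Here an "unforgettable" example is one that is learned at some point and never forgotten afterwards.

```python
import numpy as np

def count_forgetting_events(correct: np.ndarray) -> np.ndarray:
    """Count correct -> incorrect transitions for each example.

    correct: bool array of shape (num_steps, num_examples); correct[t, i] is
    True when example i was classified correctly at evaluation step t.
    """
    correct = np.asarray(correct, dtype=bool)
    # A forgetting event: correct at step t-1, incorrect at step t.
    forgotten = correct[:-1] & ~correct[1:]
    return forgotten.sum(axis=0)

def unforgettable_mask(correct: np.ndarray) -> np.ndarray:
    """Examples that were learned at least once and never forgotten."""
    correct = np.asarray(correct, dtype=bool)
    learned = correct.any(axis=0)
    return learned & (count_forgetting_events(correct) == 0)

# Toy history: 4 evaluation steps (rows) for 3 examples (columns).
history = np.array([
    [False, True,  False],
    [True,  False, True],
    [True,  True,  True],
    [False, True,  True],
])
print(count_forgetting_events(history).tolist())  # [1, 1, 0]
print(unforgettable_mask(history).tolist())       # [False, False, True]
```

Sorting examples by these counts is one plausible way to pick candidates to drop from a training set in the spirit of finding (iii); the abstract does not specify the exact selection rule, so treat that use as an assumption.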

Ordered Memory

Stack-augmented recurrent neural networks (RNNs) have been of interest to the deep learning community for some time. However, the difficulty of training memory models remains a problem obstructing the widespread use of such models. In this paper, we propose the Ordered Memory architecture. Inspired by Ordered Neurons (Shen et al., 2018), we introduce a new attention-based mechanism and use its cumulative probability to control the writing and erasing operations of the memory. We also introduce a new Gated Recursive Cell to compose lower-level representations into higher-level representations. We demonstrate that our model achieves strong performance on the logical inference task (Bowman et al., 2015) and the ListOps task (Nangia and Bowman, 2018). We can also interpret the model to retrieve the induced tree structure, and find that these induced structures align with the ground truth. Finally, we evaluate our model on the Stanford Sentiment Treebank tasks (Socher et al., 2013), and find that it performs comparably to state-of-the-art methods in the literature.

Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks

Natural language is hierarchically structured: smaller units (e.g., phrases) are nested within larger units (e.g., clauses). When a larger constituent ends, all of the smaller constituents that are nested within it must also be closed. While the standard LSTM architecture allows different neurons to track information at different time scales, it does not have an explicit bias towards modeling a hierarchy of constituents. This paper proposes to add such an inductive bias by ordering the neurons; a vector of master input and forget gates ensures that when a given neuron is updated, all the neurons that follow it in the ordering are also updated. Our novel recurrent architecture, ordered neurons LSTM (ON-LSTM), achieves good performance on four different tasks: language modeling, unsupervised parsing, targeted syntactic evaluation, and logical inference.

2019-01-01

ICLR.cc/2019/Conference (oral presentation)

openreview.net
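The master gates mentioned in this abstract can be illustrated with the cumax activation, cumax(x) = cumsum(softmax(x)), which yields monotonically ordered gate values. This is a minimal NumPy sketch of the gate combination only, not a full ON-LSTM cell; it follows the ON-LSTM formulation as we understand it, and the random pre-activations are placeholders for learned projections of the input and previous hidden state.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    z = np.exp(x - x.max())
    return z / z.sum()

def cumax(x: np.ndarray) -> np.ndarray:
    """Cumulative softmax: non-decreasing values that end at 1."""
    return np.cumsum(softmax(x))

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))

def ordered_gates(master_f_logits, master_i_logits, f, i):
    """Combine standard LSTM gates (f, i) with ordered master gates."""
    master_f = cumax(master_f_logits)          # rises monotonically, ends at 1
    master_i = 1.0 - cumax(master_i_logits)    # falls monotonically, ends near 0
    omega = master_f * master_i                # overlap where f and i still act
    f_hat = f * omega + (master_f - omega)
    i_hat = i * omega + (master_i - omega)
    return master_f, master_i, f_hat, i_hat

# Random pre-activations stand in for W x_t + U h_{t-1} + b (assumption).
rng = np.random.default_rng(0)
d = 6
master_f, master_i, f_hat, i_hat = ordered_gates(
    rng.normal(size=d), rng.normal(size=d),
    sigmoid(rng.normal(size=d)), sigmoid(rng.normal(size=d)),
)
# A full cell would then compute c_t = f_hat * c_prev + i_hat * c_candidate.
```

The monotonicity of the master gates is what couples the update of one neuron to the neurons on one side of it in the ordering, which is the inductive bias the abstract describes.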

Augmented CycleGAN: Learning Many-to-Many Mappings from Unpaired Data

Amjad Almahairi

Sai Rajeswar

Alessandro Sordoni

Philip Bachman

Aaron Courville

Learning inter-domain mappings from unpaired data can improve performance in structured prediction tasks, such as image segmentation, by reducing the need for paired data. CycleGAN was recently proposed for this problem, but critically assumes the underlying inter-domain mapping is approximately deterministic and one-to-one. This assumption renders the model ineffective for tasks requiring flexible, many-to-many mappings. We propose a new model, called Augmented CycleGAN, which learns many-to-many mappings between domains. We examine Augmented CycleGAN qualitatively and quantitatively on several image datasets.

2018-07-03

Proceedings of the 35th International Conference on Machine Learning (published)

proceedings.mlr.press

arxiv.org

Focused Hierarchical RNNs for Conditional Sequence Processing

Nan Rosemary Ke

Adam Trischler

Recurrent Neural Networks (RNNs) with attention mechanisms have obtained state-of-the-art results for many sequence processing tasks. Most of these models use a simple form of encoder with attention that looks over the entire sequence and assigns a weight to each token independently. We present a mechanism for focusing RNN encoders for sequence modelling tasks which allows them to attend to key parts of the input as needed. We formulate this using a multi-layer conditional sequence encoder that reads in one token at a time and makes a discrete decision on whether the token is relevant to the context or question being asked. The discrete gating mechanism takes in the context embedding and the current hidden state as inputs and controls information flow into the layer above. We train it using policy gradient methods. We evaluate this method on several types of tasks with different attributes. First, we evaluate the method on synthetic tasks which allow us to evaluate the model's generalization ability and probe the behavior of the gates in more controlled settings. We then evaluate this approach on large-scale Question Answering tasks, including the challenging MS MARCO and SearchQA tasks. Our models show consistent improvements on both tasks over prior work and our baselines. They also generalize significantly better than the baselines on synthetic tasks.

2018-07-03

Proceedings of the 35th International Conference on Machine Learning (published)

proceedings.mlr.press

arxiv.org
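The discrete gate in this abstract, trained with policy gradients, can be sketched as a Bernoulli policy updated with REINFORCE, the simplest policy-gradient estimator. The gate parameterization (a logistic function of the concatenated context embedding and hidden state), the reward, and the constant baseline are all illustrative assumptions, not the paper's exact setup; the names `gate_step` and `reinforce_update` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gate_step(w, b, context, hidden, rng):
    """Sample one discrete gate and return its score-function gradients.

    The gate opens with probability sigmoid(w . [context; hidden] + b).
    For a Bernoulli policy, d log pi(gate) / d logit = gate - p.
    """
    features = np.concatenate([context, hidden])
    p = sigmoid(features @ w + b)
    gate = float(rng.random() < p)        # 1.0 = token is passed upward
    dlogit = gate - p
    return gate, dlogit * features, dlogit

def reinforce_update(w, b, grads_w, grads_b, reward, baseline, lr=0.1):
    """Gradient ascent on expected reward: (reward - baseline) * sum of scores."""
    advantage = reward - baseline
    w = w + lr * advantage * np.sum(grads_w, axis=0)
    b = b + lr * advantage * float(np.sum(grads_b))
    return w, b

# Toy episode: one context vector and five stand-in encoder hidden states.
rng = np.random.default_rng(0)
dim = 4
w, b = np.zeros(2 * dim), 0.0
context = rng.normal(size=dim)
gates, grads_w, grads_b, passed_up = [], [], [], []
for h in rng.normal(size=(5, dim)):
    g, dw, db = gate_step(w, b, context, h, rng)
    gates.append(g)
    grads_w.append(dw)
    grads_b.append(db)
    passed_up.append(g * h)               # closed gates send zeros upward
reward = 1.0                              # hypothetical task reward
w, b = reinforce_update(w, b, grads_w, grads_b, reward, baseline=0.5)
```

Subtracting a baseline from the reward does not change the expected gradient but usually reduces its variance; a learned baseline would be a natural refinement of this sketch.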

Learning Hierarchical Structures On-The-Fly with a Recurrent-Recursive Model for Sequences

Athul Jacob

Zhouhan Lin

Alessandro Sordoni

Yoshua Bengio

We propose a hierarchical model for sequential data that learns a tree on-the-fly, i.e. while reading the sequence. In the model, a recurrent network adapts its structure and reuses recurrent weights in a recursive manner. This creates adaptive skip-connections that ease the learning of long-term dependencies. The tree structure can either be inferred without supervision through reinforcement learning, or learned in a supervised manner. We provide preliminary experiments on a novel Math Expression Evaluation (MEE) task, which is designed to have a hierarchical tree structure that can be used to study the effectiveness of our model. Additionally, we test our model on well-known propositional logic and language modelling tasks. Experimental results show the potential of our approach.

2018-07-01

Rep4NLP@ACL (published)

doi.org

Straight to the Tree: Constituency Parsing with Neural Syntactic Distance

Athul Jacob

In this work, we propose a novel constituency parsing scheme. The model first predicts a real-valued scalar, named syntactic distance, for each split position in the sentence. The topology of the grammar tree is then determined by the values of the syntactic distances. Compared to traditional shift-reduce parsing schemes, our approach is free from the potentially disastrous compounding error. It is also easier to parallelize and much faster. Our model achieves the state-of-the-art single-model F1 score of 92.1 on PTB and 86.4 on the CTB dataset, which surpasses the previous single-model results by a large margin.

2018-07-01

Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (published)

doi.org

arxiv.org
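How the distances determine the tree topology can be sketched with a greedy top-down procedure: split the span at the position with the largest syntactic distance, then recurse on both halves. This minimal sketch builds an unlabeled binary bracketing from hand-set distances; in the paper the distances are predicted by a neural network and constituent labels are also produced, both of which are omitted here.

```python
def distance_to_tree(words, distances):
    """Build a binary bracketing by recursively splitting at the largest distance.

    words: list of n tokens.
    distances: n - 1 scores; distances[j] scores the split between
        words[j] and words[j + 1].
    """
    if len(words) == 1:
        return words[0]
    # The largest distance separates the two top-level constituents.
    k = max(range(len(distances)), key=lambda j: distances[j])
    left = distance_to_tree(words[: k + 1], distances[:k])
    right = distance_to_tree(words[k + 1 :], distances[k + 1 :])
    return (left, right)

print(distance_to_tree(["the", "cat", "sat", "down"], [1.0, 3.0, 2.0]))
# (('the', 'cat'), ('sat', 'down'))
```

Because every split decision only compares distances within the current span, there is no sequence of dependent actions as in shift-reduce parsing, which is consistent with the abstract's point about avoiding compounding errors.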

Towards Text Generation with Adversarially Learned Neural Outlines

Sai Rajeswar

Adam Trischler

Recent progress in deep generative models has been fueled by two paradigms: autoregressive and adversarial models. We propose a combination of both approaches with the goal of learning generative models of text. Our method first produces a high-level sentence outline and then generates words sequentially, conditioning on both the outline and the previous outputs. We generate outlines with an adversarial model trained to approximate the distribution of sentences in a latent space induced by general-purpose sentence encoders. This provides strong, informative conditioning for the autoregressive stage. Our quantitative evaluations suggest that conditioning information from generated outlines is able to guide the autoregressive model to produce realistic samples, comparable to maximum-likelihood trained language models, even at high temperatures with multinomial sampling. Qualitative results also demonstrate that this generative procedure yields natural-looking sentences and interpolations.

Twin Networks: Matching the Future for Sequence Generation

Nan Rosemary Ke

Adam Trischler

We propose a simple technique for encouraging generative RNNs to plan ahead. We train a "backward" recurrent network to generate a given sequence in reverse order, and we encourage states of the forward model to predict cotemporal states of the backward model. The backward network is used only during training, and plays no role during sampling or inference. We hypothesize that our approach eases modeling of long-term dependencies by implicitly forcing the forward states to hold information about the longer-term future (as contained in the backward states). We show empirically that our approach achieves 9% relative improvement for a speech recognition task, and achieves significant improvement on a COCO caption generation task.

2018-01-01

ICLR (Poster) (published)

openreview.net
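The idea of forward states predicting cotemporal backward states can be sketched as a matching penalty: project each forward state with a map g and compare it to the backward state at the same time step. In this NumPy sketch, g is assumed to be affine and the distance is assumed to be squared L2; the paper's exact choice of g, distance, and alignment of time steps may differ. Whether gradients of this term also flow into the backward network is a design choice; the sketch computes only the value.

```python
import numpy as np

def twin_loss(forward_states, backward_states_reversed, W, b):
    """Mean squared distance between projected forward states and cotemporal backward states.

    forward_states: (T, d_f) array in time order.
    backward_states_reversed: (T, d_b) array as produced by running an RNN
        over the reversed sequence, so row 0 belongs to the last time step.
    W, b: affine map g from forward to backward state space (assumption).
    """
    backward = backward_states_reversed[::-1]   # re-align to forward time order
    predicted = forward_states @ W + b          # g(h_t^f)
    return np.mean(np.sum((predicted - backward) ** 2, axis=1))

# Toy check: backward states equal forward states shifted by +1 in every unit.
T, d = 3, 2
forward = np.arange(T * d, dtype=float).reshape(T, d)
backward_rev = forward[::-1] + 1.0
print(float(twin_loss(forward, backward_rev, np.eye(d), np.zeros(d))))  # 2.0
```

Since the backward network only supplies training targets, it can be discarded after training, matching the abstract's note that it plays no role during sampling or inference.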
