Shawn Tan

Brief Report: Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks

Yikang Shen

Shawn Tan

Alessandro Sordoni

Aaron Courville

2019-05-05

(published)

www.semanticscholar.org

Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks

Natural language is hierarchically structured: smaller units (e.g., phrases) are nested within larger units (e.g., clauses). When a larger constituent ends, all of the smaller constituents that are nested within it must also be closed. While the standard LSTM architecture allows different neurons to track information at different time scales, it does not have an explicit bias towards modeling a hierarchy of constituents. This paper proposes to add such an inductive bias by ordering the neurons; a vector of master input and forget gates ensures that when a given neuron is updated, all the neurons that follow it in the ordering are also updated. Our novel recurrent architecture, ordered neurons LSTM (ON-LSTM), achieves good performance on four different tasks: language modeling, unsupervised parsing, targeted syntactic evaluation, and logical inference.

2018-12-31

ICLR.cc/2019/Conference (oral)

openreview.net

Generating Contradictory, Neutral, and Entailing Sentences

Learning distributed sentence representations remains an interesting problem in the field of Natural Language Processing (NLP). We want to learn a model that approximates the conditional latent space over the representations of a logical antecedent of the given statement. In our paper, we propose an approach to generating sentences, conditioned on an input sentence and a logical inference label. We do this by modeling the different possibilities for the output sentence as a distribution over the latent representation, which we train using an adversarial objective. We evaluate the model using two state-of-the-art models for the Recognizing Textual Entailment (RTE) task, and measure the BLEU scores against the actual sentences as a probe for the diversity of sentences produced by our model. The experiment results show that, given our framework, we have clear ways to improve the quality and diversity of generated sentences.

2018-03-06

ArXiv (preprint)

arxiv.org

Inferring Identity Factors for Grouped Examples

Shawn Tan

Christopher Pal

Aaron Courville

We propose a method for modelling groups of face images from the same identity. The model is trained to infer a distribution over the latent space for identity given a small set of “training data”. One can then sample images using that latent representation to produce images of the same identity. We demonstrate that the model extracts disentangled factors for identity factors and image-specific vectors. We also perform generative classification over identities to assess its feasibility for few-shot face recognition.

2018-02-11

(published)

openreview.net

Improving Explorability in Variational Inference with Annealed Variational Objectives

Chin-wei Huang

Shawn Tan

Alexandre Lacoste

Aaron Courville

Despite the advances in the representational capacity of approximate distributions for variational inference, the optimization process can still limit the density that is ultimately learned. We demonstrate the drawbacks of biasing the true posterior to be unimodal, and introduce Annealed Variational Objectives (AVO) into the training of hierarchical variational methods. Inspired by Annealed Importance Sampling, the proposed method facilitates learning by incorporating energy tempering into the optimization objective. In our experiments, we demonstrate our method's robustness to deterministic warm up, and the benefits of encouraging exploration in the latent space.

2017-12-31

Advances in Neural Information Processing Systems (published)

doi.org

arxiv.org

Self-organized Hierarchical Softmax

Yikang Shen

Shawn Tan

Christopher Pal

Aaron Courville

We propose a new self-organizing hierarchical softmax formulation for neural-network-based language models over large vocabularies. Instead of using a predefined hierarchical structure, our approach is capable of learning word clusters with clear syntactical and semantic meaning during the language model training process. We provide experiments on standard benchmarks for language modeling and sentence compression tasks. We find that this approach is as fast as other efficient softmax approximations, while achieving comparable or even better performance relative to similar full softmax models.

2017-07-25

ArXiv (preprint)

arxiv.org
