Kian Kenyon-Dean

Saber Saberian

Maryam Fallah

Peter McLean

Jess Leung

Vasudev Sharma

Ayla Khan

Jia Balakrishnan

Safiye Celik

Dominique Beaini

Maciej Sypetkowski

Chi Vicky Cheng

Kristen Morse

Maureen Makes

Ben Mabey

Berton Earnshaw

2024-06-15

2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (published)

Learning Efficient Task-Specific Meta-Embeddings with Word Prisms

Jingyi He

KC Tsiolis

Jackie Chi Kit Cheung

Word embeddings are trained to predict word cooccurrence statistics, which leads them to possess different lexical properties (syntactic, se… (see more)mantic, etc.) depending on the notion of context defined at training time. These properties manifest when querying the embedding space for the most similar vectors, and when used at the input layer of deep neural networks trained to solve downstream NLP problems. Meta-embeddings combine multiple sets of differently trained word embeddings, and have been shown to successfully improve intrinsic and extrinsic performance over equivalent models which use just one set of source embeddings. We introduce word prisms: a simple and efficient meta-embedding method that learns to combine source embeddings according to the task at hand. Word prisms learn orthogonal transformations to linearly combine the input source embeddings, which allows them to be very efficient at inference time. We evaluate word prisms in comparison to other meta-embedding methods on six extrinsic evaluations and observe that word prisms offer improvements in performance on all tasks.

2020-11-30

Proceedings of the 28th International Conference on Computational Linguistics (published)

Deconstructing word embedding algorithms

Edward Newell

Jackie Chi Kit Cheung

Word embeddings are reliable feature representations of words used to obtain high quality results for various NLP applications. Uncontextual… (see more)ized word embeddings are used in many NLP tasks today, especially in resource-limited settings where high memory capacity and GPUs are not available. Given the historical success of word embeddings in NLP, we propose a retrospective on some of the most well-known word embedding algorithms. In this work, we deconstruct Word2vec, GloVe, and others, into a common form, unveiling some of the common conditions that seem to be required for making performant word embeddings. We believe that the theoretical findings in this paper can provide a basis for more informed development of future models.

2020-10-31

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (published)

Deconstructing and reconstructing word embedding algorithms

Edward Daniel Newell

Jackie CK Cheung

Uncontextualized word embeddings are reliable feature representations of words used to obtain high quality results for various NLP applicati… (see more)ons. Given the historical success of word embeddings in NLP, we propose a retrospective on some of the most well-known word embedding algorithms. In this work, we deconstruct Word2vec, GloVe, and others, into a common form, unveiling some of the necessary and sufficient conditions required for making performant word embeddings. We find that each algorithm: (1) fits vector-covector dot products to approximate pointwise mutual information (PMI); and, (2) modulates the loss gradient to balance weak and strong signals. We demonstrate that these two algorithmic features are sufficient conditions to construct a novel word embedding algorithm, Hilbert-MLE. We find that its embeddings obtain equivalent or better performance against other algorithms across 17 intrinsic and extrinsic datasets.

2019-11-28

ArXiv (preprint)

Clustering-Oriented Representation Learning with Attractive-Repulsive Loss

Lucas Caccia

Jackie CK Cheung

The standard loss function used to train neural network classifiers, categorical cross-entropy (CCE), seeks to maximize accuracy on the trai… (see more)ning data; building useful representations is not a necessary byproduct of this objective. In this work, we propose clustering-oriented representation learning (COREL) as an alternative to CCE in the context of a generalized attractive-repulsive loss framework. COREL has the consequence of building latent representations that collectively exhibit the quality of natural clustering within the latent space of the final hidden layer, according to a predefined similarity function. Despite being simple to implement, COREL variants outperform or perform equivalently to CCE in a variety of scenarios, including image and news article classification using both feed-forward and convolutional neural networks. Analysis of the latent spaces created with different similarity functions facilitates insights on the different use cases COREL variants can satisfy, where the Cosine-COREL variant makes a consistently clusterable latent space, while Gaussian-COREL consistently obtains better classification accuracy than CCE.

2018-12-17

ArXiv (preprint)

Resolving Event Coreference with Supervised Representation Learning and Clustering-Oriented Regularization

Jackie CK Cheung

Doina Precup

We present an approach to event coreference resolution by developing a general framework for clustering that uses supervised representation … (see more)learning. We propose a neural network architecture with novel Clustering-Oriented Regularization (CORE) terms in the objective function. These terms encourage the model to create embeddings of event mentions that are amenable to clustering. We then use agglomerative clustering on these embeddings to build event coreference chains. For both within- and cross-document coreference on the ECB+ corpus, our model obtains better results than models that require significantly more pre-annotated information. This work provides insight and motivating results for a new general approach to solving coreference and clustering problems with representation learning.

2018-05-31

Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics (published)

Verb Phrase Ellipsis Resolution Using Discriminative and Margin-Infused Algorithms

Jackie CK Cheung

Doina Precup

Verb Phrase Ellipsis (VPE) is an anaphoric construction in which a verb phrase has been elided. It occurs frequently in dialogue and informa… (see more)l conversational settings, but despite its evident impact on event coreference resolution and extraction, there has been relatively little work on computational methods for identifying and resolving VPE. Here, we present a novel approach to detecting and resolving VPE by using supervised discriminative machine learning techniques trained on features extracted from an automatically parsed, publicly available dataset. Our approach yields state-of-the-art results for VPE detection by improving F1 score by over 11%; additionally, we explore an approach to antecedent identiﬁ-cation that uses the Margin-Infused-Relaxed-Algorithm, which shows promising results.

2016-10-31

Conference on Empirical Methods in Natural Language Processing (published)