Louis Clouatre

MVP: Minimal Viable Phrase for Long Text Understanding

Louis Clouatre

Amal Zouaq

A. Chandar

2023-12-31

International Conference on Language Resources and Evaluation (published)

Local Structure Matters Most: Perturbation Study in NLU

Louis Clouatre

Prasanna Parthasarathi

Amal Zouaq

Sarath Chandar

Recent research analyzing the sensitivity of natural language understanding models to word-order perturbations has shown that neural models are surprisingly insensitive to the order of words. In this paper, we investigate this phenomenon by developing perturbations that alter the order of words, subwords, and characters, and we analyze their effect on neural models' performance on language understanding tasks. We measure the impact of perturbations on both the local neighborhood of characters and the global position of characters in the perturbed texts, and observe that the perturbation functions found in prior literature affect only the global ordering, while the local ordering remains relatively unperturbed. We empirically show that neural models, regardless of their inductive biases, pretraining scheme, or choice of tokenization, rely mostly on the local structure of text to build understanding and make limited use of the global structure.
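The contrast between local and global ordering can be made concrete with a small sketch. It is illustrative only and is not the authors' implementation: the two perturbations (shuffling whole words, and swapping neighboring characters inside words) and the two scores (`global_displacement`, the mean shift of each character's position normalized by length, and `neighbor_preservation`, the fraction of originally adjacent character pairs that stay adjacent and in order) are hypothetical stand-ins chosen for illustration.

```python
import random

# Hypothetical representation: a list of words, each a list of (original_position, char).
Tagged = list[list[tuple[int, str]]]


def tag(text: str) -> Tagged:
    """Split on whitespace and tag every non-space character with its position."""
    words, pos = [], 0
    for w in text.split():
        words.append([(pos + k, c) for k, c in enumerate(w)])
        pos += len(w)
    return words


def shuffle_words(words: Tagged, rng: random.Random) -> Tagged:
    """Global perturbation: permute whole words, leaving each word intact."""
    out = [list(w) for w in words]
    rng.shuffle(out)
    return out


def swap_neighbor_chars(words: Tagged, rng: random.Random) -> Tagged:
    """Local perturbation: swap disjoint pairs of adjacent characters in each word."""
    out = []
    for w in words:
        w = list(w)
        for i in range(0, len(w) - 1, 2):
            if rng.random() < 0.5:
                w[i], w[i + 1] = w[i + 1], w[i]
        out.append(w)
    return out


def flatten(words: Tagged) -> list[int]:
    """Original positions of the characters, listed in their perturbed order."""
    return [p for w in words for p, _ in w]


def global_displacement(words: Tagged) -> float:
    """Hypothetical global score: mean |new - original| index, divided by length."""
    order = flatten(words)
    n = len(order)
    if n == 0:
        return 0.0
    return sum(abs(new - orig) for new, orig in enumerate(order)) / (n * n)


def neighbor_preservation(words: Tagged) -> float:
    """Hypothetical local score: share of original adjacent pairs still adjacent, in order."""
    order = flatten(words)
    if len(order) < 2:
        return 1.0
    kept = sum(1 for a, b in zip(order, order[1:]) if b == a + 1)
    return kept / (len(order) - 1)


if __name__ == "__main__":
    text = tag("local structure matters most in most languages")
    for fn in (shuffle_words, swap_neighbor_chars):
        p = fn(text, random.Random(0))
        print(fn.__name__, round(global_displacement(p), 3), round(neighbor_preservation(p), 3))
```

In this sketch, word shuffling moves characters far from their original positions but keeps every within-word pair intact, whereas neighbor swaps move each character by at most one position yet tend to break many adjacent pairs, mirroring the local/global distinction described in the abstract.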

2022-04-30

Findings of the Association for Computational Linguistics: ACL 2022 (published)

Detecting Languages Unintelligible to Multilingual Models through Local Structure Probes

Louis Clouatre

Prasanna Parthasarathi

Amal Zouaq

Sarath Chandar

Providing better language tools for low-resource and endangered languages is imperative for equitable growth. Recent massively multilingual pretrained models have proven surprisingly effective at zero-shot transfer to a wide variety of languages. However, this transfer is not universal: many languages are not currently understood by multilingual approaches. It is estimated that only 72 languages possess a "small set of labeled datasets" on which a model's performance could be tested, while the vast majority of languages lack the resources needed even to evaluate performance. In this work, we attempt to clarify which languages do and do not currently benefit from such transfer. To that end, we develop a general approach that requires only unlabeled text to detect which languages are not well understood by a cross-lingual model. Our approach is derived from the hypothesis that if a model's understanding is insensitive to perturbations of text in a language, the model likely has a limited understanding of that language. We construct a cross-lingual sentence similarity task to evaluate our approach empirically on 350 primarily low-resource languages.
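The hypothesis behind this approach can be illustrated with a toy sketch. It is not the authors' method: the paper evaluates on a cross-lingual sentence similarity task with a real multilingual model, whereas the code below only compares a stand-in encoder's representation of each sentence with that of a perturbed copy. The encoders (`bag_of_chars`, `char_bigrams`), the within-word character shuffle, and the `THRESHOLD` value are all hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter
from typing import Callable

# Hypothetical stand-in "encoder": maps a sentence to a sparse vector (feature -> weight).
Encoder = Callable[[str], Counter]


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(weight * v[key] for key, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def shuffle_within_words(sentence: str, rng: random.Random) -> str:
    """Hypothetical local perturbation: shuffle the characters inside each word."""
    words = []
    for w in sentence.split():
        chars = list(w)
        rng.shuffle(chars)
        words.append("".join(chars))
    return " ".join(words)


def insensitivity(encode: Encoder, sentences: list[str], seed: int = 0) -> float:
    """Mean similarity between each sentence and its perturbed copy (1.0 = unaffected)."""
    rng = random.Random(seed)
    sims = [cosine(encode(s), encode(shuffle_within_words(s, rng))) for s in sentences]
    return sum(sims) / len(sims)


def bag_of_chars(s: str) -> Counter:
    """Toy encoder that ignores character order entirely."""
    return Counter(s)


def char_bigrams(s: str) -> Counter:
    """Toy encoder that sees local order through character bigrams."""
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


THRESHOLD = 0.95  # hypothetical cut-off: above it, flag the language as poorly understood


def flag_language(encode: Encoder, sentences: list[str]) -> bool:
    """Flag a language when the encoder barely reacts to perturbing its text."""
    return insensitivity(encode, sentences) > THRESHOLD


if __name__ == "__main__":
    sample = [
        "multilingual models transfer poorly to unseen languages",
        "perturbations reveal whether structure matters",
    ]
    for enc in (bag_of_chars, char_bigrams):
        print(enc.__name__, round(insensitivity(enc, sample), 3), flag_language(enc, sample))
```

A fully order-blind encoder such as `bag_of_chars` scores about 1.0 and is flagged, while `char_bigrams`, which does see local order, scores lower. In practice `encode` would be replaced by the model's sentence representation, with `cosine` adapted to dense vectors.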

2021-12-31

EMNLP (Findings) (published)

Local Structure Matters Most in Most Languages

Louis Clouatre

Prasanna Parthasarathi

Amal Zouaq

A. Chandar

2021-12-31

AACL/IJCNLP (2) (published)
