Malik H. Altakrori

A Multifaceted Framework to Evaluate Evasion, Content Preservation, and Misattribution in Authorship Obfuscation Techniques

Malik H. Altakrori

Thomas Scialom

Benjamin C. M. Fung

Jackie CK Cheung

2022-11-30

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (published)

doi.org

The Topic Confusion Task: A Novel Evaluation Scenario for Authorship Attribution

Malik H. Altakrori

Jackie CK Cheung

Benjamin C. M. Fung

2021-10-31

Findings of the Association for Computational Linguistics: EMNLP 2021 (published)

doi.org

arxiv.org

The Topic Confusion Task: A Novel Scenario for Authorship Attribution

Malik H. Altakrori

Jackie CK Cheung

Benjamin C. M. Fung

Authorship attribution is the problem of identifying the most plausible author of an anonymous text from a set of candidate authors. Researchers have investigated same-topic and cross-topic scenarios of authorship attribution, which differ according to whether unseen topics are used in the testing phase. However, neither scenario allows us to explain whether errors are caused by failure to capture authorship style, by the topic shift or by other factors. Motivated by this, we propose the topic confusion task, where we switch the author-topic configuration between training and testing set. This setup allows us to probe errors in the attribution process. We investigate the accuracy and two error measures: one caused by the models’ confusion by the switch because the features capture the topics, and one caused by the features’ inability to capture the writing styles, leading to weaker models. By evaluating different features, we show that stylometric features with part-of-speech tags are less susceptible to topic variations and can increase the accuracy of the attribution process. We further show that combining them with word-level n-grams can outperform the state-of-the-art technique in the cross-topic scenario. Finally, we show that pretrained language models such as BERT and RoBERTa perform poorly on this task, and are outperformed by simple n-gram features.

2020-12-31

arXiv.org (preprint)

dblp.uni-trier.de
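
The topic confusion setup described in the abstract above can be illustrated with a small sketch. This is not the authors' code: the author names, topic labels, and the functions `switched`, `categorize`, and `summarize` are hypothetical, and the rule used to separate the two error types is one plausible reading of the abstract (an error counts as topic-driven when the predicted author wrote on the test document's topic during training).

```python
from collections import Counter

# Hypothetical toy configuration: the topic each author writes about in training.
train_topic = {
    "author_a": "topic_1",
    "author_b": "topic_1",
    "author_c": "topic_2",
    "author_d": "topic_2",
}


def switched(topic):
    """Return the other topic in this two-topic toy setup."""
    return "topic_2" if topic == "topic_1" else "topic_1"


# At test time the author-topic configuration is switched.
test_topic = {author: switched(topic) for author, topic in train_topic.items()}


def categorize(true_author, predicted_author):
    """Label one prediction as correct, topic-driven error, or style error.

    Assumption: a wrong prediction is attributed to topic confusion when the
    predicted author wrote on the test document's topic during training;
    any other wrong prediction is attributed to a failure to capture style.
    """
    if predicted_author == true_author:
        return "correct"
    if train_topic[predicted_author] == test_topic[true_author]:
        return "topic_confusion_error"
    return "style_error"


def summarize(pairs):
    """Return the fraction of predictions falling into each category."""
    counts = Counter(categorize(true, pred) for true, pred in pairs)
    total = len(pairs)
    return {
        key: counts[key] / total
        for key in ("correct", "topic_confusion_error", "style_error")
    }


if __name__ == "__main__":
    # Hypothetical (true author, predicted author) pairs from some classifier.
    predictions = [
        ("author_a", "author_a"),
        ("author_a", "author_c"),
        ("author_a", "author_b"),
        ("author_c", "author_a"),
    ]
    print(summarize(predictions))
```

Under this reading, a model that relies on topical features would show a high share of topic-driven errors, while a model whose features fail to capture writing style would show more errors of the second kind.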

