Yao Lu

Maurice Weber

Max Ryabinin

David Ifeoluwa Adelani

Yihong Chen

Raphael Tang

Pontus Stenetorp

2025-02-18

ArXiv (preprint)

Multilingual Language Model Pretraining using Machine-translated Data

Jiayi Wang

Maurice Weber

Max Ryabinin

David Ifeoluwa Adelani

Yihong Chen

Raphael Tang

Pontus Stenetorp

2025-02-18

ArXiv (preprint)

AfriMTE and AfriCOMET: Enhancing COMET to Embrace Under-resourced African Languages

Jiayi Wang

David Ifeoluwa Adelani

Sweta Agrawal

Marek Masiak

Ricardo Rei

Eleftheria Briakou

Marine Carpuat

Xuanli He

Sofia Bourhim

Andiswa Bukula

Muhidin A. Mohamed

Temitayo Olatoye

Tosin Adewumi

Hamam Mokayed

Christine Mwase

Wangui Kimotho

Foutse Yuehgoh

Aremu Anuoluwapo

Jessica Ojo

Shamsuddeen Hassan Muhammad … (see 41 more)

Salomey Osei

Abdul-Hakeem Omotayo

Chiamaka Ijeoma Chukwuneke

Perez Ogayo

Oumaima Hourrane

Salma El Anigri

Lolwethu Ndolela

Thabiso Mangwana

Shafie Abdi Mohamed

Hassan Ayinde

Ayinde Hassan

Oluwabusayo Olufunke Awoyomi

Lama Alkhaled

sana Sabah al-azzawi

Naome Etori

Millicent Ochieng

Clemencia Siro

Samuel Njoroge

Njoroge Kiragu

Eric Muchiri

Wangari Kimotho

Lyse Naomi Wamba

Daud Abolade

Simbiat Ajao

Iyanuoluwa Shode

Ricky Macharm

Ruqayya Nasir Iro

Saheed Salahudeen Abdullahi

Stephen Moore

Bernard Opoku

Zainab Akinjobi

Abeeb Afolabi

Nnaemeka Casmir Obiefuna

Onyekachi Ogbu

Sam Brian

Sam Ochieng’

Verrah Akinyi Otiende

CHINEDU EMMANUEL MBONU

Toadoum Sari Sakayo

Pontus Stenetorp

Despite the recent progress on scaling multilingual machine translation (MT) to several under-resourced African languages, accurately measur… (see more)ing this progress remains challenging, since evaluation is often performed on n-gram matching metrics such as BLEU, which typically show a weaker correlation with human judgments. Learned metrics such as COMET have higher correlation; however, the lack of evaluation data with human ratings for under-resourced languages, complexity of annotation guidelines like Multidimensional Quality Metrics (MQM), and limited language coverage of multilingual encoders have hampered their applicability to African languages. In this paper, we address these challenges by creating high-quality human evaluation data with simplified MQM guidelines for error detection and direct assessment (DA) scoring for 13 typologically diverse African languages. Furthermore, we develop AfriCOMET: COMET evaluation metrics for African languages by leveraging DA data from well-resourced languages and an African-centric multilingual encoder (AfroXLM-R) to create the state-of-the-art MT evaluation metrics for African languages with respect to Spearman-rank correlation with human judgments (0.441).

2024-06-01

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) (published)

Multi-XScience: A Large-scale Dataset for Extreme Multi-document Summarization of Scientific Articles

Yue Dong

Laurent Charlin

Multi-document summarization is a challenging task for which there exists little large-scale datasets. We propose Multi-XScience, a large-sc… (see more)ale multi-document summarization dataset created from scientific articles. Multi-XScience introduces a challenging multi-document summarization task: writing the related-work section of a paper based on its abstract and the articles it references. Our work is inspired by extreme summarization, a dataset construction protocol that favours abstractive modeling approaches. Descriptive statistics and empirical results—using several state-of-the-art models trained on the Multi-XScience dataset—reveal that Multi-XScience is well suited for abstractive models.

2020-11-01

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (published)