
David Ifeoluwa Adelani

Core Academic Member
Canada CIFAR AI Chair
McGill University
Research Topics
Deep Learning
Natural Language Processing
Representation Learning

Biography

David Adelani is an assistant professor at McGill University’s School of Computer Science under the Fighting Inequities initiative, and a core academic member of Mila – Quebec Artificial Intelligence Institute.

Adelani’s research focuses on multilingual natural language processing with special attention to under-resourced languages.

Current Students

PhD - McGill University
Master's Research - McGill University
Research Intern - McGill University
Master's Research - McGill University

Publications

Enhancing Transformer Models for Igbo Language Processing: A Critical Comparative Study
Anthony Soronnadi
Olubayo Adekanmbi
Chinazo Anebelundu
NaijaRC: A Multi-choice Reading Comprehension Dataset for Nigerian Languages
Aremu Anuoluwapo
Jesujoba Oluwadara Alabi
Daud Abolade
Nkechinyere Faith Aguobi
Shamsuddeen Hassan Muhammad
In this paper, we create NaijaRC, a new multi-choice Nigerian Reading Comprehension dataset based on high-school RC examinations for three Nigerian national languages: Hausa (hau), Igbo (ibo), and Yorùbá (yor). We provide baseline results by performing cross-lingual transfer with several pre-trained encoder-only models using the Belebele training data, which is drawn largely from the RACE dataset (RACE is based on English exams for middle- and high-school Chinese students, very similar to our dataset). Additionally, we provide results by prompting large language models (LLMs) such as GPT-4.
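A minimal sketch of the cross-lingual transfer setup described above, assuming a generic multilingual encoder (xlm-roberta-base) and a toy example; the paper's actual checkpoints, data processing, and training details may differ.

```python
# Sketch: scoring a multiple-choice RC example with a multilingual
# encoder-only model, as in cross-lingual transfer from Belebele/RACE-style
# data to NaijaRC. In practice the model would first be fine-tuned on the
# Belebele training data; the checkpoint and example here are illustrative.
import torch
from transformers import AutoTokenizer, AutoModelForMultipleChoice

model_name = "xlm-roberta-base"  # any multilingual encoder-only model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMultipleChoice.from_pretrained(model_name)

passage = "..."   # reading passage in Hausa, Igbo, or Yorùbá
question = "..."  # question about the passage
choices = ["option A", "option B", "option C", "option D"]

# Pair (passage + question) with every answer option.
contexts = [f"{passage} {question}"] * len(choices)
enc = tokenizer(contexts, choices, truncation=True, padding=True, return_tensors="pt")
# AutoModelForMultipleChoice expects shape (batch, num_choices, seq_len).
inputs = {k: v.unsqueeze(0) for k, v in enc.items()}

with torch.no_grad():
    logits = model(**inputs).logits  # (1, num_choices)
pred = logits.argmax(dim=-1).item()  # index of the predicted option
print(f"Predicted choice: {pred}")
```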
YAD: Leveraging T5 for improved automatic diacritization of Yorùbá text
Akindele Michael Olawole
Jesujoba Oluwadara Alabi
Aderonke Busayo Sakpere
In this work, we present the Yorùbá automatic diacritization (YAD) benchmark dataset for evaluating Yorùbá diacritization systems. In addition, we pre-train a text-to-text transformer (T5) model for Yorùbá and show that it outperforms several multilingually trained T5 models. Lastly, we show that more data and bigger models are better at diacritization for Yorùbá.
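A minimal sketch of diacritization framed as a text-to-text task, assuming a generic pretrained seq2seq checkpoint and a hypothetical task prefix; the paper pre-trains its own Yorùbá T5 model, so the checkpoint and prompt format below are illustrative only.

```python
# Sketch: Yorùbá diacritization as text-to-text generation with a T5-style
# model. "google/flan-t5-small" and the "add diacritics:" prefix are
# assumptions for illustration, not the paper's model or input format.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/flan-t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

undiacritized = "bawo ni o se wa"  # input text without diacritics
inputs = tokenizer("add diacritics: " + undiacritized, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```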
Findings of the Association for Computational Linguistics: NAACL 2024, Mexico City, Mexico, June 16-21, 2024
Mohamed Abdalla
Gavin Abercrombie
Rodrigo Agerri
Zeljko Agic
Eneko Agirre
Monica Agrawal
Wasi Uddin Ahmad
James Allan
Aijun An
Antonios Anastasopoulos
Mark Anderson
Jacob Andreas
Marianna Apidianaki
et al.
ÌròyìnSpeech: A multi-purpose Yorùbá Speech Corpus
Tolúlọpẹ́ Ògúnremí
Kọ́lá Túbọ̀sún
Aremu Anuoluwapo
Iroro Orife
Machine Translation Hallucination Detection for Low and High Resource Languages using Large Language Models
Laura Gongas
Kenza Benkirane
Shahar Pelles
Naomi Fuchs
Joshua Darmon
Pontus Stenetorp
Eduardo Sánchez
Mitigating Translationese in Low-resource Languages: The Storyboard Approach
Garry Kuwanto
Eno-Abasi Urua
Priscilla A. Amuok
Shamsuddeen Hassan Muhammad
Aremu Anuoluwapo
Verrah Akinyi Otiende
Loice Emma Nanyanga
T. Nyoike
A. D. Akpan
Nsima Ab Udouboh
Idongesit Udeme Archibong
Idara Effiong Moses
Ifeoluwatayo A. Ige
Benjamin A. Ajibade
Olumide Benjamin Awokoya
Idris Abdulmumin
Saminu Mohammad Aliyu
Ruqayya Nasir Iro
Ibrahim Ahmad
Deontae Smith
Praise-EL Michaels
Derry Tanti Wijaya
Anietie U Andy
Low-resource languages often face challenges in acquiring high-quality language data due to the reliance on translation-based methods, which can introduce the translationese effect. This phenomenon results in translated sentences that lack fluency and naturalness in the target language. In this paper, we propose a novel approach to data collection by leveraging storyboards to elicit more fluent and natural sentences. Our method involves presenting native speakers with visual stimuli in the form of storyboards and collecting their descriptions without direct exposure to the source text. We conducted a comprehensive evaluation comparing our storyboard-based approach with traditional text translation-based methods in terms of accuracy and fluency. Human annotators and quantitative metrics were used to assess translation quality. The results indicate a preference for text translation in terms of accuracy, while our method demonstrates lower accuracy but better fluency in the languages of focus.
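A minimal sketch of the quantitative side of such a comparison, assuming hypothetical file names and chrF (via sacrebleu) as the metric; the paper's own evaluation combines human annotation with quantitative metrics, which may differ from this choice.

```python
# Sketch: comparing translation-based and storyboard-elicited sentences
# against gold references with chrF. The file paths and the chrF choice are
# illustrative assumptions, not the paper's exact evaluation protocol.
from sacrebleu.metrics import CHRF

def read_lines(path):
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f]

references = [read_lines("gold_references.txt")]      # hypothetical gold file
translated = read_lines("translation_based.txt")      # hypothetical system output
storyboard = read_lines("storyboard_elicited.txt")    # hypothetical system output

chrf = CHRF()
print("translation-based:", chrf.corpus_score(translated, references))
print("storyboard-based: ", chrf.corpus_score(storyboard, references))
```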
SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects
Hannah Liu
Xiaoyu Shen
Nikita Vassilyev
Jesujoba Oluwadara Alabi
Yanke Mao
Haonan Gao
Annie En-Shiun Lee
Voices Unheard: NLP Resources and Models for Yorùbá Regional Dialects
Orevaoghene Ahia
Aremu Anuoluwapo
Diana Abagyan
Hila Gonen
Daud Abolade
Noah A. Smith
Yulia Tsvetkov
Cross-lingual Open-Retrieval Question Answering for African Languages
Odunayo Ogundepo
Tajuddeen Gwadabe
Clara E. Rivera
Jonathan H. Clark
Sebastian Ruder
Bonaventure F. P. Dossou
Abdou Aziz Diop
Claytone Sikasote
Gilles Hacheme
Happy Buzaaba
Ignatius Ezeani
Rooweither Mabuya
Salomey Osei
Chris Emezue
Albert Kahira
Shamsuddeen Hassan Muhammad
Akintunde Oladipo
Abraham Toluwase Owodunni
Atnafu Lambebo Tonja
Iyanuoluwa Shode
Akari Asai
Tunde Oluwaseyi Ajayi
Clemencia Siro
Stephen Arthur
Mofetoluwa Adeyemi
Orevaoghene Ahia
Aremu Anuoluwapo
Oyinkansola Awosan
Chiamaka Ijeoma Chukwuneke
Bernard Opoku
Ayodele Awokoya
Verrah Akinyi Otiende
Christine Mwase
Boyd Sinkala
Andre Niyongabo Rubungo
Daniel Ajisafe
Emeka Felix Onwuegbuzia
Habib Mbow
Emile Niyomutabazi
Eunice Mukonde
Falalu Lawan
Ibrahim Ahmad
Jesujoba Oluwadara Alabi
Martin Namukombo
Chinedu Emmanuel Mbonu
Mofya Phiri
Neo Putini
Ndumiso Mngoma
Priscilla A. Amuok
Ruqayya Nasir Iro
Sonia Adhiambo
XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages
Sebastian Ruder
Jonathan H. Clark
Alexander Gutkin
Mihir Kale
Min Ma
Massimo Nicosia
Shruti Rijhwani
Parker Riley
Jean Michel Amath Sarr
Xinyi Wang
John Frederick Wieting
Nitish Gupta
Anna Katanova
Christo Kirov
Dana L Dickinson
Brian Roark
Bidisha Samanta
Connie Tao
Vera Axelrod
Isaac Rayburn Caswell
Colin Cherry
Dan Garrette
Reeve Ingle
Melvin Johnson
Dmitry Panteleev
Partha Talukdar
Data scarcity is a crucial issue for the development of highly multilingual NLP systems. Yet for many under-represented languages (ULs) -- languages for which NLP research is particularly far behind in meeting user needs -- it is feasible to annotate small amounts of data. Motivated by this, we propose XTREME-UP, a benchmark defined by: its focus on the scarce-data scenario rather than zero-shot; its focus on user-centric tasks -- tasks with broad adoption by speakers of high-resource languages; and its focus on under-represented languages, where this scarce-data scenario tends to be most realistic. XTREME-UP evaluates the capabilities of language models across 88 under-represented languages over 9 key user-centric technologies, including ASR, OCR, MT, and information access tasks that are of general utility. We create new datasets for OCR, autocomplete, semantic parsing, and transliteration, and build on and refine existing datasets for other tasks. XTREME-UP provides a methodology for evaluating many modeling scenarios, including text-only, multi-modal (vision, audio, and text), supervised parameter tuning, and in-context learning. We evaluate commonly used models on the benchmark. We release all code and scripts to train and evaluate models.
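A minimal sketch of the in-context learning scenario the benchmark covers, assuming a hypothetical JSONL format, hypothetical field names, and a generic mT5 checkpoint; the benchmark's actual task formats and data loaders are defined in the released XTREME-UP code.

```python
# Sketch: few-shot in-context evaluation for a text-to-text task in the
# scarce-data setting XTREME-UP targets. The file paths, "input"/"output"
# fields, and the mT5 checkpoint are illustrative assumptions only.
import json
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/mt5-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

def load_jsonl(path):
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]

few_shot = load_jsonl("train_fewshot.jsonl")  # hypothetical small labelled split
test = load_jsonl("test.jsonl")               # hypothetical test split

def build_prompt(example, shots, k=4):
    # Prepend k labelled demonstrations to the test input.
    demos = "\n".join(f"Input: {s['input']}\nOutput: {s['output']}" for s in shots[:k])
    return f"{demos}\nInput: {example['input']}\nOutput:"

for example in test[:3]:
    inputs = tokenizer(build_prompt(example, few_shot), return_tensors="pt", truncation=True)
    output_ids = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```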