Jessica Ojo

Master's Research - McGill University
Research Topics
Large Language Models (LLMs)
Linguistic Evaluation of Language Models
Machine Learning For Speech and Audio
Machine Translation
Scaling Engineering Infrastructure for Large Model Training

Publications

MasakhaNEWS: News Topic Classification for African languages
Marek Masiak
Israel Abebe Azime
Jesujoba Oluwadara Alabi
Atnafu Lambebo Tonja
Christine Mwase
Odunayo Ogundepo
Bonaventure F. P. Dossou
Akintunde Oladipo
Doreen Nixdorf
Sana Sabah Al-Azzawi
Blessing Kudzaishe Sibanda
Davis David
Lolwethu Ndolela
Jonathan Mukiibi
Tunde Oluwaseyi Ajayi
Tatiana Moteu Ngoli
Brian Odhiambo
Abraham Toluwase Owodunni
Nnaemeka Casmir Obiefuna
Shamsuddeen Hassan Muhammad
Saheed Salahudeen Abdullahi
Mesay Gemeda Yigezu
Tajuddeen Gwadabe
Idris Abdulmumin
Mahlet Taye Bame
Oluwabusayo Olufunke Awoyomi
Iyanuoluwa Shode
Tolulope Anu Adelani
Habiba Abdulganiy Kailani
Abdul-Hakeem Omotayo
Adetola Adeeko
Afolabi Abeeb
Aremu Anuoluwapo
Olanrewaju Samuel
Clemencia Siro
Wangari Kimotho
Onyekachi Ogbu
Chinedu Emmanuel Mbonu
Chiamaka Ijeoma Chukwuneke
Samuel Fanijo
Oyinkansola Fiyinfoluwa Awosan
Tadesse Kebede Guge
Toadoum Sari Sakayo
Pamela Nyatsine
Freedmore Sidume
Oreen Yousuf
Mardiyyah Oduwole
Ussen Abre Kimanuka
Kanda Patrick Tshinu
Thina Diko
Siyanda Nxakama
Abdulmejid Tuni Johar
Sinodos Gebre
Muhidin A. Mohamed
Shafie Abdi Mohamed
Fuad Mire Hassan
Moges Ahmed Mehamed
Evrard Ngabire
Pontus Stenetorp
African languages are severely under-represented in NLP research due to a lack of datasets covering several NLP tasks. While there are individual language-specific datasets that are being expanded to different tasks, only a handful of NLP tasks (e.g. named entity recognition and machine translation) have standardized benchmark datasets covering several geographically and typologically diverse African languages. In this paper, we develop MasakhaNEWS -- a new benchmark dataset for news topic classification covering 16 languages widely spoken in Africa. We provide an evaluation of baseline models by training classical machine learning models and fine-tuning several language models. Furthermore, we explore several alternatives to full fine-tuning of language models that are better suited for zero-shot and few-shot learning, such as cross-lingual parameter-efficient fine-tuning (like MAD-X), pattern exploiting training (PET), prompting language models (like ChatGPT), and prompt-free sentence transformer fine-tuning (SetFit and Cohere Embedding API). Our evaluation in the zero-shot setting shows the potential of prompting ChatGPT for news topic classification in low-resource African languages, achieving an average performance of 70 F1 points without leveraging additional supervision like MAD-X. In the few-shot setting, we show that with as few as 10 examples per label, we achieve more than 90% (i.e. 86.0 F1 points) of the performance of full supervised training (92.6 F1 points) leveraging the PET approach.
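The zero-shot prompting setup described in the abstract can be sketched as a simple prompt-and-parse loop. The label set and prompt wording below are illustrative assumptions for MasakhaNEWS-style news topic classification, not the paper's exact template, and `build_prompt`/`parse_label` are hypothetical helper names; the model call itself is left out.

```python
# Minimal sketch of zero-shot news-topic classification via prompting an
# instruction-tuned LLM, in the spirit of the ChatGPT evaluation above.
# LABELS is an assumed closed label set, not necessarily the paper's.

LABELS = ["business", "entertainment", "health", "politics",
          "religion", "sports", "technology"]

def build_prompt(headline: str, text: str) -> str:
    """Compose a single-turn classification prompt over the closed label set."""
    label_list = ", ".join(LABELS)
    return (
        "Classify the following news article into exactly one of these "
        f"topics: {label_list}.\n\n"
        f"Headline: {headline}\n"
        f"Article: {text}\n\n"
        "Topic:"
    )

def parse_label(model_output: str) -> str:
    """Map a free-form model reply back onto the closed label set."""
    reply = model_output.strip().lower()
    for label in LABELS:
        if label in reply:
            return label
    return "unknown"  # fall back when the reply matches no known label
```

Constraining the model to a fixed label list and parsing the reply back onto that list is what makes free-form generation usable as a classifier; the few-shot variants (PET, SetFit) instead adapt the model itself with a handful of labeled examples per class.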