
David Ifeoluwa Adelani

Core Academic Member
Canada CIFAR AI Chair
McGill University
Research Topics
Deep Learning
Natural Language Processing
Representation Learning

Biography

David Adelani is an assistant professor at McGill University’s School of Computer Science under the Fighting Inequities initiative, and a core academic member of Mila – Quebec Artificial Intelligence Institute.

Adelani’s research focuses on multilingual natural language processing with special attention to under-resourced languages.

Current Students

PhD - McGill University
Master's Research - McGill University
Research Intern - McGill University
Master's Research - McGill University

Publications

Enhancing Transformer Models for Igbo Language Processing: A Critical Comparative Study
Anthony Soronnadi
Olubayo Adekanmbi
Chinazo Anebelundu
NaijaRC: A Multi-choice Reading Comprehension Dataset for Nigerian Languages
Aremu Anuoluwapo
Jesujoba Oluwadara Alabi
Daud Abolade
Nkechinyere Faith Aguobi
Shamsuddeen Hassan Muhammad
In this paper, we create NaijaRC, a new multi-choice Nigerian Reading Comprehension dataset based on high-school RC examinations for three Nigerian national languages: Hausa (hau), Igbo (ibo), and Yorùbá (yor). We provide baseline results by performing cross-lingual transfer with several pre-trained encoder-only models using the Belebele training data, which is drawn largely from the RACE dataset (RACE is based on English exams for middle- and high-school Chinese students, very similar to our dataset). Additionally, we provide results by prompting large language models (LLMs) such as GPT-4.
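A minimal sketch of the cross-lingual transfer setup described above, assuming a generic multilingual encoder (xlm-roberta-base) and a toy example; the paper's actual checkpoints, data processing, and training details may differ.

```python
# Sketch: scoring a multiple-choice RC example with a multilingual
# encoder-only model, as in cross-lingual transfer from Belebele/RACE-style
# data to NaijaRC. In practice the model would first be fine-tuned on the
# Belebele training data; the checkpoint and example here are illustrative.
import torch
from transformers import AutoTokenizer, AutoModelForMultipleChoice

model_name = "xlm-roberta-base"  # any multilingual encoder-only model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMultipleChoice.from_pretrained(model_name)

passage = "..."   # reading passage in Hausa, Igbo, or Yorùbá
question = "..."  # question about the passage
choices = ["option A", "option B", "option C", "option D"]

# Pair (passage + question) with every answer option.
contexts = [f"{passage} {question}"] * len(choices)
enc = tokenizer(contexts, choices, truncation=True, padding=True, return_tensors="pt")
# AutoModelForMultipleChoice expects shape (batch, num_choices, seq_len).
inputs = {k: v.unsqueeze(0) for k, v in enc.items()}

with torch.no_grad():
    logits = model(**inputs).logits  # (1, num_choices)
pred = logits.argmax(dim=-1).item()  # index of the predicted option
print(f"Predicted choice: {pred}")
```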
YAD: Leveraging T5 for improved automatic diacritization of Yorùbá text
Akindele Michael Olawole
Jesujoba Oluwadara Alabi
Aderonke Busayo Sakpere
In this work, we present the Yorùbá automatic diacritization (YAD) benchmark dataset for evaluating Yorùbá diacritization systems. In addition, we pre-train a text-to-text transformer (T5) model for Yorùbá and show that it outperforms several multilingually trained T5 models. Lastly, we show that more data and bigger models are better at diacritization for Yorùbá.
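A minimal sketch of diacritization framed as a text-to-text task, assuming a generic pretrained seq2seq checkpoint and a hypothetical task prefix; the paper pre-trains its own Yorùbá T5 model, so the checkpoint and prompt format below are illustrative only.

```python
# Sketch: Yorùbá diacritization as text-to-text generation with a T5-style
# model. "google/flan-t5-small" and the "add diacritics:" prefix are
# assumptions for illustration, not the paper's model or input format.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/flan-t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

undiacritized = "bawo ni o se wa"  # input text without diacritics
inputs = tokenizer("add diacritics: " + undiacritized, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```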
Findings of the Association for Computational Linguistics: NAACL 2024, Mexico City, Mexico, June 16-21, 2024
Mohamed Abdalla
Gavin Abercrombie
Rodrigo Agerri
Zeljko Agic
Eneko Agirre
Monica Agrawal
Wasi Uddin Ahmad
James Allan
Aijun An
Antonios Anastasopoulos
Mark Anderson
Jacob Andreas
Marianna Apidianaki
et al.
ÌròyìnSpeech: A multi-purpose Yorùbá Speech Corpus
Tolúlọpẹ́ Ògúnremí
Kọ́lá Túbọ̀sún
Aremu Anuoluwapo
Iroro Orife
Machine Translation Hallucination Detection for Low and High Resource Languages using Large Language Models
Laura Gongas
Kenza Benkirane
Shahar Pelles
Naomi Fuchs
Joshua Darmon
Pontus Stenetorp
Eduardo Sánchez
Mitigating Translationese in Low-resource Languages: The Storyboard Approach
Garry Kuwanto
Eno-Abasi Urua
Priscilla A. Amuok
Shamsuddeen Hassan Muhammad
Aremu Anuoluwapo
Verrah Akinyi Otiende
Loice Emma Nanyanga
T. Nyoike
A. D. Akpan
Nsima Ab Udouboh
Idongesit Udeme Archibong
Idara Effiong Moses
Ifeoluwatayo A. Ige
Benjamin A. Ajibade
Olumide Benjamin Awokoya
Idris Abdulmumin
Saminu Mohammad Aliyu
Ruqayya Nasir Iro
Ibrahim Ahmad
Deontae Smith
Praise-EL Michaels
Derry Tanti Wijaya
Anietie U Andy
Low-resource languages often face challenges in acquiring high-quality language data due to the reliance on translation-based methods, which can introduce the translationese effect. This phenomenon results in translated sentences that lack fluency and naturalness in the target language. In this paper, we propose a novel approach to data collection by leveraging storyboards to elicit more fluent and natural sentences. Our method involves presenting native speakers with visual stimuli in the form of storyboards and collecting their descriptions without direct exposure to the source text. We conducted a comprehensive evaluation comparing our storyboard-based approach with traditional text translation-based methods in terms of accuracy and fluency. Human annotators and quantitative metrics were used to assess translation quality. The results indicate a preference for text translation in terms of accuracy, while our method demonstrates lower accuracy but better fluency in the languages of focus.
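A minimal sketch of the quantitative side of such a comparison, assuming hypothetical file names and chrF (via sacrebleu) as the metric; the paper's own evaluation combines human annotation with quantitative metrics, which may differ from this choice.

```python
# Sketch: comparing translation-based and storyboard-elicited sentences
# against gold references with chrF. The file paths and the chrF choice are
# illustrative assumptions, not the paper's exact evaluation protocol.
from sacrebleu.metrics import CHRF

def read_lines(path):
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f]

references = [read_lines("gold_references.txt")]      # hypothetical gold file
translated = read_lines("translation_based.txt")      # hypothetical system output
storyboard = read_lines("storyboard_elicited.txt")    # hypothetical system output

chrf = CHRF()
print("translation-based:", chrf.corpus_score(translated, references))
print("storyboard-based: ", chrf.corpus_score(storyboard, references))
```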
SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects
Hannah Liu
Xiaoyu Shen
Nikita Vassilyev
Jesujoba Oluwadara Alabi
Yanke Mao
Haonan Gao
Annie En-Shiun Lee
Voices Unheard: NLP Resources and Models for Yorùbá Regional Dialects
Orevaoghene Ahia
Aremu Anuoluwapo
Diana Abagyan
Hila Gonen
Daud Abolade
Noah A. Smith
Yulia Tsvetkov
Cross-lingual Open-Retrieval Question Answering for African Languages
Odunayo Ogundepo
Tajuddeen Gwadabe
Clara E. Rivera
Jonathan H. Clark
Sebastian Ruder
Bonaventure F. P. Dossou
Abdou Aziz Diop
Claytone Sikasote
Gilles Hacheme
Happy Buzaaba
Ignatius Ezeani
Rooweither Mabuya
Salomey Osei
Chris Emezue
Albert Kahira
Shamsuddeen Hassan Muhammad
Akintunde Oladipo
Abraham Toluwase Owodunni
Atnafu Lambebo Tonja
Iyanuoluwa Shode
Akari Asai
Tunde Oluwaseyi Ajayi
Clemencia Siro
Stephen Arthur
Mofetoluwa Adeyemi
Orevaoghene Ahia
Aremu Anuoluwapo
Oyinkansola Awosan
Chiamaka Ijeoma Chukwuneke
Bernard Opoku
Ayodele Awokoya
Verrah Akinyi Otiende
Christine Mwase
Boyd Sinkala
Andre Niyongabo Rubungo
Daniel Ajisafe
Emeka Felix Onwuegbuzia
Habib Mbow
Emile Niyomutabazi
Eunice Mukonde
Falalu Lawan
Ibrahim Ahmad
Jesujoba Oluwadara Alabi
Martin Namukombo
Chinedu Emmanuel Mbonu
Mofya Phiri
Neo Putini
Ndumiso Mngoma
Priscilla A. Amuok
Ruqayya Nasir Iro
Sonia Adhiambo
XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages
Sebastian Ruder
Jonathan H. Clark
Alexander Gutkin
Mihir Kale
Min Ma
Massimo Nicosia
Shruti Rijhwani
Parker Riley
Jean Michel Amath Sarr
Xinyi Wang
John Frederick Wieting
Nitish Gupta
Anna Katanova
Christo Kirov
Dana L Dickinson
Brian Roark
Bidisha Samanta
Connie Tao
Vera Axelrod
Isaac Rayburn Caswell
Colin Cherry
Dan Garrette
Reeve Ingle
Melvin Johnson
Dmitry Panteleev
Partha Talukdar
Data scarcity is a crucial issue for the development of highly multilingual NLP systems. Yet for many under-represented languages (ULs) -- languages for which NLP research is particularly far behind in meeting user needs -- it is feasible to annotate small amounts of data. Motivated by this, we propose XTREME-UP, a benchmark defined by: its focus on the scarce-data scenario rather than zero-shot; its focus on user-centric tasks -- tasks with broad adoption by speakers of high-resource languages; and its focus on under-represented languages, where this scarce-data scenario tends to be most realistic. XTREME-UP evaluates the capabilities of language models across 88 under-represented languages over 9 key user-centric technologies, including ASR, OCR, MT, and information access tasks that are of general utility. We create new datasets for OCR, autocomplete, semantic parsing, and transliteration, and build on and refine existing datasets for other tasks. XTREME-UP provides a methodology for evaluating many modeling scenarios, including text-only, multi-modal (vision, audio, and text), supervised parameter tuning, and in-context learning. We evaluate commonly used models on the benchmark. We release all code and scripts to train and evaluate models.
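A minimal sketch of the in-context learning scenario the benchmark covers, assuming a hypothetical JSONL format, hypothetical field names, and a generic mT5 checkpoint; the benchmark's actual task formats and data loaders are defined in the released XTREME-UP code.

```python
# Sketch: few-shot in-context evaluation for a text-to-text task in the
# scarce-data setting XTREME-UP targets. The file paths, "input"/"output"
# fields, and the mT5 checkpoint are illustrative assumptions only.
import json
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/mt5-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

def load_jsonl(path):
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]

few_shot = load_jsonl("train_fewshot.jsonl")  # hypothetical small labelled split
test = load_jsonl("test.jsonl")               # hypothetical test split

def build_prompt(example, shots, k=4):
    # Prepend k labelled demonstrations to the test input.
    demos = "\n".join(f"Input: {s['input']}\nOutput: {s['output']}" for s in shots[:k])
    return f"{demos}\nInput: {example['input']}\nOutput:"

for example in test[:3]:
    inputs = tokenizer(build_prompt(example, few_shot), return_tensors="pt", truncation=True)
    output_ids = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```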