Publications

Clement Odoje

Idris Akinade

Iffat Maab

Davis David

Shamsuddeen Hassan Muhammad

Neo Putini

David O. Ademuyiwa

Andrew Caines

Dietrich Klakow

This paper introduces AFRIDOC-MT, a document-level multi-parallel translation dataset covering English and five African languages: Amharic, … (voir plus)Hausa, Swahili, Yor\`ub\'a, and Zulu. The dataset comprises 334 health and 271 information technology news documents, all human-translated from English to these languages. We conduct document-level translation benchmark experiments by evaluating neural machine translation (NMT) models and large language models (LLMs) for translations between English and these languages, at both the sentence and pseudo-document levels. These outputs are realigned to form complete documents for evaluation. Our results indicate that NLLB-200 achieved the best average performance among the standard NMT models, while GPT-4o outperformed general-purpose LLMs. Fine-tuning selected models led to substantial performance gains, but models trained on sentences struggled to generalize effectively to longer documents. Furthermore, our analysis reveals that some LLMs exhibit issues such as under-generation, repetition of words or phrases, and off-target translations, especially for African languages.

2025-01-10

ArXiv (prépublication)

AFRIDOC-MT: Document-level MT Corpus for African Languages

Jesujoba Oluwadara Alabi

Israel Abebe Azime

Miaoran Zhang

Cristina España-Bonet

Rachel Bawden

Dawei Zhu

Clement Odoje

Idris Akinade

Iffat Maab

Davis David

Shamsuddeen Hassan Muhammad

Neo Putini

David O. Ademuyiwa

Andrew Caines

Dietrich Klakow

This paper introduces AFRIDOC-MT, a document-level multi-parallel translation dataset covering English and five African languages: Amharic, … (voir plus)Hausa, Swahili, Yor\`ub\'a, and Zulu. The dataset comprises 334 health and 271 information technology news documents, all human-translated from English to these languages. We conduct document-level translation benchmark experiments by evaluating neural machine translation (NMT) models and large language models (LLMs) for translations between English and these languages, at both the sentence and pseudo-document levels. These outputs are realigned to form complete documents for evaluation. Our results indicate that NLLB-200 achieved the best average performance among the standard NMT models, while GPT-4o outperformed general-purpose LLMs. Fine-tuning selected models led to substantial performance gains, but models trained on sentences struggled to generalize effectively to longer documents. Furthermore, our analysis reveals that some LLMs exhibit issues such as under-generation, repetition of words or phrases, and off-target translations, especially for African languages.

2025-01-10

ArXiv (prépublication)

EPISeg: Automated segmentation of the spinal cord on echo planar images using open-access multi-center data

Rohan Banerjee

Merve Kaptan

Alexandra Tinnermann

Ali Khatibi

Alice Dabbagh

Christian Büchel

Christine S.W. Law

Christian W. Kündig

Csw Law

Dario Pfyffer

David J. Lythgoe

Dimitra Tsivaka

Dimitri Van De Ville

Falk Eippert

Fauziyya Muhammad

Gary H. Glover

Gergely David

Grace Haynes

Jan Haaker

Jonathan C. W. Brooks … (voir 23 de plus)

Jürgen Finsterbusch

Katherine T. Martucci

Kimberly J. Hemmerling

Mahdi Mobarak-Abadi

Mark A. Hoggarth

Matthew A. Howard

Molly G. Bright

Nawal Kinany

Olivia S. Kowalczyk

Patrick Freund

Robert L. Barry

Sean Mackey

Shahabeddin Vahdat

Simon Schading

Stephen B. McMahon

Todd Parish

Véronique Marchand-Pauvert

Yufen Chen

Kenneth A. Weber

Zachary A. Smith

KA Weber

Benjamin De Leener

Julien Cohen-Adad

Functional magnetic resonance imaging (fMRI) of the spinal cord is relevant for studying sensation, movement, and autonomic function. Prepro… (voir plus)cessing of spinal cord fMRI data involves segmentation of the spinal cord on gradient-echo echo planar imaging (EPI) images. Current automated segmentation methods do not work well on these data, due to the low spatial resolution, susceptibility artifacts causing distortions and signal drop-out, ghosting, and motion-related artifacts. Consequently, this segmentation task demands a considerable amount of manual effort which takes time and is prone to user bias. In this work, we (i) gathered a multi-center dataset of spinal cord gradient-echo EPI with ground-truth segmentations and shared it on OpenNeuro https://openneuro.org/datasets/ds005143/versions/1.3.0, and (ii) developed a deep learning-based model, EPISeg, for the automatic segmentation of the spinal cord on gradient-echo EPI data. We observe a significant improvement in terms of segmentation quality compared to other available spinal cord segmentation models. Our model is resilient to different acquisition protocols as well as commonly observed artifacts in fMRI data. The training code is available at https://github.com/sct-pipeline/fmri-segmentation/, and the model has been integrated into the Spinal Cord Toolbox as a command-line tool.

2025-01-10

bioRxiv (prépublication)

EPISeg: Automated segmentation of the spinal cord on echo planar images using open-access multi-center data

Rohan Banerjee

Merve Kaptan

Alexandra Tinnermann

Ali Khatibi

Alice Dabbagh

Christian W. Kündig

Csw Law

Dario Pfyffer

David J. Lythgoe

Dimitra Tsivaka

Dimitri Van De Ville

Falk Eippert

Fauziyya Muhammad

Gary H. Glover

Gergely David

Grace Haynes

Jan Haaker

Jonathan C. W. Brooks

Jürgen Finsterbusch

Katherine T. Martucci … (voir 20 de plus)

Kimberly J. Hemmerling

Mahdi Mobarak-Abadi

Mark A. Hoggarth

Matthew A. Howard

Molly G. Bright

Nawal Kinany

O. Kowalczyk

Patrick Freund

Robert L. Barry

Sean Mackey

Shahabeddin Vahdat

Simon Schading

Stephen B McMahon

Todd Parish

Véronique Marchand-Pauvert

Yufen Chen

Zachary A. Smith

KA Weber

Benjamin De Leener

Julien Cohen-Adad

Functional magnetic resonance imaging (fMRI) of the spinal cord is relevant for studying sensation, movement, and autonomic function. Prepro… (voir plus)cessing of spinal cord fMRI data involves segmentation of the spinal cord on gradient-echo echo planar imaging (EPI) images. Current automated segmentation methods do not work well on these data, due to the low spatial resolution, susceptibility artifacts causing distortions and signal drop-out, ghosting, and motion-related artifacts. Consequently, this segmentation task demands a considerable amount of manual effort which takes time and is prone to user bias. In this work, we (i) gathered a multi-center dataset of spinal cord gradient-echo EPI with ground-truth segmentations and shared it on OpenNeuro https://openneuro.org/datasets/ds005143/versions/1.3.0, and (ii) developed a deep learning-based model, EPISeg, for the automatic segmentation of the spinal cord on gradient-echo EPI data. We observe a significant improvement in terms of segmentation quality compared to other available spinal cord segmentation models. Our model is resilient to different acquisition protocols as well as commonly observed artifacts in fMRI data. The training code is available at https://github.com/sct-pipeline/fmri-segmentation/, and the model has been integrated into the Spinal Cord Toolbox as a command-line tool.

2025-01-10

bioRxiv (prépublication)

EPISeg: Automated segmentation of the spinal cord on echo planar images using open-access multi-center data

Rohan Banerjee

Merve Kaptan

Alexandra Tinnermann

Ali Khatibi

Alice Dabbagh

Christian W. Kündig

Csw Law

Dario Pfyffer

David J. Lythgoe

Dimitra Tsivaka

Dimitri Van De Ville

Falk Eippert

Fauziyya Muhammad

Gary H. Glover

Gergely David

Grace Haynes

Jan Haaker

Jonathan C. W. Brooks

Jürgen Finsterbusch

Katherine T. Martucci … (voir 20 de plus)

Kimberly J. Hemmerling

Mahdi Mobarak-Abadi

Mark A. Hoggarth

Matthew A. Howard

Molly G. Bright

Nawal Kinany

O. Kowalczyk

Patrick Freund

Robert L. Barry

Sean Mackey

Shahabeddin Vahdat

Simon Schading

Stephen B McMahon

Todd Parish

Véronique Marchand-Pauvert

Yufen Chen

Zachary A. Smith

KA Weber

Benjamin De Leener

Julien Cohen-Adad

Functional magnetic resonance imaging (fMRI) of the spinal cord is relevant for studying sensation, movement, and autonomic function. Prepro… (voir plus)cessing of spinal cord fMRI data involves segmentation of the spinal cord on gradient-echo echo planar imaging (EPI) images. Current automated segmentation methods do not work well on these data, due to the low spatial resolution, susceptibility artifacts causing distortions and signal drop-out, ghosting, and motion-related artifacts. Consequently, this segmentation task demands a considerable amount of manual effort which takes time and is prone to user bias. In this work, we (i) gathered a multi-center dataset of spinal cord gradient-echo EPI with ground-truth segmentations and shared it on OpenNeuro https://openneuro.org/datasets/ds005143/versions/1.3.0, and (ii) developed a deep learning-based model, EPISeg, for the automatic segmentation of the spinal cord on gradient-echo EPI data. We observe a significant improvement in terms of segmentation quality compared to other available spinal cord segmentation models. Our model is resilient to different acquisition protocols as well as commonly observed artifacts in fMRI data. The training code is available at https://github.com/sct-pipeline/fmri-segmentation/, and the model has been integrated into the Spinal Cord Toolbox as a command-line tool.

2025-01-10

bioRxiv (prépublication)

Fleurs-SLU: A Massively Multilingual Benchmark for Spoken Language Understanding

Fabian David Schmidt

Ivan Vuli'c

Goran Glavavs

2025-01-10

ArXiv (prépublication)

Fleurs-SLU: A Massively Multilingual Benchmark for Spoken Language Understanding

Fabian David Schmidt

Ivan Vuli'c

Goran Glavavs

Spoken language understanding (SLU) is indispensable for half of all living languages that lack a formal writing system, since these languag… (voir plus)es cannot pair automatic speech recognition (ASR) with language models to benefit from language technology. Even if low-resource languages possess a writing system, ASR for these languages remains unreliable due to limited bimodal speech and text training data. Better SLU can strengthen the robustness of massively multilingual ASR by levering language semantics to disambiguate utterances via context or exploiting semantic similarities across languages. However, the evaluation of multilingual SLU remains limited to shallow tasks such as intent classification or language identification. To address this, we present Fleurs-SLU, a multilingual SLU benchmark that encompasses (i) 692 hours of speech for topical utterance classification in 102 languages and (ii) multiple-choice question answering through listening comprehension spanning 944 hours of speech across 92 languages. We extensively evaluate both end-to-end speech classification models and cascaded systems that combine speech-to-text transcription with subsequent classification by large language models on Fleurs-SLU. Our results show that cascaded systems exhibit greater robustness in multilingual SLU tasks, though speech encoders can achieve competitive performance in topical speech classification when appropriately pre-trained. We further find a strong correlation between robust multilingual ASR, effective speech-to-text translation, and strong multilingual SLU, highlighting the mutual benefits between acoustic and semantic speech representations.

2025-01-10

ArXiv (prépublication)

Open Problems in Machine Unlearning for AI Safety

Fazl Barez

Tingchen Fu

Ameya Prabhu

Stephen Casper

Amartya Sanyal

Adel Bibi

Aidan O'Gara

Robert Kirk

Benjamin Bucknall

Timothy Fist

Luke Ong

Philip Torr

Kwok-Yan Lam

Robert Trager

David Scott Krueger

Sören Mindermann

Jose Hernandez-Orallo

Mor Geva

Yarin Gal

As AI systems become more capable, widely deployed, and increasingly autonomous in critical areas such as cybersecurity, biological research… (voir plus), and healthcare, ensuring their safety and alignment with human values is paramount. Machine unlearning -- the ability to selectively forget or suppress specific types of knowledge -- has shown promise for privacy and data removal tasks, which has been the primary focus of existing research. More recently, its potential application to AI safety has gained attention. In this paper, we identify key limitations that prevent unlearning from serving as a comprehensive solution for AI safety, particularly in managing dual-use knowledge in sensitive domains like cybersecurity and chemical, biological, radiological, and nuclear (CBRN) safety. In these contexts, information can be both beneficial and harmful, and models may combine seemingly harmless information for harmful purposes -- unlearning this information could strongly affect beneficial uses. We provide an overview of inherent constraints and open problems, including the broader side effects of unlearning dangerous knowledge, as well as previously unexplored tensions between unlearning and existing safety mechanisms. Finally, we investigate challenges related to evaluation, robustness, and the preservation of safety features during unlearning. By mapping these limitations and open challenges, we aim to guide future research toward realistic applications of unlearning within a broader AI safety framework, acknowledging its limitations and highlighting areas where alternative approaches may be required.

2025-01-09

ArXiv (prépublication)

Open Problems in Machine Unlearning for AI Safety

Fazl Barez

Tingchen Fu

Ameya Prabhu

Stephen Casper

Amartya Sanyal

Adel Bibi

Aidan O'Gara

Robert Kirk

Benjamin Bucknall

Tim Fist

Luke Ong

Philip H. S. Torr

Kwok-Yan Lam

Robert F. Trager

David Scott Krueger

Sören Mindermann

Jose Hernandez-Orallo

Mor Geva

Yarin Gal

As AI systems become more capable, widely deployed, and increasingly autonomous in critical areas such as cybersecurity, biological research… (voir plus), and healthcare, ensuring their safety and alignment with human values is paramount. Machine unlearning -- the ability to selectively forget or suppress specific types of knowledge -- has shown promise for privacy and data removal tasks, which has been the primary focus of existing research. More recently, its potential application to AI safety has gained attention. In this paper, we identify key limitations that prevent unlearning from serving as a comprehensive solution for AI safety, particularly in managing dual-use knowledge in sensitive domains like cybersecurity and chemical, biological, radiological, and nuclear (CBRN) safety. In these contexts, information can be both beneficial and harmful, and models may combine seemingly harmless information for harmful purposes -- unlearning this information could strongly affect beneficial uses. We provide an overview of inherent constraints and open problems, including the broader side effects of unlearning dangerous knowledge, as well as previously unexplored tensions between unlearning and existing safety mechanisms. Finally, we investigate challenges related to evaluation, robustness, and the preservation of safety features during unlearning. By mapping these limitations and open challenges, we aim to guide future research toward realistic applications of unlearning within a broader AI safety framework, acknowledging its limitations and highlighting areas where alternative approaches may be required.

2025-01-09

ArXiv (prépublication)

Gintare Karolina Dziugaite

Soup to go: mitigating forgetting during continual learning with model averaging

Anat Kleiman

Jonathan Frankle

Sham M. Kakade

Mansheej Paul

In continual learning, where task data arrives in a sequence, fine-tuning on later tasks will often lead to performance degradation on earli… (voir plus)er tasks. This is especially pronounced when these tasks come from diverse domains. In this setting, how can we mitigate catastrophic forgetting of earlier tasks and retain what the model has learned with minimal computational expenses? Inspired by other merging methods, and L2-regression, we propose Sequential Fine-tuning with Averaging (SFA), a method that merges currently training models with earlier checkpoints during the course of training. SOTA approaches typically maintain a data buffer of past tasks or impose a penalty at each gradient step. In contrast, our method achieves comparable results without the need to store past data, or multiple copies of parameters for each gradient step. Furthermore, our method outperforms common merging techniques such as Task Arithmetic, TIES Merging, and WiSE-FT, as well as other penalty methods like L2 and Elastic Weight Consolidation. In turn, our method offers insight into the benefits of merging partially-trained models during training across both image and language domains.

2025-01-09

ArXiv (prépublication)

Gintare Karolina Dziugaite

Soup to go: mitigating forgetting during continual learning with model averaging

Anat Kleiman

Jonathan Frankle

Sham M. Kakade

Mansheej Paul

In continual learning, where task data arrives in a sequence, fine-tuning on later tasks will often lead to performance degradation on earli… (voir plus)er tasks. This is especially pronounced when these tasks come from diverse domains. In this setting, how can we mitigate catastrophic forgetting of earlier tasks and retain what the model has learned with minimal computational expenses? Inspired by other merging methods, and L2-regression, we propose Sequential Fine-tuning with Averaging (SFA), a method that merges currently training models with earlier checkpoints during the course of training. SOTA approaches typically maintain a data buffer of past tasks or impose a penalty at each gradient step. In contrast, our method achieves comparable results without the need to store past data, or multiple copies of parameters for each gradient step. Furthermore, our method outperforms common merging techniques such as Task Arithmetic, TIES Merging, and WiSE-FT, as well as other penalty methods like L2 and Elastic Weight Consolidation. In turn, our method offers insight into the benefits of merging partially-trained models during training across both image and language domains.

2025-01-09

ArXiv (prépublication)

openreview.net

GNN-based Decentralized Perception in Multirobot Systems for Predicting Worker Actions

Ali Imran

Giovanni Beltrame

David St-Onge

In industrial environments, predicting human actions is essential for ensuring safe and effective collaboration between humans and robots. T… (voir plus)his paper introduces a perception framework that enables mobile robots to understand and share information about human actions in a decentralized way. The framework first allows each robot to build a spatial graph representing its surroundings, which it then shares with other robots. This shared spatial data is combined with temporal information to track human behavior over time. A swarm-inspired decision-making process is used to ensure all robots agree on a unified interpretation of the human's actions. Results show that adding more robots and incorporating longer time sequences improve prediction accuracy. Additionally, the consensus mechanism increases system resilience, making the multi-robot setup more reliable in dynamic industrial settings.

2025-01-08

ArXiv (prépublication)