AFRIDOC-MT: Document-level MT Corpus for African Languages
Jesujoba Oluwadara Alabi
Israel Abebe Azime
Miaoran Zhang
Cristina España-Bonet
Rachel Bawden
Dawei Zhu
Clement Odoje
Idris Akinade
Iffat Maab
Davis David
Shamsuddeen Hassan Muhammad
Neo Putini
David O. Ademuyiwa
Andrew Caines
Dietrich Klakow
This paper introduces AFRIDOC-MT, a document-level multi-parallel translation dataset covering English and five African languages: Amharic, Hausa, Swahili, Yorùbá, and Zulu. The dataset comprises 334 health and 271 information technology news documents, all human-translated from English into these languages. We conduct document-level translation benchmark experiments by evaluating neural machine translation (NMT) models and large language models (LLMs) on translation between English and these languages, at both the sentence and pseudo-document levels; the outputs are then realigned to form complete documents for evaluation. Our results indicate that NLLB-200 achieved the best average performance among the standard NMT models, while GPT-4o outperformed the other general-purpose LLMs. Fine-tuning selected models led to substantial performance gains, but models trained on sentences struggled to generalize effectively to longer documents. Furthermore, our analysis reveals that some LLMs exhibit issues such as under-generation, repetition of words or phrases, and off-target translations, especially for African languages.
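As a rough illustration of the sentence-level NMT setup benchmarked above, the sketch below translates a single English sentence into Hausa with NLLB-200 through the Hugging Face transformers API. The checkpoint size, the NLLB language codes, and the generation settings are assumptions for illustration, not the paper's exact configuration.

```python
# Minimal sketch: sentence-level English -> Hausa translation with NLLB-200.
# The distilled-600M checkpoint and the language codes (eng_Latn, hau_Latn)
# are illustrative assumptions, not necessarily the paper's benchmark setup.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "facebook/nllb-200-distilled-600M"
tokenizer = AutoTokenizer.from_pretrained(model_name, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

sentence = "Regular handwashing helps prevent the spread of disease."
inputs = tokenizer(sentence, return_tensors="pt")

# Force the decoder to start generation with the Hausa language token.
generated = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("hau_Latn"),
    max_new_tokens=128,
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```

For the pseudo-document setting described in the abstract, the same call would be applied to longer concatenated inputs (subject to the model's length limit), with the outputs realigned back to sentences for evaluation.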
EPISeg: Automated segmentation of the spinal cord on echo planar images using open-access multi-center data
Rohan Banerjee
Merve Kaptan
Alexandra Tinnermann
Ali Khatibi
Alice Dabbagh
Christian W. Kündig
Christian Büchel
Christine S.W. Law
Dario Pfyffer
David J. Lythgoe
Dimitra Tsivaka
Dimitri Van De Ville
Falk Eippert
Fauziyya Muhammad
Gary H. Glover
Gergely David
Grace Haynes
Jan Haaker
Jonathan C. W. Brooks
Jürgen Finsterbusch
Katherine T. Martucci
Kimberly J. Hemmerling
Mahdi Mobarak-Abadi
Mark A. Hoggarth
Matthew A. Howard
Molly G. Bright
Nawal Kinany
Olivia S. Kowalczyk
Patrick Freund
Robert L. Barry
Sean Mackey
Shahabeddin Vahdat
Simon Schading
Stephen B. McMahon
Todd Parish
Véronique Marchand-Pauvert
Yufen Chen
Zachary A. Smith
Kenneth A. Weber
Benjamin De Leener
Functional magnetic resonance imaging (fMRI) of the spinal cord is relevant for studying sensation, movement, and autonomic function. Preprocessing of spinal cord fMRI data involves segmentation of the spinal cord on gradient-echo echo planar imaging (EPI) images. Current automated segmentation methods do not work well on these data due to the low spatial resolution, susceptibility artifacts causing distortions and signal drop-out, ghosting, and motion-related artifacts. Consequently, this segmentation task demands considerable manual effort, which is time-consuming and prone to user bias. In this work, we (i) gathered a multi-center dataset of spinal cord gradient-echo EPI with ground-truth segmentations and shared it on OpenNeuro (https://openneuro.org/datasets/ds005143/versions/1.3.0), and (ii) developed a deep learning-based model, EPISeg, for automatic segmentation of the spinal cord on gradient-echo EPI data. We observe a significant improvement in segmentation quality compared to other available spinal cord segmentation models. Our model is resilient to different acquisition protocols as well as to commonly observed artifacts in fMRI data. The training code is available at https://github.com/sct-pipeline/fmri-segmentation/, and the model has been integrated into the Spinal Cord Toolbox as a command-line tool.
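The abstract does not spell out how the released model is invoked, so the following is only a minimal sketch of slice-wise segmentation inference on a gradient-echo EPI series; the exported model file episeg.pt, its 2D input convention, and the 0.5 probability threshold are hypothetical placeholders. In practice, the released EPISeg model is run through the Spinal Cord Toolbox command-line interface rather than a script like this.

```python
# Minimal sketch of slice-wise spinal cord segmentation on a 4D gradient-echo
# EPI series. "episeg.pt" is a hypothetical exported model file used only for
# illustration; the official tool is distributed via the Spinal Cord Toolbox.
import nibabel as nib
import numpy as np
import torch

# Load the 4D EPI time series (x, y, z, t) and average over time, a common
# way to obtain a higher-SNR reference image for segmentation.
epi = nib.load("sub-01_task-rest_bold.nii.gz")
mean_img = np.asarray(epi.dataobj).mean(axis=-1)

model = torch.jit.load("episeg.pt")  # hypothetical TorchScript segmentation model
model.eval()

mask = np.zeros(mean_img.shape, dtype=np.uint8)
with torch.no_grad():
    for z in range(mean_img.shape[2]):
        sl = torch.from_numpy(mean_img[:, :, z]).float()[None, None]  # (1, 1, H, W)
        prob = torch.sigmoid(model(sl))[0, 0].numpy()
        mask[:, :, z] = (prob > 0.5).astype(np.uint8)  # assumed binarization threshold

nib.save(nib.Nifti1Image(mask, epi.affine), "sub-01_spinalcord_mask.nii.gz")
```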
Open Problems in Machine Unlearning for AI Safety
Fazl Barez
Tingchen Fu
Ameya Prabhu
Stephen Casper
Amartya Sanyal
Adel Bibi
Aidan O'Gara
Robert Kirk
Benjamin Bucknall
Tim Fist
Luke Ong
Philip Torr
Kwok-Yan Lam
Robert F. Trager
Sören Mindermann
Jose Hernandez-Orallo
Mor Geva
Yarin Gal
As AI systems become more capable, widely deployed, and increasingly autonomous in critical areas such as cybersecurity, biological research… (voir plus), and healthcare, ensuring their safety and alignment with human values is paramount. Machine unlearning -- the ability to selectively forget or suppress specific types of knowledge -- has shown promise for privacy and data removal tasks, which has been the primary focus of existing research. More recently, its potential application to AI safety has gained attention. In this paper, we identify key limitations that prevent unlearning from serving as a comprehensive solution for AI safety, particularly in managing dual-use knowledge in sensitive domains like cybersecurity and chemical, biological, radiological, and nuclear (CBRN) safety. In these contexts, information can be both beneficial and harmful, and models may combine seemingly harmless information for harmful purposes -- unlearning this information could strongly affect beneficial uses. We provide an overview of inherent constraints and open problems, including the broader side effects of unlearning dangerous knowledge, as well as previously unexplored tensions between unlearning and existing safety mechanisms. Finally, we investigate challenges related to evaluation, robustness, and the preservation of safety features during unlearning. By mapping these limitations and open challenges, we aim to guide future research toward realistic applications of unlearning within a broader AI safety framework, acknowledging its limitations and highlighting areas where alternative approaches may be required.
Soup to go: mitigating forgetting during continual learning with model averaging
Anat Kleiman
Jonathan Frankle
Sham M. Kakade
Mansheej Paul
In continual learning, where task data arrives in a sequence, fine-tuning on later tasks will often lead to performance degradation on earlier tasks. This is especially pronounced when the tasks come from diverse domains. In this setting, how can we mitigate catastrophic forgetting of earlier tasks and retain what the model has learned, with minimal computational expense? Inspired by other merging methods and L2-regression, we propose Sequential Fine-tuning with Averaging (SFA), a method that merges the model currently being trained with earlier checkpoints during training. State-of-the-art approaches typically maintain a data buffer of past tasks or impose a penalty at each gradient step. In contrast, our method achieves comparable results without the need to store past data or to keep multiple copies of parameters for each gradient step. Furthermore, our method outperforms common merging techniques such as Task Arithmetic, TIES Merging, and WiSE-FT, as well as penalty-based methods like L2 regularization and Elastic Weight Consolidation. In turn, our method offers insight into the benefits of merging partially-trained models during training, across both image and language domains.
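A minimal sketch of the averaging step described above, assuming the merge happens at a fixed step interval during fine-tuning on a new task and uses an unweighted 50/50 mean with a checkpoint saved before the task began; the actual SFA schedule and weighting may differ.

```python
# Minimal sketch of Sequential Fine-tuning with Averaging (SFA): while
# fine-tuning on a new task, periodically average the current weights with a
# checkpoint saved before the task started. The merge interval and the 50/50
# weighting are illustrative assumptions, not the paper's exact recipe.
import copy
import torch


def finetune_with_averaging(model, loader, optimizer, loss_fn, merge_every=100):
    anchor = copy.deepcopy(model.state_dict())  # weights from before this task
    for step, (x, y) in enumerate(loader, start=1):
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()

        if step % merge_every == 0:
            # Merge: replace current weights with the mean of current and anchor.
            with torch.no_grad():
                for name, param in model.state_dict().items():
                    if param.is_floating_point():
                        param.copy_(0.5 * param + 0.5 * anchor[name])
    return model
```

Merging into the live parameters this way requires no buffer of past-task data and no per-step penalty term, which is the computational saving the abstract emphasizes.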
GNN-based Decentralized Perception in Multirobot Systems for Predicting Worker Actions
Ali Imran
David St-Onge
In industrial environments, predicting human actions is essential for ensuring safe and effective collaboration between humans and robots. T… (voir plus)his paper introduces a perception framework that enables mobile robots to understand and share information about human actions in a decentralized way. The framework first allows each robot to build a spatial graph representing its surroundings, which it then shares with other robots. This shared spatial data is combined with temporal information to track human behavior over time. A swarm-inspired decision-making process is used to ensure all robots agree on a unified interpretation of the human's actions. Results show that adding more robots and incorporating longer time sequences improve prediction accuracy. Additionally, the consensus mechanism increases system resilience, making the multi-robot setup more reliable in dynamic industrial settings.
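To make the pipeline concrete, here is an illustrative sketch (not the paper's architecture): each robot encodes its local spatial graph with one round of mean-aggregation message passing, predicts an action class, and the team takes a majority vote as a stand-in for the swarm-inspired consensus step. The feature dimensions, the single-layer GNN, and the voting rule are assumptions made for illustration.

```python
# Illustrative sketch of decentralized GNN-based action prediction with a
# majority-vote consensus. All dimensions and the consensus rule are assumed.
import torch
import torch.nn as nn


class SimpleGNNLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(2 * in_dim, out_dim)

    def forward(self, x, adj):
        # x: (N, in_dim) node features; adj: (N, N) binary adjacency matrix.
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        neigh = adj @ x / deg                         # mean over neighbours
        return torch.relu(self.lin(torch.cat([x, neigh], dim=-1)))


class RobotPerception(nn.Module):
    def __init__(self, feat_dim=8, hidden=16, n_actions=4):
        super().__init__()
        self.gnn = SimpleGNNLayer(feat_dim, hidden)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, x, adj):
        h = self.gnn(x, adj)
        return self.head(h.mean(dim=0))               # graph-level action logits


# Each robot runs locally on its own spatial graph, then the team votes.
robots = [RobotPerception() for _ in range(3)]
votes = []
for net in robots:
    x = torch.randn(5, 8)                             # 5 tracked entities, 8 features each
    adj = (torch.rand(5, 5) > 0.5).float()
    votes.append(net(x, adj).argmax().item())
consensus_action = max(set(votes), key=votes.count)   # majority vote as consensus
print("Per-robot predictions:", votes, "-> consensus:", consensus_action)
```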