Ensemble machine learning to accelerate industrial decarbonization: Prediction of Hansen solubility parameters for streamlined chemical solvent selection
Eslam G. Al-Sakkari
Ahmed Ragab
Mostafa Amer
Olumoye Ajao
Marzouk Benali
Daria Camilla Boffito
Mouloud Amazouz
Learning adversarially robust kernel ensembles with kernel average pooling
Reza Bayat
Adam Ibrahim
Amirozhan Dehghani
Yifei Ren
Patient Engagement in the Implementation of Electronic Patient Reported Outcome (ePRO) Tools: The Experience of Two Early Adopter Institutions in the pan-Canadian Radiotherapy PRO Initiative
Amanda Caissie
Jennifer Lane
Brittany V Barber
Sue Chisholm
Predicting the Mathematics Literacy of Resilient Students from High-performing Economies: A Machine Learning Approach
Yimei Zhang
Uhura: A Benchmark for Evaluating Scientific Question Answering and Truthfulness in Low-Resource African Languages
Edward Bayes
Israel Abebe Azime
Jesujoba Oluwadara Alabi
Jonas Kgomo
Tyna Eloundou
Elizabeth Proehl
Kai Chen
Imaan Khadir
Naome Etori
Shamsuddeen Hassan Muhammad
Choice Mpanza
Igneciah Pocia Thete
Dietrich Klakow
Evaluations of Large Language Models (LLMs) on knowledge-intensive tasks and factual accuracy often focus on high-resource languages, primarily because datasets for low-resource languages (LRLs) are scarce. In this paper, we present Uhura -- a new benchmark that focuses on two tasks in six typologically diverse African languages, created via human translation of existing English benchmarks. The first dataset, Uhura-ARC-Easy, is composed of multiple-choice science questions. The second, Uhura-TruthfulQA, is a safety benchmark testing the truthfulness of models on topics including health, law, finance, and politics. We highlight the challenges of creating benchmarks with highly technical content for LRLs and outline mitigation strategies. Our evaluation reveals a significant performance gap between proprietary models such as GPT-4o, o1-preview, and the Claude models and open-source models such as Meta's LLaMA and Google's Gemma. Additionally, all models perform better in English than in African languages. These results indicate that LLMs struggle with answering scientific questions and are more prone to generating false claims in low-resource African languages. Our findings underscore the necessity of continuous improvement of multilingual LLM capabilities in LRL settings to ensure safe and reliable use in real-world contexts. We open-source the Uhura Benchmark and Uhura Platform to foster further research and development in NLP for LRLs.
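As a concrete illustration of how a model might be scored on an Uhura-ARC-Easy-style task, the minimal sketch below computes multiple-choice accuracy over translated items. The field names ("question", "choices", "answer") and the toy items are assumptions for illustration only; consult the released Uhura Benchmark for its actual schema.

# Hedged sketch: a generic multiple-choice scorer for ARC-style items.
# Field names are assumed for illustration, not taken from the benchmark.

def score_multiple_choice(examples, predict):
    """Accuracy of `predict`, which maps (question, choices) -> option index."""
    if not examples:
        return 0.0
    hits = sum(
        1 for ex in examples
        if predict(ex["question"], ex["choices"]) == ex["answer"]
    )
    return hits / len(examples)

# Toy usage with a trivial baseline that always picks the first option.
toy_items = [
    {"question": "Which state of matter has a fixed shape?",
     "choices": ["solid", "liquid", "gas", "plasma"], "answer": 0},
    {"question": "What gas do plants absorb for photosynthesis?",
     "choices": ["oxygen", "carbon dioxide", "nitrogen", "helium"], "answer": 1},
]
print(score_multiple_choice(toy_items, lambda q, c: 0))  # 0.5

The same harness can be reused across languages, which is what makes the English-versus-African-language accuracy gap reported above directly comparable.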
Visual Modality Prompt for Adapting Vision-Language Object Detectors
Heitor Rapela Medeiros
Atif Belal
Srikanth Muralidharan
Eric Granger
The zero-shot performance of object detectors degrades when they are tested on different modalities, such as infrared and depth. While recent work has explored image translation techniques to adapt detectors to new modalities, these methods are limited to a single modality and apply only to traditional detectors. Recently, vision-language detectors such as YOLO-World and Grounding DINO have shown promising zero-shot capabilities; however, they have not yet been adapted to other visual modalities. Traditional fine-tuning approaches tend to compromise the zero-shot capabilities of the detectors, and the visual prompt strategies commonly used for classification with vision-language models apply the same linear prompt translation to each image, making them less effective. To address these limitations, we propose ModPrompt, a visual prompt strategy that adapts vision-language detectors to new modalities without degrading zero-shot performance. In particular, we propose an encoder-decoder visual prompt strategy, further enhanced by the integration of inference-friendly task residuals, facilitating more robust adaptation. Empirically, we benchmark our method for modality adaptation on two vision-language detectors, YOLO-World and Grounding DINO, and on challenging infrared (LLVIP, FLIR) and depth (NYUv2) data, achieving performance comparable to full fine-tuning while preserving the models' zero-shot capability. Our code is available at: https://github.com/heitorrapela/ModPrompt
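To make the prompt idea concrete, here is a minimal sketch (not the authors' implementation) of an encoder-decoder visual prompt: a small convolutional network learns an additive, image-dependent perturbation for the new modality while the detector itself stays frozen. The channel widths, kernel sizes, and additive-residual formulation are illustrative assumptions.

# Hedged sketch of an encoder-decoder visual modality prompt (PyTorch).
import torch
import torch.nn as nn

class VisualModalityPrompt(nn.Module):
    def __init__(self, channels: int = 3, hidden: int = 16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(hidden, channels, kernel_size=4, stride=2, padding=1),
            nn.Tanh(),  # keep the learned prompt bounded
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Add an image-dependent prompt; only this module is trained.
        return x + self.decoder(self.encoder(x))

prompted = VisualModalityPrompt()(torch.randn(1, 3, 640, 640))
print(prompted.shape)  # torch.Size([1, 3, 640, 640])

In the paper's setting, the prompted image would then be passed to a frozen vision-language detector such as YOLO-World or Grounding DINO, so the zero-shot behavior of the detector is left intact.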
Instant3dit: Multiview Inpainting for Fast Editing of 3D Objects
Amir Barda
Matheus Gadelha
Vladimir Kim
Amit H. Bermano
Thibault Groueix
We propose a generative technique to edit 3D shapes, represented as meshes, NeRFs, or Gaussian splats, in approximately 3 seconds, without the need to run an SDS-style optimization. Our key insight is to cast 3D editing as a multiview image inpainting problem: this representation is generic and can be mapped back to any 3D representation using the bank of available Large Reconstruction Models. We explore different fine-tuning strategies to obtain both multiview generation and inpainting capabilities within the same diffusion model. In particular, the design of the inpainting mask is an important factor in training an inpainting model, and we propose several masking strategies that mimic the types of edits a user would perform on a 3D shape. Our approach takes 3D generative editing from hours to seconds and produces higher-quality results than previous works.
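The core reduction, from 3D editing to 2D multiview inpainting, can be sketched as below. The 2x2 grid layout and the downstream inpainting and reconstruction steps are assumptions for illustration, not the paper's exact pipeline.

# Hedged sketch: tile rendered views and edit masks into one canvas so a
# single 2D inpainting diffusion pass can edit all views consistently.
import numpy as np

def make_multiview_canvas(views, masks):
    """Tile four HxWx3 views (and their HxW masks) into a 2x2 grid."""
    assert len(views) == len(masks) == 4
    top = np.concatenate(views[:2], axis=1)
    bottom = np.concatenate(views[2:], axis=1)
    canvas = np.concatenate([top, bottom], axis=0)
    mask_top = np.concatenate(masks[:2], axis=1)
    mask_bottom = np.concatenate(masks[2:], axis=1)
    return canvas, np.concatenate([mask_top, mask_bottom], axis=0)

views = [np.zeros((256, 256, 3), dtype=np.float32) for _ in range(4)]
masks = [np.zeros((256, 256), dtype=np.float32) for _ in range(4)]
canvas, mask = make_multiview_canvas(views, masks)
print(canvas.shape, mask.shape)  # (512, 512, 3) (512, 512)
# The inpainted canvas would then be split back into per-view images and
# lifted to 3D with a large reconstruction model.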