Publications

Studying Logging Practice in Machine Learning-based Applications
Patrick Loic Foalem
Heng Li
Logging is a common practice in traditional software development. Several research works have been done to investigate the different charact… (see more)eristics of logging practices in traditional software systems (e.g., Android applications, JAVA applications, C/C++ applications). Nowadays, we are witnessing more and more development of Machine Learning-based applications (ML-based applications). Today, there are many popular libraries that facilitate and contribute to the development of such applications, among which we can mention: Pytorch, Tensorflow, Theano, MXNet, Scikit-Learn, Caffe, and Keras. Despite the popularity of ML, we don't have a clear understanding of logging practices in ML applications. In this paper, we aim to fill this knowledge gap and help ML practitioners understand the characteristics of logging in ML-based applications. In particular, we conduct an empirical study on 110 open-source ML-based applications. Through a quantitative analysis, we find that logging practice in ML-based applications is less pervasive than in traditional applications including Android, JAVA, and C/C++ applications. Furthermore, the majority of logging statements in ML-based applications are in info and warn levels, compared to traditional applications where info is the majority of logging statement in C/C++ application and debug, error levels constitute the majority of logging statement in Android application. We also perform a quantitative and qualitative analysis of a random sample of logging statements to understand where ML developers put most of logging statements and examine why and how they are using logging. These analyses led to the following observations: (i) ML developers put most of the logging statements in model training, and in non-ML components. (ii) Data and model management appear to be the main reason behind the introduction of logging statements in ML-based applications.
SantaCoder: don't reach for the stars!
Loubna Ben allal
Raymond Li
Denis Kocetkov
Chenghao Mou
Christopher Akiki
Carlos Muñoz Ferrandis
Niklas Muennighoff
Mayank Mishra
Alex Gu
Manan Dey
Logesh Kumar Umapathi
Carolyn Jane Anderson
Yangtian Zi
Joel Lamy Poirier
Hailey Schoelkopf
S. Troshin
Dmitry Abulkhanov
Manuel L. Romero
M. Lappert
Francesco De Toni … (see 21 more)
Bernardo Garc'ia del R'io
Qian Liu
Shamik Bose
Urvashi Bhattacharyya
Terry Yue Zhuo
Ian Yu
Paulo Villegas
Marco Zocca
Sourab Mangrulkar
D. Lansky
Huu Nguyen
Danish Contractor
Luisa Villa
Jia LI
Yacine Jernite
Sean Christopher Hughes
Daniel Fried
Arjun Guha
Harm de Vries
Leandro Von Werra
The BigCode project is an open-scientific collaboration working on the responsible development of large language models for code. This tech … (see more)report describes the progress of the collaboration until December 2022, outlining the current state of the Personally Identifiable Information (PII) redaction pipeline, the experiments conducted to de-risk the model architecture, and the experiments investigating better preprocessing methods for the training data. We train 1.1B parameter models on the Java, JavaScript, and Python subsets of The Stack and evaluate them on the MultiPL-E text-to-code benchmark. We find that more aggressive filtering of near-duplicates can further boost performance and, surprisingly, that selecting files from repositories with 5+ GitHub stars deteriorates performance significantly. Our best model outperforms previous open-source multilingual code generation models (InCoder-6.7B and CodeGen-Multi-2.7B) in both left-to-right generation and infilling on the Java, JavaScript, and Python portions of MultiPL-E, despite being a substantially smaller model. All models are released under an OpenRAIL license at https://hf.co/bigcode.
SmOOD: Smoothness-based Out-of-Distribution Detection Approach for Surrogate Neural Networks in Aircraft Design
Houssem Ben Braiek
Ali Tfaily
Thomas Reid
Ciro Guida
HiFormer: Hierarchical Multi-scale Representations Using Transformers for Medical Image Segmentation
Moein Heidari
Amirhossein Kazerouni
Milad Soltany
Reza Azad
Ehsan Khodapanah Aghdam
Dorit Merhof
Convolutional neural networks (CNNs) have been the consensus for medical image segmentation tasks. However, they suffer from the limitation … (see more)in modeling long-range dependencies and spatial correlations due to the nature of convolution operation. Although transformers were first developed to address this issue, they fail to capture low-level features. In contrast, it is demonstrated that both local and global features are crucial for dense prediction, such as segmenting in challenging contexts. In this paper, we propose HiFormer, a novel method that efficiently bridges a CNN and a transformer for medical image segmentation. Specifically, we design two multi-scale feature representations using the seminal Swin Transformer module and a CNN-based encoder. To secure a fine fusion of global and local features obtained from the two aforementioned representations, we propose a Double-Level Fusion (DLF) module in the skip connection of the encoder-decoder structure. Extensive experiments on various medical image segmentation datasets demonstrate the effectiveness of HiFormer over other CNN-based, transformer-based, and hybrid methods in terms of computational complexity, quantitative and qualitative results. Our code is publicly available at GitHub.
Hyperspherical Quantization: Toward Smaller and More Accurate Models
Dan Liu
Xi Chen
Chen Ma
Model quantization enables the deployment of deep neural networks under resource-constrained devices. Vector quantization aims at reducing t… (see more)he model size by indexing model weights with full-precision embeddings, i.e., codewords, while the index needs to be restored to 32-bit during computation. Binary and other low-precision quantization methods can reduce the model size up to 32×, however, at the cost of a considerable accuracy drop. In this paper, we propose an efficient framework for ternary quantization to produce smaller and more accurate compressed models. By integrating hyperspherical learning, pruning and reinitialization, our proposed Hyperspherical Quantization (HQ) method reduces the cosine distance between the full-precision and ternary weights, thus reducing the bias of the straight-through gradient estimator during ternary quantization. Compared with existing work at similar compression levels (~30×, ~40×), our method significantly improves the test accuracy and reduces the model size.
« Que notre cerveau soit constitué de neurones n’est pas un accident »
Roman Ikonicoff
On the Challenges of using Reinforcement Learning in Precision Drug Dosing: Delay and Prolongedness of Action Effects
Sumana Basu
M. Legault
Drug dosing is an important application of AI, which can be formulated as a Reinforcement Learning (RL) problem. In this paper, we identify … (see more)two major challenges of using RL for drug dosing: delayed and prolonged effects of administering medications, which break the Markov assumption of the RL framework. We focus on prolongedness and define PAE-POMDP (Prolonged Action Effect-Partially Observable Markov Decision Process), a subclass of POMDPs in which the Markov assumption does not hold specifically due to prolonged effects of actions. Motivated by the pharmacology literature, we propose a simple and effective approach to converting drug dosing PAE-POMDPs into MDPs, enabling the use of the existing RL algorithms to solve such problems. We validate the proposed approach on a toy task, and a challenging glucose control task, for which we devise a clinically-inspired reward function. Our results demonstrate that: (1) the proposed method to restore the Markov assumption leads to significant improvements over a vanilla baseline; (2) the approach is competitive with recurrent policies which may inherently capture the prolonged affect of actions; (3) it is remarkably more time and memory efficient than the recurrent baseline and hence more suitable for real-time dosing control systems; and (4) it exhibits favourable qualitative behavior in our policy analysis.
2023 S TOCHASTIC S IMULATED Q UANTUM A NNEALING FOR F AST S OLVING C OMBINATORIAL O PTIMIZATION P ROBLEMS
Naoya Onizawa
Ryoma Sasaki
Duckgyu Shin
Takahiro Hanyu
method. Additionally, it can handle a 100-times larger problem size compared to QA and a 25-times larger problem size compared to a traditio… (see more)nal SA method, respectively, for similar convergence probabilities.
2023 S TOCHASTIC Q UANTUM M ONTE C ARLO A LGORITHM FOR L ARGE -S CALE C OMBINATORIAL O PTIMIZATION P ROBLEMS
Naoya Onizawa
Ryoma Sasaki
Duckgyu Shin
Takahiro Hanyu
computing. In addition, it solves problems using two orders-of-magnitude larger number of spins than the D-Wave Two QA machine.
2023 S TOCHASTIC Q UANTUM M ONTE C ARLO A LGORITHM FOR L ARGE -S CALE C OMBINATORIAL O PTIMIZATION P ROBLEMS
Naoya Onizawa
Ryoma Sasaki
Duckgyu Shin
Takahiro Hanyu
computing. In addition, it solves problems using two orders-of-magnitude larger number of spins than the D-Wave Two QA machine.
AfriMTE and AfriCOMET: Empowering COMET to Embrace Under-resourced African Languages
Jiayi Wang
Sweta Agrawal
Ricardo Rei
Eleftheria Briakou
Marine Carpuat
Marek Masiak
Xuanli He
Sofia Bourhim
Andiswa Bukula
Muhidin A. Mohamed
Temitayo Olatoye
Hamam Mokayede
Christine Mwase
Wangui Kimotho
Foutse Yuehgoh
Anuoluwapo Aremu
Jessica Ojo
Shamsuddeen Hassan Muhammad
Salomey Osei … (see 37 more)
Abdul-Hakeem Omotayo
Chiamaka Chukwuneke
Perez Ogayo
Oumaima Hourrane
Salma El Anigri
Lolwethu Ndolela
Thabiso Mangwana
Shafie Abdi Mohamed
Ayinde Hassan
Oluwabusayo Olufunke Awoyomi
Lama Alkhaled
sana Sabah al-azzawi
Naome A. Etori
Millicent A. Ochieng
Clemencia Siro
Samuel Njoroge
Eric Muchiri
Wangari Kimotho
Lyse Naomi Wamba Momo
Daud Abolade
Simbiat Ajao
Tosin P. Adewumi
Iyanuoluwa Shode
Ricky Macharm
Ruqayya Nasir Iro
Saheed Salahudeen Abdullahi
Stephen E. Moore
Bernard Opoku
Zainab Akinjobi
Abeeb Afolabi
Nnaemeka Casmir Obiefuna
Onyekachi Ogbu
Sam Brian
V. Otiende
CHINEDU EMMANUEL MBONU
Toadoum Sari Sakayo
Pontus Stenetorp
Despite the progress we have recorded in scaling multilingual machine translation (MT) models and evaluation data to several under-resourced… (see more) African languages, it is difficult to measure accurately the progress we have made on these languages because evaluation is often performed on n -gram matching metrics like BLEU that often have worse correlation with human judgments. Embedding-based metrics such as COMET correlate better; however, lack of evaluation data with human ratings for under-resourced languages, complexity of annotation guidelines like Multidimensional Quality Metrics (MQM), and limited language coverage of multilingual encoders have hampered their applicability to African languages. In this paper, we address these challenges by creating high-quality human evaluation data with a simplified MQM guideline for error-span annotation and direct assessment (DA) scoring for 13 typologi-cally diverse African languages. Furthermore, we develop A FRI COMET—a COMET evaluation metric for African languages by leveraging DA training data from high-resource languages and African-centric multilingual encoder (AfroXLM-Roberta) to create the state-of-the-art evaluation metric for African languages MT with respect to Spearman-rank correlation with human judgments ( +0 . 406 ).
AfriMTE and AfriCOMET: Empowering COMET to Embrace Under-resourced African Languages
Jiayi Wang
Sweta Agrawal
Ricardo Rei
Eleftheria Briakou
Marine Carpuat
Marek Masiak
Xuanli He
Sofia Bourhim
Andiswa Bukula
Muhidin A. Mohamed
Temitayo Olatoye
Hamam Mokayede
Christine Mwase
Wangui Kimotho
Foutse Yuehgoh
Aremu Anuoluwapo
Jessica Ojo
Shamsuddeen Hassan Muhammad
Salomey Osei … (see 37 more)
Abdul-Hakeem Omotayo
Chiamaka Ijeoma Chukwuneke
Perez Ogayo
Oumaima Hourrane
Salma El Anigri
Lolwethu Ndolela
Thabiso Mangwana
Shafie Abdi Mohamed
Ayinde Hassan
Oluwabusayo Olufunke Awoyomi
Lama Alkhaled
sana Sabah al-azzawi
Naome A. Etori
Millicent A. Ochieng
Clemencia Siro
Samuel Njoroge
Eric Muchiri
Wangari Kimotho
Lyse Naomi Wamba Momo
Daud Abolade
Simbiat Ajao
Tosin Adewumi
Iyanuoluwa Shode
Ricky Macharm
Ruqayya Nasir Iro
Saheed Salahudeen Abdullahi
Stephen E. Moore
Bernard Opoku
Zainab Akinjobi
Abeeb Afolabi
Nnaemeka Casmir Obiefuna
Onyekachi Ogbu
Sam Brian
Verrah Akinyi Otiende
CHINEDU EMMANUEL MBONU
Toadoum Sari Sakayo
Pontus Stenetorp
Despite the progress we have recorded in scaling multilingual machine translation (MT) models and evaluation data to several under-resourced… (see more) African languages, it is difficult to measure accurately the progress we have made on these languages because evaluation is often performed on n -gram matching metrics like BLEU that often have worse correlation with human judgments. Embedding-based metrics such as COMET correlate better; however, lack of evaluation data with human ratings for under-resourced languages, complexity of annotation guidelines like Multidimensional Quality Metrics (MQM), and limited language coverage of multilingual encoders have hampered their applicability to African languages. In this paper, we address these challenges by creating high-quality human evaluation data with a simplified MQM guideline for error-span annotation and direct assessment (DA) scoring for 13 typologi-cally diverse African languages. Furthermore, we develop A FRI COMET—a COMET evaluation metric for African languages by leveraging DA training data from high-resource languages and African-centric multilingual encoder (AfroXLM-Roberta) to create the state-of-the-art evaluation metric for African languages MT with respect to Spearman-rank correlation with human judgments ( +0 . 406 ).