Foutse Khomh

2023-12-22

ArXiv (preprint)

Studying the Practices of Testing Machine Learning Software in the Wild

Moses Openja

Armstrong Foundjem

Zhen Ming Jiang

Mouna Abidi

Ahmed E. Hassan

Background: We are witnessing an increasing adoption of machine learning (ML), especially deep learning (DL) algorithms in many software sys… (see more)tems, including safety-critical systems such as health care systems or autonomous driving vehicles. Ensuring the software quality of these systems is yet an open challenge for the research community, mainly due to the inductive nature of ML software systems. Traditionally, software systems were constructed deductively, by writing down the rules that govern the behavior of the system as program code. However, for ML software, these rules are inferred from training data. Few recent research advances in the quality assurance of ML systems have adapted different concepts from traditional software testing, such as mutation testing, to help improve the reliability of ML software systems. However, it is unclear if any of these proposed testing techniques from research are adopted in practice. There is little empirical evidence about the testing strategies of ML engineers. Aims: To fill this gap, we perform the first fine-grained empirical study on ML testing practices in the wild, to identify the ML properties being tested, the followed testing strategies, and their implementation throughout the ML workflow. Method: First, we systematically summarized the different testing strategies (e.g., Oracle Approximation), the tested ML properties (e.g., Correctness, Bias, and Fairness), and the testing methods (e.g., Unit test) from the literature. Then, we conducted a study to understand the practices of testing ML software. Results: In our findings: 1) we identified four (4) major categories of testing strategy including Grey-box, White-box, Black-box, and Heuristic-based techniques that are used by the ML engineers to find software bugs. 2) We identified 16 ML properties that are tested in the ML workflow.

2023-12-19

ArXiv (preprint)

Harnessing Predictive Modeling and Software Analytics in the Age of LLM-Powered Software Development (Invited Talk)

2023-12-08

International Conference on Predictive Models in Software Engineering (published)

Bug characterization in machine learning-based systems

Mohammad Mehdi Morovati

Amin Nikanjam

Florian Tambon

Z. Jiang

2023-12-05

Empirical Software Engineering (published)

A Machine Learning Based Approach to Detect Machine Learning Design Patterns

Weitao Pan

Hironori Washizaki

Nobukazu Yoshioka

Yoshiaki Fukazawa

Yann‐Gaël Guéhéneuc

As machine learning expands to various domains, the demand for reusable solutions to similar problems increases. Machine learning design pat… (see more)terns are reusable solutions to design problems of machine learning applications. They can significantly enhance programmers' productivity in programming that requires machine learning algorithms. Given the critical role of machine learning design patterns, the automated detection of them becomes equally vital. However, identifying design patterns can be time-consuming and error-prone. We propose an approach to detect their occurrences in Python files. Our approach uses an Abstract Syntax Tree (AST) of Python files to build a corpus of data and train a refined Text-CNN model to automatically identify machine learning design patterns. We empirically validate our approach by conducting an exploratory study to detect four common machine learning design patterns: Embedding, Multilabel, Feature Cross, and Hashed Feature. We manually label 450 Python code files containing these design patterns from repositories of projects in GitHub. Our approach achieves accuracy values ranging from 80 % to 92% for each of the four patterns.

2023-12-04

Asia-Pacific Software Engineering Conference (published)

A large-scale exploratory study of android sports apps in the google play store

Bhagya Chembakottu

Heng Li

2023-12-01

Information and Software Technology (published)

Silent bugs in deep learning frameworks: an empirical study of Keras and TensorFlow

Florian Tambon

Amin Nikanjam

Le An

Giuliano Antoniol

2023-11-29

Empirical Software Engineering (published)

An Empirical Study of Self-Admitted Technical Debt in Machine Learning Software

Aaditya Bhatia

Bram Adams

Ahmed E. Hassan

The emergence of open-source ML libraries such as TensorFlow and Google Auto ML has enabled developers to harness state-of-the-art ML algori… (see more)thms with minimal overhead. However, during this accelerated ML development process, said developers may often make sub-optimal design and implementation decisions, leading to the introduction of technical debt that, if not addressed promptly, can have a significant impact on the quality of the ML-based software. Developers frequently acknowledge these sub-optimal design and development choices through code comments during software development. These comments, which often highlight areas requiring additional work or refinement in the future, are known as self-admitted technical debt (SATD). This paper aims to investigate SATD in ML code by analyzing 318 open-source ML projects across five domains, along with 318 non-ML projects. We detected SATD in source code comments throughout the different project snapshots, conducted a manual analysis of the identified SATD sample to comprehend the nature of technical debt in the ML code, and performed a survival analysis of the SATD to understand the evolution of such debts. We observed: i) Machine learning projects have a median percentage of SATD that is twice the median percentage of SATD in non-machine learning projects. ii) ML pipeline components for data preprocessing and model generation logic are more susceptible to debt than model validation and deployment components. iii) SATDs appear in ML projects earlier in the development process compared to non-ML projects. iv) Long-lasting SATDs are typically introduced during extensive code changes that span multiple files exhibiting low complexity.

2023-11-20

ArXiv (preprint)

Assessing the Security of GitHub Copilot Generated Code - A Targeted Replication Study

Vahid Majdinasab

Michael Joshua Bishop

Shawn Rasheed

Arghavan Moradi Dakhel

Amjed Tahir