Portrait of Jin Guo

Jin Guo

Associate Academic Member
Assistant Professor, McGill University, School of Computer Science

Biography

Jin L.C. Guo received her Ph.D. from the University of Notre Dame. Her research interests lie in applying artificial intelligence techniques to solve software engineering problems. Her recent work focuses on mining domain knowledge from software traceability data and utilizing that knowledge to facilitate automated software engineering tasks such as trace retrieval and project question answering. Before her Ph.D., she worked at the Fuji Xerox research lab in the areas of image processing and computer vision.

Current Students

Postdoctorate - McGill University
Co-supervisor:
Research Master's - McGill University
Co-supervisor:
Ph.D. - McGill University
Principal supervisor:
Research Master's - McGill University
Co-supervisor:
Research Master's - McGill University
Research Master's - McGill University
Co-supervisor:
Research Master's - McGill University

Publications

Aspirations and Practice of ML Model Documentation: Moving the Needle with Nudging and Traceability
Avinash Bhat
Austin Coursey
Grace Hu
Sixian Li
Nadia Nahar
Shurui Zhou
Christian Kästner
The documentation practice for machine-learned (ML) models often falls short of established practices for traditional software, which impedes model accountability and inadvertently abets inappropriate use or misuse of models. Recently, model cards, a proposal for model documentation, have attracted notable attention, but their impact on actual practice is unclear. In this work, we systematically study model documentation in the field and investigate how to encourage more responsible and accountable documentation practices. Our analysis of publicly available model cards reveals a substantial gap between the proposal and practice. We then design a tool named DocML that aims to (1) nudge data scientists to comply with the model cards proposal during model development, especially in the sections related to ethics, and (2) assess and manage documentation quality. A lab study reveals the benefits of our tool for long-term documentation quality and accountability.
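Since model cards are structured documents, the nudging idea can be pictured as a simple compliance check. The sketch below, a minimal illustration and not DocML's actual implementation, scans a markdown model card for required sections (names taken from the original model cards proposal) and flags the missing ones.

```python
# Minimal sketch of a model-card compliance check, in the spirit of a nudging
# tool like DocML. The section list follows the model cards proposal; the
# exact checks DocML performs are assumptions made for illustration.
import re

REQUIRED_SECTIONS = [
    "Model Details", "Intended Use", "Factors", "Metrics",
    "Training Data", "Evaluation Data", "Ethical Considerations",
    "Caveats and Recommendations",
]

def audit_model_card(markdown_text: str) -> dict:
    """Map each required section to whether a markdown heading for it exists."""
    headings = {m.group(1).strip().lower()
                for m in re.finditer(r"^#+\s*(.+)$", markdown_text, re.MULTILINE)}
    return {section: section.lower() in headings for section in REQUIRED_SECTIONS}

if __name__ == "__main__":
    card = open("MODEL_CARD.md").read()
    for section, present in audit_model_card(card).items():
        if not present:
            print(f"Nudge: model card is missing a '{section}' section")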
How Do Open Source Software Contributors Perceive and Address Usability?: Valued Factors, Practices, and Challenges
Wenting Wang
Jinghui Cheng
Given the recent changes in the open source software (OSS) landscape, we examined OSS contributors' current valued factors, practices, and challenges concerning usability. Our survey provides insights for OSS practitioners and tool designers to promote a user-centric mindset and improve usability practice in OSS communities.
Automated, interactive, and traceable domain modelling empowered by artificial intelligence
Rijul Saini
Gunter Mussbacher
Jörg Kienzle
The Secret to Better AI and Better Software (Is Requirements Engineering)
Nelly Bencomo
Rachel Harrison
Hans-Martin Heyn
Tim Menzies
Much has been written about the algorithmic role that AI plays in automating SE. But what about the role of AI augmented by human knowledge? Can we make a profound advance by combining human and artificial intelligence? Researchers in requirements engineering think so, arguing that requirements engineering is the secret weapon for better AI and better software. To begin, we need a definition. What is requirements engineering, or RE? RE used to be viewed as an early lifecycle activity that preceded analysis, design, coding, and testing. For safety-critical applications there is certainly a pressing need to create those requirements before coding starts (we return to this point later in the paper). However, in this age of DevOps and autonomous and self-adaptive systems, requirements can arise at many other points in a software project [15], [14]. We say that requirements engineering is any discussion about what to build and how to trade off competing costs and benefits; it can happen before, during, or after runtime. As shown in Table 1 and Table 2, there are many ways AI can help RE across a broad range of SE activities. But what about the other way around? If we add more requirements into AI, and use RE methods to elicit truly desired requirements, can we make better software by combining human and artificial intelligence? In our view, integrating AI into software engineering is a co-design problem between humans, the AI model, the data required to train and validate the desired behaviour, and the hardware running the AI model, in addition to the classical software components. This means that when integrating AI, you need to know and understand the context of the system in which you want to apply your AI model in order to derive the necessary model requirements [17]. For example, in the arena of safety-critical systems, model construction must be guided by safety requirements. One challenge for AI in RE is safety standards based on the EN-IEC 61508 standard, such as ISO 26262 for the automotive sector or IEC 61511 for the process industry. These standards assume that software exhibits only systematic faults; they therefore emphasise correct processes and the creation of lifecycle artifacts to minimise systematic mistakes. (This paper is based on the panel "Artificial Intelligence and Requirement Engineering: Challenges and Opportunities", which took place at the Eighth International Workshop on Artificial Intelligence and Requirements Engineering (AIRE).)
GitHub repositories with links to academic papers: Public access, traceability, and evolution
Supatsara Wattanakriengkrai
Bodin Chinthanet
Hideaki Hata
Raula Gaikovina Kula
Christoph Treude
Kenichi Matsumoto
The Secret to Better AI and Better Software (Is Requirements Engineering)
Nelly Bencomo
Rachel Harrison
Hans-Martin Heyn
Tim Menzies
Recently, practitioners and researchers met to discuss the role of requirements, and AI and SE. We offer here notes on that fascinating disc… (voir plus)ussion. Also, have you considered writing for this column? This “SE for AI” column publishes commentaries on the growing field of SE for AI. Submissions are welcomed and encouraged (1,000–2,400 words, each figure and table counts as 250 words, try to use fewer than 12 references, and keep the discussion practitioner focused). Please submit your ideas to me at timm@ieee.org.—Tim Menzies
Splitting, Renaming, Removing: A Study of Common Cleaning Activities in Jupyter Notebooks
Helen Dong
Shurui Zhou
Christian Kästner
Data scientists commonly use computational notebooks because they provide a good environment for testing multiple models. However, once the scientist completes the code and finds the ideal model, he or she will have to dedicate time to cleaning up the code so that others can easily understand it. In this paper, we perform a qualitative study of how scientists clean their code, in the hope of suggesting a tool to automate this process. Our end goal is for tool builders to address possible gaps and provide additional aid to data scientists, who can then focus more on their actual work rather than on routine and tedious cleaning work. By sampling notebooks from GitHub and analyzing changes between subsequent commits, we identified common cleaning activities, such as changes to markdown (e.g., adding section headers or descriptions) or comments (both deleting dead code and adding descriptions), as well as reordering cells. We also find that common cleaning activities differ depending on the intended purpose of the notebook. Our results provide a valuable foundation for tool builders and notebook users, as many identified cleaning activities could benefit from codification of best practices and dedicated tool support, possibly tailored to intended use.
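One way to picture this commit-level analysis is a script that diffs two versions of a notebook and buckets the change into the reported cleaning activities. The sketch below uses the standard nbformat library; the classification rules are simplified assumptions, not the authors' actual coding scheme.

```python
# Rough sketch of classifying notebook cleaning activities between two
# commits of the same notebook. The heuristics are illustrative only.
import nbformat

def classify_cleaning(before_path: str, after_path: str) -> list[str]:
    before = nbformat.read(before_path, as_version=4)
    after = nbformat.read(after_path, as_version=4)
    activities = []

    # Markdown edits: e.g., added section headers or descriptions.
    md_before = [c.source for c in before.cells if c.cell_type == "markdown"]
    md_after = [c.source for c in after.cells if c.cell_type == "markdown"]
    if md_before != md_after:
        activities.append("markdown changed (headers or descriptions)")

    # Same code cells in a different order => reordering; fewer cells =>
    # possible dead-code deletion.
    code_before = [c.source for c in before.cells if c.cell_type == "code"]
    code_after = [c.source for c in after.cells if c.cell_type == "code"]
    if sorted(code_before) == sorted(code_after) and code_before != code_after:
        activities.append("cells reordered")
    elif len(code_after) < len(code_before):
        activities.append("cells removed (possibly dead code)")

    return activities
```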
Subtle Bugs Everywhere: Generating Documentation for Data Wrangling Code
Chenyang Yang
Shurui Zhou
Christian Kästner
Data scientists reportedly spend a significant amount of their daily routine on data wrangling, i.e., cleaning data and extracting features. However, data wrangling code is often repetitive and error-prone to write. Moreover, it is easy to introduce subtle bugs when reusing and adapting existing code, which results in reduced model quality. To support data scientists with data wrangling, we present a technique to generate documentation for data wrangling code. We use (1) program synthesis techniques to automatically summarize data transformations and (2) test case selection techniques to purposefully select representative examples from the data based on execution information collected with tailored dynamic program analysis. We demonstrate that a JupyterLab extension with our technique can provide on-demand documentation for many cells in popular notebooks, and we find in a user study that users with our plugin are faster and more effective at finding realistic bugs in data wrangling code.
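The paper's pipeline pairs program synthesis with dynamic analysis; as a much simpler stand-in, the sketch below documents a single pandas wrangling step by diffing the input and output DataFrames and attaching a few example rows. The function name and heuristics are hypothetical, not the paper's technique.

```python
# Illustrative sketch: summarize what one data wrangling step did by
# comparing its input and output DataFrames, then attach example rows.
import pandas as pd

def document_step(df_in: pd.DataFrame, df_out: pd.DataFrame,
                  n_examples: int = 3) -> str:
    added = set(df_out.columns) - set(df_in.columns)
    dropped = set(df_in.columns) - set(df_out.columns)
    lines = [f"rows: {len(df_in)} -> {len(df_out)}"]
    if added:
        lines.append(f"columns added: {sorted(added)}")
    if dropped:
        lines.append(f"columns dropped: {sorted(dropped)}")
    # The paper selects representative examples purposefully from execution
    # information; a random sample stands in for that here.
    sample = df_in.sample(min(n_examples, len(df_in)))
    lines.append("example inputs:\n" + sample.to_string())
    return "\n".join(lines)
```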
Generating GitHub Repository Descriptions: A Comparison of Manual and Automated Approaches
Jazlyn Hellman
Eunbee Jang
Christoph Treude
Chenzhun Huang
Given the vast number of repositories hosted on GitHub, project discovery and retrieval have become increasingly important for GitHub users. Repository descriptions serve as one of the first points of contact for users who are accessing a repository. However, repository owners often fail to provide a high-quality description; instead, they use vague terms, explain the purpose of the repository poorly, or omit the description entirely. In this work, we examine the current practice of writing GitHub repository descriptions. Our investigation leads to the proposal of the LSP (Language, Software technology, and Purpose) template for formulating good GitHub repository descriptions that are clear, concise, and informative. To understand the extent to which current automated techniques can support generating repository descriptions, we compare the performance of state-of-the-art text summarization methods on this task. Finally, our user study with GitHub users reveals that automated summarization can adequately be used to generate default descriptions for GitHub repositories, while descriptions that follow the LSP template offer the most effective instrument for communicating with GitHub users.
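The LSP template lends itself to a small structured representation. The sketch below renders a description from Language, Software technology, and Purpose fields; the field names and sentence pattern are illustrative assumptions rather than the paper's exact wording.

```python
# Toy rendering of a GitHub repository description from LSP fields
# (Language, Software technology, Purpose), per the template's idea.
from dataclasses import dataclass

@dataclass
class LSPDescription:
    language: str              # e.g. "Python"
    software_technology: str   # e.g. "PyTorch", "React"
    purpose: str               # what the project does, concisely

    def render(self) -> str:
        return (f"A {self.language} project built with "
                f"{self.software_technology} that {self.purpose}.")

print(LSPDescription("Python", "PyTorch", "trains image classifiers").render())
# -> "A Python project built with PyTorch that trains image classifiers."
```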
DoMoBOT: An AI-Empowered Bot for Automated and Interactive Domain Modelling
Rijul Saini
Gunter Mussbacher
Jörg Kienzle
Domain modelling transforms informal requirements, written in natural language as problem descriptions, into concise and analyzable domain models. As the manual construction of these domain models is often time-consuming, error-prone, and labor-intensive, several approaches already exist to automate domain modelling. However, current approaches suffer from the lower accuracy of extracted domain models and a lack of support for system-modeller interactions. To better assist modellers, we introduce DoMoBOT, a web-based Domain Modelling BOT. Our proposed bot combines artificial intelligence techniques such as natural language processing and machine learning to extract domain models with higher accuracy. More importantly, our bot incorporates a set of features to bring synergy between automated model extraction and bot-modeller interactions. During these interactions, the bot presents multiple possible solutions to a modeller for the modelling scenarios present in a given problem description. The bot further enables modellers to switch to a particular solution and proactively updates the other parts of the domain model. In this tool demo paper, we demonstrate how the implementation and architecture of DoMoBOT support the paradigm of automated and interactive domain modelling for assisting modellers.
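As a toy approximation of the extraction step such a bot automates, the sketch below proposes candidate domain-model classes from a problem description using spaCy noun chunks. DoMoBOT's actual pipeline combines NLP with trained ML models and interactive refinement, and is considerably richer than this.

```python
# Naive candidate-class extraction from a natural-language problem
# description, as a stand-in for the first step a domain-modelling bot
# automates. Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

def candidate_classes(problem_description: str) -> list[str]:
    doc = nlp(problem_description)
    # Noun chunks are a crude stand-in for domain concepts (classes).
    seen, candidates = set(), []
    for chunk in doc.noun_chunks:
        name = chunk.root.lemma_.title()
        if name not in seen:
            seen.add(name)
            candidates.append(name)
    return candidates

print(candidate_classes("A library lends books to members; each loan has a due date."))
# e.g. ['Library', 'Book', 'Member', 'Loan', 'Date']
```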