Portrait de Foutse Khomh

Foutse Khomh

Membre académique associé
Chaire en IA Canada-CIFAR
Professeur, Polytechnique Montréal, Département de génie informatique et génie logiciel

Biographie

Foutse Khomh est professeur titulaire de génie logiciel à Polytechnique Montréal, titulaire d'une chaire en IA Canada-CIFAR dans le domaine des systèmes logiciels d'apprentissage automatique fiables, et titulaire d'une chaire de recherche FRQ-IVADO sur l'assurance qualité des logiciels pour les applications d'apprentissage automatique.

Il a obtenu un doctorat en génie logiciel de l'Université de Montréal en 2011, avec une bourse d'excellence. Il a également reçu le prix CS-Can/Info-Can du meilleur jeune chercheur en informatique en 2019. Ses recherches portent sur la maintenance et l'évolution des logiciels, l'ingénierie des systèmes d'apprentissage automatique, l'ingénierie en nuage et l’IA/apprentissage automatique fiable et digne de confiance.

Ses travaux ont été récompensés par quatre prix de l’article le plus important Most Influential Paper en dix ans et six prix du meilleur article ou de l’article exceptionnel (Best/Distinguished Paper). Il a également siégé au comité directeur de plusieurs conférences et rencontres : SANER (comme président), MSR, PROMISE, ICPC (comme président) et ICSME (en tant que vice-président). Il a initié et coorganisé le symposium Software Engineering for Machine Learning Applications (SEMLA) et la série d'ateliers Release Engineering (RELENG).

Il est cofondateur du projet CRSNG CREATE SE4AI : A Training Program on the Development, Deployment, and Servicing of Artificial Intelligence-based Software Systems et l'un des chercheurs principaux du projet Dependable Explainable Learning (DEEL). Il est également cofondateur de l'initiative québécoise sur l'IA digne de confiance (Confiance IA Québec). Il fait partie du comité de rédaction de plusieurs revues internationales de génie logiciel (dont IEEE Software, EMSE, JSEP) et est membre senior de l'Institute of Electrical and Electronics Engineers (IEEE).

Étudiants actuels

Maîtrise recherche - Polytechnique Montréal
Maîtrise recherche - Polytechnique Montréal
Doctorat - Polytechnique Montréal
Maîtrise recherche - Polytechnique Montréal
Maîtrise recherche - Polytechnique Montréal

Publications

PaReco: patched clones and missed patches among the divergent variants of a software family
Poedjadevie Kadjel Ramkisoen
John Businge
Brent van Bladel
Alexandre Decan
Serge Demeyer
Coen De Roover
Re-using whole repositories as a starting point for new projects is often done by maintaining a variant fork parallel to the original. Howev… (voir plus)er, the common artifacts between both are not always kept up to date. As a result, patches are not optimally integrated across the two repositories, which may lead to sub-optimal maintenance between the variant and the original project. A bug existing in both repositories can be patched in one but not the other (we see this as a missed opportunity) or it can be manually patched in both probably by different developers (we see this as effort duplication). In this paper we present a tool (named PaReCo) which relies on clone detection to mine cases of missed opportunity and effort duplication from a pool of patches. We analyzed 364 (source to target) variant pairs with 8,323 patches resulting in a curated dataset containing 1,116 cases of effort duplication and 1,008 cases of missed opportunities. We achieve a precision of 91%, recall of 80%, accuracy of 88%, and F1-score of 85%. Furthermore, we investigated the time interval between patches and found out that, on average, missed patches in the target variants have been introduced in the source variants 52 weeks earlier. Consequently, PaReCo can be used to manage variability in “time” by automatically identifying interesting patches in later project releases to be backported to supported earlier releases.
Revisiting the Impact of Anti-patterns on Fault-Proneness: A Differentiated Replication
Aurel Ikama
Vincent Du
Philippe Belias
Biruk Asmare Muse
Mohammad Hamdaqa
Anti-patterns manifesting on software code through code smells have been investigated in terms of their prevalence, detection, refactoring, … (voir plus)and impact on software quality attributes. In particular, leveraging heuristics to identify fault-fixing commits, Khomh et al. have found that anti-patterns and code smells have an impact on the fault-proneness of a software system. Similarly, Saboury et al. found a relationship between anti-pattern occurrences and fault-proneness, using heuristic to identify fault-fixing commits and fault-inducing changes. However, recent studies question the accuracy of heuristics, and thus the validity of empirical studies that leverage it. Hence, in this work, we would like to investigate to what extent the results of empirical studies using heuristics to identify bug fix commits are affected by the limitations of the heuristics based approach using manually validated bug fix commits as a ground truth. In particular, we conduct a differentiated replication of the work by Khomh et al. We particularly focused on the impact of anti-patterns on fault-proneness as it is the only dependent variable that may be affected by noise in the collected faults data. In our differentiated replication study, (1) we expanded the number of subject systems from 5 to 38, (2) utilized a manually validated dataset of bug-fixing commits from the work of Herbold et al., and (3) answered research questions from Khomh et al., that are related to the relationship between anti-pattern occurrences and fault-proneness. (4) We added an additional research question to investigate if combining results from several heuristic-based approaches could help reduce the impact of noise. Our findings show that the impact of the noise generated by the automatic algorithm heuristic based is negligible for the studied subject systems; meaning that the reported relation observed on noisy data still holds on the clean data. However, we also observed that combining results from several heuristic based approaches do not reduce this noise, quite the contrary.
Video Game Bad Smells: What They Are and How Developers Perceive Them
Vittoria Nardone
Biruk Asmare Muse
Mouna Abidi
Massimiliano Di Penta
Video games represent a substantial and increasing share of the software market. However, their development is particularly challenging as i… (voir plus)t requires multi-faceted knowledge, which is not consolidated in computer science education yet. This article aims at defining a catalog of bad smells related to video game development. To achieve this goal, we mined discussions on general-purpose and video game-specific forums. After querying such a forum, we adopted an open coding strategy on a statistically significant sample of 572 discussions, stratified over different forums. As a result, we obtained a catalog of 28 bad smells, organized into five categories, covering problems related to game design and logic, physics, animation, rendering, or multiplayer. Then, we assessed the perceived relevance of such bad smells by surveying 76 game development professionals. The survey respondents agreed with the identified bad smells but also provided us with further insights about the discussed smells. Upon reporting results, we discuss bad smell examples, their consequences, as well as possible mitigation/fixing strategies and trade-offs to be pursued by developers. The catalog can be used not only as a guideline for developers and educators but also can pave the way toward better automated tool support for video game developers.
FIXME: synchronize with database! An empirical study of data access self-admitted technical debt
Biruk Asmare Muse
Csaba Nagy
Anthony Cleve
Giuliano Antoniol
Studying the Practices of Deploying Machine Learning Projects on Docker
Moses Openja
Forough Majidi
Bhagya Chembakottu
Heng Li
Works for Me! Cannot Reproduce – A Large Scale Empirical Study of Non-reproducible Bugs
Mohammad Masudur Rahman
Marco Castelluccio
AmbieGen tool at the SBST 2022 Tool Competition
Dmytro Humeniuk
Giuliano Antoniol
AmbieGen is a tool for generating test cases for cyber-physical systems (CPS). In the context of SBST 2022 CPS tool competition, it has been… (voir plus) adapted to generating virtual roads to test a car lane keeping assist system. AmbieGen leverages a two objective NSGA-II algorithm to produce the test cases. It has achieved the highest final score, accounting for the test case efficiency, effectiveness and diversity in both testing configurations.
Challenges in Machine Learning Application Development: An Industrial Experience Report
Md. Saidur Rahman
Emilio Martínez Rivera
Yann‐Gaël Guéhéneuc
Bernd Lehnert
SAP is the market leader in enterprise application software offering an end-to-end suite of applications and services to enable their custom… (voir plus)ers worldwide to operate their business. Especially, retail customers of SAP deal with millions of sales transactions for their day-to-day business. Transactions are created during retail sales at the point of sale (POS) terminals and those transactions are then sent to some central servers for validations and other business operations. A considerable proportion of the retail transactions may have inconsistencies or anomalies due to many technical and human errors. SAP provides an automated process for error detection but still requires a manual process by dedicated employees using workbench software for correction. However, manual corrections of these errors are time-consuming, labor-intensive, and might be prone to further errors due to incorrect modifications. Thus, automated detection and correction of transaction errors are very important regarding their potential business values and the improvement in the business workflow. In this paper, we report on our experience from a project where we develop an AI-based system to automatically detect transaction errors and propose corrections. We identify and discuss the challenges that we faced during this collaborative research and development project, from two distinct perspectives: Software Engineering and Machine Learning. We report on our experience and insights from the project with guidelines for the identified challenges. We collect developers’ feedback for qualitative analysis of our findings. We believe that our findings and recommendations can help other researchers and practitioners embarking into similar endeavours. CCS CONCEPTS • Software and its engineering → Programming teams.
Challenges in Machine Learning Application Development: An Industrial Experience Report
Md Saidur Rahman
Emilio Rivera
Yann‐Gaël Guéhéneuc
Bernd Lehnert
Identification of Out-of-Distribution Cases of CNN using Class-Based Surprise Adequacy
Mira Marhaba
Ettore Merlo
Giuliano Antoniol
Machine learning is vulnerable to possible incorrect classification of cases that are out of the distribution observed during training and c… (voir plus)alibration
Identification of Out-of-Distribution Cases of CNN using Class-Based Surprise Adequacy
Mira Marhaba
Ettore Merlo
Giuliano Antoniol
Machine learning is vulnerable to possible incorrect classification of cases that are out of the distribution observed during training and c… (voir plus)alibration
Clones in deep learning code: what, where, and why?
Hadhemi Jebnoun
Md Saidur Rahman
Biruk Asmare Muse