Foutse Khomh

Associate Academic Member

Canada CIFAR AI Chair

Professor, Polytechnique Montréal, Department of Computer Engineering and Software Engineering

Biography

Foutse Khomh is a full professor of software engineering at Polytechnique Montréal, a Canada CIFAR AI Chair – Trustworthy Machine Learning Software Systems, and an FRQ-IVADO Research Chair in Software Quality Assurance for Machine Learning Applications. Khomh completed a PhD in software engineering at Université de Montréal in 2011, for which he received an Award of Excellence. He was also awarded a CS-Can/Info-Can Outstanding Young Computer Science Researcher Prize in 2019.

His research interests include software maintenance and evolution, machine learning systems engineering, cloud engineering, and dependable and trustworthy ML/AI. His work has received four Ten-year Most Influential Paper (MIP) awards, and six Best/Distinguished Paper Awards. He has served on the steering committee of numerous organizations in software engineering, including SANER (chair), MSR, PROMISE, ICPC (chair), and ICSME (vice-chair). He initiated and co-organized Polytechnique Montréal’s Software Engineering for Machine Learning Applications (SEMLA) symposium and the RELENG (release engineering) workshop series.

Khomh co-founded the NSERC CREATE SE4AI: A Training Program on the Development, Deployment and Servicing of Artificial Intelligence-based Software Systems, and is a principal investigator for the DEpendable Explainable Learning (DEEL) project.

He also co-founded Confiance IA, a Quebec consortium focused on building trustworthy AI, and is on the editorial board of multiple international software engineering journals, including IEEE Software, EMSE and JSEP. He is a senior member of IEEE.

Current Students

Ahmed Haj Yahmed

Master's Research - Polytechnique Montréal

ahmed.haj-yahmed@mila.quebec

Github

Google Scholar

Arghavan Moradi Dakhel

Postdoctorate - Polytechnique Montréal

arghavan.moradi-dakhel@mila.quebec

PhD - Polytechnique Montréal

gabriel.laberge@mila.quebec

Github

Google Scholar

Khouloud Oueslati

Master's Research - Polytechnique Montréal

khouloud.oueslati@mila.quebec

Elnathan Tiokou Tiokou Fangang

PhD - Polytechnique Montréal

elnathan.tiokou@mila.quebec

Mohammadhossein Malekpour

Master's Research - Polytechnique Montréal

mohammadhossein.malekpour@mila.quebec

Github

Nanda Assobjio Brice Yvan

Master's Research - Polytechnique Montréal

brice.nanda@mila.quebec

Google Scholar

Publications

PaReco: patched clones and missed patches among the divergent variants of a software family

Poedjadevie Kadjel Ramkisoen

John Businge

Brent van Bladel

Alexandre Decan

Serge Demeyer

Coen De Roover

Foutse Khomh

Re-using whole repositories as a starting point for new projects is often done by maintaining a variant fork parallel to the original. However, the common artifacts between both are not always kept up to date. As a result, patches are not optimally integrated across the two repositories, which may lead to sub-optimal maintenance between the variant and the original project. A bug existing in both repositories can be patched in one but not the other (we see this as a missed opportunity) or it can be manually patched in both probably by different developers (we see this as effort duplication). In this paper we present a tool (named PaReCo) which relies on clone detection to mine cases of missed opportunity and effort duplication from a pool of patches. We analyzed 364 (source to target) variant pairs with 8,323 patches resulting in a curated dataset containing 1,116 cases of effort duplication and 1,008 cases of missed opportunities. We achieve a precision of 91%, recall of 80%, accuracy of 88%, and F1-score of 85%. Furthermore, we investigated the time interval between patches and found out that, on average, missed patches in the target variants have been introduced in the source variants 52 weeks earlier. Consequently, PaReCo can be used to manage variability in “time” by automatically identifying interesting patches in later project releases to be backported to supported earlier releases.

2022-11-07

ESEC/SIGSOFT FSE (published)

doi.org

Revisiting the Impact of Anti-patterns on Fault-Proneness: A Differentiated Replication

Aurel Ikama

Vincent Du

Philippe Belias

Biruk Asmare Muse

Foutse Khomh

Mohammad Hamdaqa

Anti-patterns manifesting on software code through code smells have been investigated in terms of their prevalence, detection, refactoring, and impact on software quality attributes. In particular, leveraging heuristics to identify fault-fixing commits, Khomh et al. have found that anti-patterns and code smells have an impact on the fault-proneness of a software system. Similarly, Saboury et al. found a relationship between anti-pattern occurrences and fault-proneness, using heuristic to identify fault-fixing commits and fault-inducing changes. However, recent studies question the accuracy of heuristics, and thus the validity of empirical studies that leverage it. Hence, in this work, we would like to investigate to what extent the results of empirical studies using heuristics to identify bug fix commits are affected by the limitations of the heuristics based approach using manually validated bug fix commits as a ground truth. In particular, we conduct a differentiated replication of the work by Khomh et al. We particularly focused on the impact of anti-patterns on fault-proneness as it is the only dependent variable that may be affected by noise in the collected faults data. In our differentiated replication study, (1) we expanded the number of subject systems from 5 to 38, (2) utilized a manually validated dataset of bug-fixing commits from the work of Herbold et al., and (3) answered research questions from Khomh et al., that are related to the relationship between anti-pattern occurrences and fault-proneness. (4) We added an additional research question to investigate if combining results from several heuristic-based approaches could help reduce the impact of noise. Our findings show that the impact of the noise generated by the automatic algorithm heuristic based is negligible for the studied subject systems; meaning that the reported relation observed on noisy data still holds on the clean data. However, we also observed that combining results from several heuristic based approaches do not reduce this noise, quite the contrary.

2022-10-03

2022 IEEE 22nd International Working Conference on Source Code Analysis and Manipulation (SCAM) (published)

doi.org

Video Game Bad Smells: What They Are and How Developers Perceive Them

Vittoria Nardone

Biruk Asmare Muse

Mouna Abidi

Foutse Khomh

Massimiliano Di Penta

Video games represent a substantial and increasing share of the software market. However, their development is particularly challenging as it requires multi-faceted knowledge, which is not consolidated in computer science education yet. This article aims at defining a catalog of bad smells related to video game development. To achieve this goal, we mined discussions on general-purpose and video game-specific forums. After querying such a forum, we adopted an open coding strategy on a statistically significant sample of 572 discussions, stratified over different forums. As a result, we obtained a catalog of 28 bad smells, organized into five categories, covering problems related to game design and logic, physics, animation, rendering, or multiplayer. Then, we assessed the perceived relevance of such bad smells by surveying 76 game development professionals. The survey respondents agreed with the identified bad smells but also provided us with further insights about the discussed smells. Upon reporting results, we discuss bad smell examples, their consequences, as well as possible mitigation/fixing strategies and trade-offs to be pursued by developers. The catalog can be used not only as a guideline for developers and educators but also can pave the way toward better automated tool support for video game developers.

2022-09-15

ACM Transactions on Software Engineering and Methodology (published)

doi.org

FIXME: synchronize with database! An empirical study of data access self-admitted technical debt

Biruk Asmare Muse

Csaba Nagy

Anthony Cleve

Foutse Khomh

Giuliano Antoniol

2022-07-08

Empirical Software Engineering (published)

doi.org

arxiv.org

Studying the Practices of Deploying Machine Learning Projects on Docker

Moses Openja

Forough Majidi

Foutse Khomh

Bhagya Chembakottu

Heng Li

2022-06-13

The International Conference on Evaluation and Assessment in Software Engineering 2022 (published)

doi.org

arxiv.org

Works for Me! Cannot Reproduce – A Large Scale Empirical Study of Non-reproducible Bugs

Mohammad Masudur Rahman

Foutse Khomh

Marco Castelluccio

2022-05-30

Empirical Software Engineering (published)

doi.org

AmbieGen tool at the SBST 2022 Tool Competition

Dmytro Humeniuk

Giuliano Antoniol

Foutse Khomh

AmbieGen is a tool for generating test cases for cyber-physical systems (CPS). In the context of SBST 2022 CPS tool competition, it has been adapted to generating virtual roads to test a car lane keeping assist system. AmbieGen leverages a two objective NSGA-II algorithm to produce the test cases. It has achieved the highest final score, accounting for the test case efficiency, effectiveness and diversity in both testing configurations.

2022-05-01

International Workshop on Search-Based Software Testing (published)

doi.org

Challenges in Machine Learning Application Development: An Industrial Experience Report

Md. Saidur Rahman

Foutse Khomh

Emilio Martínez Rivera

Yann‐Gaël Guéhéneuc

Bernd Lehnert

SAP is the market leader in enterprise application software offering an end-to-end suite of applications and services to enable their customers worldwide to operate their business. Especially, retail customers of SAP deal with millions of sales transactions for their day-to-day business. Transactions are created during retail sales at the point of sale (POS) terminals and those transactions are then sent to some central servers for validations and other business operations. A considerable proportion of the retail transactions may have inconsistencies or anomalies due to many technical and human errors. SAP provides an automated process for error detection but still requires a manual process by dedicated employees using workbench software for correction. However, manual corrections of these errors are time-consuming, labor-intensive, and might be prone to further errors due to incorrect modifications. Thus, automated detection and correction of transaction errors are very important regarding their potential business values and the improvement in the business workflow. In this paper, we report on our experience from a project where we develop an AI-based system to automatically detect transaction errors and propose corrections. We identify and discuss the challenges that we faced during this collaborative research and development project, from two distinct perspectives: Software Engineering and Machine Learning. We report on our experience and insights from the project with guidelines for the identified challenges. We collect developers’ feedback for qualitative analysis of our findings. We believe that our findings and recommendations can help other researchers and practitioners embarking into similar endeavours.

2022-05-01

2022 IEEE/ACM 1st International Workshop on Software Engineering for Responsible Artificial Intelligence (SE4RAI) (published)

doi.org

Identification of Out-of-Distribution Cases of CNN using Class-Based Surprise Adequacy

Mira Marhaba

Ettore Merlo

Foutse Khomh

Giuliano Antoniol

Machine learning is vulnerable to possible incorrect classification of cases that are out of the distribution observed during training and calibration.

2022-05-01

2022 IEEE/ACM 1st International Conference on AI Engineering – Software Engineering for AI (CAIN) (published)

doi.org

Clones in deep learning code: what, where, and why?

Hadhemi Jebnoun

Md Saidur Rahman

Foutse Khomh

Biruk Asmare Muse

2022-04-08

Empirical Software Engineering (published)

doi.org

arxiv.org
