Foutse Khomh

Associate Academic Member

Canada CIFAR AI Chair

Professor, Polytechnique Montréal, Department of Computer Engineering and Software Engineering

Biography

Foutse Khomh is a full professor of software engineering at Polytechnique Montréal, a Canada CIFAR AI Chair on trustworthy machine learning software systems, and holder of an FRQ-IVADO Research Chair on software quality assurance for machine learning applications.

He received a PhD in software engineering from Université de Montréal in 2011, with an excellence award. He also received the CS-Can/Info-Can Outstanding Young Computer Science Researcher Prize in 2019. His research interests include software maintenance and evolution, machine learning systems engineering, cloud engineering, and dependable and trustworthy AI/machine learning.

His work has received four ten-year Most Influential Paper awards and six Best/Distinguished Paper awards. He has also served on the steering committees of several conferences and meetings: SANER (as chair), MSR, PROMISE, ICPC (as chair), and ICSME (as vice-chair). He initiated and co-organized the Software Engineering for Machine Learning Applications (SEMLA) symposium and the Release Engineering (RELENG) workshop series.

He is co-founder of the NSERC CREATE SE4AI project (A Training Program on the Development, Deployment, and Servicing of Artificial Intelligence-based Software Systems) and one of the principal investigators of the Dependable Explainable Learning (DEEL) project. He is also co-founder of Quebec's initiative on trustworthy AI (Confiance IA Québec). He serves on the editorial boards of several international software engineering journals (including IEEE Software, EMSE, and JSEP) and is a senior member of the Institute of Electrical and Electronics Engineers (IEEE).

Current Students

Ahmed Haj Yahmed

Research Master's - Polytechnique Montréal

ahmed.haj-yahmed@mila.quebec

Github

Google Scholar

Arghavan Moradi Dakhel

Postdoctorate - Polytechnique Montréal

arghavan.moradi-dakhel@mila.quebec

Gabriel Laberge

PhD - Polytechnique Montréal

gabriel.laberge@mila.quebec

Github

Google Scholar

Khouloud Oueslati

Research Master's - Polytechnique Montréal

khouloud.oueslati@mila.quebec

Elnathan Tiokou Tiokou Fangang

PhD - Polytechnique Montréal

elnathan.tiokou@mila.quebec

Mohammadhossein Malekpour

Research Master's - Polytechnique Montréal

mohammadhossein.malekpour@mila.quebec

Github

Nanda Assobjio Brice Yvan

Research Master's - Polytechnique Montréal

brice.nanda@mila.quebec

Google Scholar

Publications

PaReco: patched clones and missed patches among the divergent variants of a software family

Poedjadevie Kadjel Ramkisoen

John Businge

Brent van Bladel

Alexandre Decan

Serge Demeyer

Coen De Roover

Foutse Khomh

Re-using whole repositories as a starting point for new projects is often done by maintaining a variant fork parallel to the original. However, the common artifacts between both are not always kept up to date. As a result, patches are not optimally integrated across the two repositories, which may lead to sub-optimal maintenance between the variant and the original project. A bug existing in both repositories can be patched in one but not the other (we see this as a missed opportunity) or it can be manually patched in both probably by different developers (we see this as effort duplication). In this paper we present a tool (named PaReCo) which relies on clone detection to mine cases of missed opportunity and effort duplication from a pool of patches. We analyzed 364 (source to target) variant pairs with 8,323 patches resulting in a curated dataset containing 1,116 cases of effort duplication and 1,008 cases of missed opportunities. We achieve a precision of 91%, recall of 80%, accuracy of 88%, and F1-score of 85%. Furthermore, we investigated the time interval between patches and found out that, on average, missed patches in the target variants have been introduced in the source variants 52 weeks earlier. Consequently, PaReCo can be used to manage variability in “time” by automatically identifying interesting patches in later project releases to be backported to supported earlier releases.

2022-11-07

ESEC/SIGSOFT FSE (published)

doi.org

Revisiting the Impact of Anti-patterns on Fault-Proneness: A Differentiated Replication

Aurel Ikama

Vincent Du

Philippe Belias

Biruk Asmare Muse

Foutse Khomh

Mohammad Hamdaqa

Anti-patterns manifesting on software code through code smells have been investigated in terms of their prevalence, detection, refactoring, and impact on software quality attributes. In particular, leveraging heuristics to identify fault-fixing commits, Khomh et al. have found that anti-patterns and code smells have an impact on the fault-proneness of a software system. Similarly, Saboury et al. found a relationship between anti-pattern occurrences and fault-proneness, using heuristics to identify fault-fixing commits and fault-inducing changes. However, recent studies question the accuracy of heuristics, and thus the validity of empirical studies that leverage them. Hence, in this work, we investigate to what extent the results of empirical studies using heuristics to identify bug-fix commits are affected by the limitations of the heuristic-based approach, using manually validated bug-fix commits as a ground truth. In particular, we conduct a differentiated replication of the work by Khomh et al. We particularly focused on the impact of anti-patterns on fault-proneness as it is the only dependent variable that may be affected by noise in the collected faults data. In our differentiated replication study, we (1) expanded the number of subject systems from 5 to 38, (2) utilized a manually validated dataset of bug-fixing commits from the work of Herbold et al., (3) answered research questions from Khomh et al. that are related to the relationship between anti-pattern occurrences and fault-proneness, and (4) added a research question to investigate whether combining results from several heuristic-based approaches could help reduce the impact of noise. Our findings show that the impact of the noise generated by the heuristic-based automatic algorithm is negligible for the studied subject systems, meaning that the relation observed on noisy data still holds on the clean data. However, we also observed that combining results from several heuristic-based approaches does not reduce this noise, quite the contrary.

2022-10-03

2022 IEEE 22nd International Working Conference on Source Code Analysis and Manipulation (SCAM) (published)

doi.org

Video Game Bad Smells: What They Are and How Developers Perceive Them

Vittoria Nardone

Biruk Asmare Muse

Mouna Abidi

Foutse Khomh

Massimiliano Di Penta

Video games represent a substantial and increasing share of the software market. However, their development is particularly challenging as it requires multi-faceted knowledge, which is not consolidated in computer science education yet. This article aims at defining a catalog of bad smells related to video game development. To achieve this goal, we mined discussions on general-purpose and video game-specific forums. After querying such a forum, we adopted an open coding strategy on a statistically significant sample of 572 discussions, stratified over different forums. As a result, we obtained a catalog of 28 bad smells, organized into five categories, covering problems related to game design and logic, physics, animation, rendering, or multiplayer. Then, we assessed the perceived relevance of such bad smells by surveying 76 game development professionals. The survey respondents agreed with the identified bad smells but also provided us with further insights about the discussed smells. Upon reporting results, we discuss bad smell examples, their consequences, as well as possible mitigation/fixing strategies and trade-offs to be pursued by developers. The catalog can be used not only as a guideline for developers and educators but also can pave the way toward better automated tool support for video game developers.

2022-09-15

ACM Transactions on Software Engineering and Methodology (published)

doi.org

FIXME: synchronize with database! An empirical study of data access self-admitted technical debt

Biruk Asmare Muse

Csaba Nagy

Anthony Cleve

Foutse Khomh

Giuliano Antoniol

2022-07-08

Empirical Software Engineering (published)

doi.org

arxiv.org

Studying the Practices of Deploying Machine Learning Projects on Docker

Moses Openja

Forough Majidi

Foutse Khomh

Bhagya Chembakottu

Heng Li

2022-06-13

The International Conference on Evaluation and Assessment in Software Engineering 2022 (published)

doi.org

arxiv.org

Works for Me! Cannot Reproduce – A Large Scale Empirical Study of Non-reproducible Bugs

Mohammad Masudur Rahman

Foutse Khomh

Marco Castelluccio

2022-05-30

Empirical Software Engineering (published)

doi.org

AmbieGen tool at the SBST 2022 Tool Competition

Dmytro Humeniuk

Giuliano Antoniol

Foutse Khomh

AmbieGen is a tool for generating test cases for cyber-physical systems (CPS). In the context of the SBST 2022 CPS tool competition, it has been adapted to generate virtual roads to test a car lane-keeping assist system. AmbieGen leverages a two-objective NSGA-II algorithm to produce the test cases. It has achieved the highest final score, accounting for the test case efficiency, effectiveness and diversity in both testing configurations.

2022-05-01

International Workshop on Search-Based Software Testing (published)

doi.org

Challenges in Machine Learning Application Development: An Industrial Experience Report

Md. Saidur Rahman

Foutse Khomh

Emilio Martínez Rivera

Yann‐Gaël Guéhéneuc

Bernd Lehnert

SAP is the market leader in enterprise application software offering an end-to-end suite of applications and services to enable their customers worldwide to operate their business. Especially, retail customers of SAP deal with millions of sales transactions for their day-to-day business. Transactions are created during retail sales at the point of sale (POS) terminals and those transactions are then sent to some central servers for validations and other business operations. A considerable proportion of the retail transactions may have inconsistencies or anomalies due to many technical and human errors. SAP provides an automated process for error detection but still requires a manual process by dedicated employees using workbench software for correction. However, manual corrections of these errors are time-consuming, labor-intensive, and might be prone to further errors due to incorrect modifications. Thus, automated detection and correction of transaction errors are very important regarding their potential business values and the improvement in the business workflow. In this paper, we report on our experience from a project where we develop an AI-based system to automatically detect transaction errors and propose corrections. We identify and discuss the challenges that we faced during this collaborative research and development project, from two distinct perspectives: Software Engineering and Machine Learning. We report on our experience and insights from the project with guidelines for the identified challenges. We collect developers’ feedback for qualitative analysis of our findings. We believe that our findings and recommendations can help other researchers and practitioners embarking into similar endeavours.

2022-05-01

2022 IEEE/ACM 1st International Workshop on Software Engineering for Responsible Artificial Intelligence (SE4RAI) (published)

doi.org

Identification of Out-of-Distribution Cases of CNN using Class-Based Surprise Adequacy

Mira Marhaba

Ettore Merlo

Foutse Khomh

Giuliano Antoniol

Machine learning is vulnerable to possible incorrect classification of cases that are out of the distribution observed during training and calibration.

2022-05-01

2022 IEEE/ACM 1st International Conference on AI Engineering – Software Engineering for AI (CAIN) (published)

doi.org

Clones in deep learning code: what, where, and why?

Hadhemi Jebnoun

Md Saidur Rahman

Foutse Khomh

Biruk Asmare Muse

2022-04-08

Empirical Software Engineering (published)

doi.org

arxiv.org
