Foutse Khomh

Biographie

Foutse Khomh est professeur titulaire de génie logiciel à Polytechnique Montréal, titulaire d'une chaire en IA Canada-CIFAR dans le domaine des systèmes logiciels d'apprentissage automatique fiables, et titulaire d'une chaire de recherche FRQ-IVADO sur l'assurance qualité des logiciels pour les applications d'apprentissage automatique.

Il a obtenu un doctorat en génie logiciel de l'Université de Montréal en 2011, avec une bourse d'excellence. Il a également reçu le prix CS-Can/Info-Can du meilleur jeune chercheur en informatique en 2019. Ses recherches portent sur la maintenance et l'évolution des logiciels, l'ingénierie des systèmes d'apprentissage automatique, l'ingénierie en nuage et l’IA/apprentissage automatique fiable et digne de confiance.

Ses travaux ont été récompensés par quatre prix de l’article le plus important Most Influential Paper en dix ans et six prix du meilleur article ou de l’article exceptionnel (Best/Distinguished Paper). Il a également siégé au comité directeur de plusieurs conférences et rencontres : SANER (comme président), MSR, PROMISE, ICPC (comme président) et ICSME (en tant que vice-président). Il a initié et coorganisé le symposium Software Engineering for Machine Learning Applications (SEMLA) et la série d'ateliers Release Engineering (RELENG).

Il est cofondateur du projet CRSNG CREATE SE4AI : A Training Program on the Development, Deployment, and Servicing of Artificial Intelligence-based Software Systems et l'un des chercheurs principaux du projet Dependable Explainable Learning (DEEL). Il est également cofondateur de l'initiative québécoise sur l'IA digne de confiance (Confiance IA Québec). Il fait partie du comité de rédaction de plusieurs revues internationales de génie logiciel (dont IEEE Software, EMSE, JSEP) et est membre senior de l'Institute of Electrical and Electronics Engineers (IEEE).

Étudiants actuels

Houssem Ben Braiek

Postdoctorat - Polytechnique

Doctorat - Polytechnique

Github

forough majidi

Doctorat - Polytechnique

Site web

Khouloud Oueslati

Maîtrise recherche - Polytechnique

Arian Qazvini

Maîtrise recherche - Polytechnique

Site web

Github

Elnathan Tiokou Tiokou Fangang

Maîtrise recherche - Polytechnique

Ben Braiek Yasmine

Maîtrise recherche - Polytechnique

Publications

Refactoring practices in the context of data-intensive systems

Biruk Asmare Muse

Giuliano Antoniol

2023-02-15

Empirical Software Engineering (publié)

AmbieGen tool at the SBST 2022 Tool Competition

Dmytro Humeniuk

Giuliano Antoniol

AmbieGen is a tool for generating test cases for cyber-physical systems (CPS). In the context of SBST 2022 CPS tool competition, it has been… (voir plus) adapted to generating virtual roads to test a car lane keeping assist system. AmbieGen leverages a two objective NSGA-II algorithm to produce the test cases. It has achieved the highest final score, accounting for the test case efficiency, effectiveness and diversity in both testing configurations.

2023-02-02

Proceedings of the 15th Workshop on Search-Based Software Testing (publié)

Fooling SHAP with Stealthily Biased Sampling

Gabriel Laberge

Ulrich Matchi Aïvodji

Satoshi Hara

Mario Marchand

SHAP explanations aim at identifying which features contribute the most to the difference in model prediction at a speciﬁc input versus a … (voir plus)background distribution. Recent studies have shown that they can be manipulated by malicious adversaries to produce arbitrary desired explanations. However, existing attacks focus solely on altering the black-box model itself. In this paper, we propose a complementary family of attacks that leave the model intact and manipulate SHAP explanations using stealthily biased sampling of the data points used to approximate expectations w.r.t the background distribution. In the context of fairness audit, we show that our attack can reduce the importance of a sensitive feature when explaining the difference in outcomes between groups, while remaining undetected. These results highlight the manipulability of SHAP explanations and encourage auditors to treat post-hoc explanations with skepticism.

2023-01-31

ICLR.cc/2023/Conference (poster)

openreview.net

Studying Logging Practice in Machine Learning-based Applications

Patrick Loic Foalem

Heng Li

Logging is a common practice in traditional software development. Several research works have been done to investigate the different charact… (voir plus)eristics of logging practices in traditional software systems (e.g., Android applications, JAVA applications, C/C++ applications). Nowadays, we are witnessing more and more development of Machine Learning-based applications (ML-based applications). Today, there are many popular libraries that facilitate and contribute to the development of such applications, among which we can mention: Pytorch, Tensorflow, Theano, MXNet, Scikit-Learn, Caffe, and Keras. Despite the popularity of ML, we don't have a clear understanding of logging practices in ML applications. In this paper, we aim to fill this knowledge gap and help ML practitioners understand the characteristics of logging in ML-based applications. In particular, we conduct an empirical study on 110 open-source ML-based applications. Through a quantitative analysis, we find that logging practice in ML-based applications is less pervasive than in traditional applications including Android, JAVA, and C/C++ applications. Furthermore, the majority of logging statements in ML-based applications are in info and warn levels, compared to traditional applications where info is the majority of logging statement in C/C++ application and debug, error levels constitute the majority of logging statement in Android application. We also perform a quantitative and qualitative analysis of a random sample of logging statements to understand where ML developers put most of logging statements and examine why and how they are using logging. These analyses led to the following observations: (i) ML developers put most of the logging statements in model training, and in non-ML components. (ii) Data and model management appear to be the main reason behind the introduction of logging statements in ML-based applications.

2023-01-09

ArXiv (prépublication)

SmOOD: Smoothness-based Out-of-Distribution Detection Approach for Surrogate Neural Networks in Aircraft Design

Houssem Ben Braiek

Ali Tfaily

Thomas Reid

Ciro Guida

2023-01-04

Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering (publié)

AmbieGen: A Search-based Framework for Autonomous Systems Testing

Dmytro Humeniuk

Giuliano Antoniol

2022-12-31

Sci. Comput. Program. (publié)

Can Ensembling Pre-processing Algorithms Lead to Better Machine Learning Fairness?

Khaled Badran

Pierre-Olivier Côté

Amanda Kolopanis

Rached Bouchoucha

Antonio Collante

Diego Elias Costa

Emad Shihab

As machine learning (ML) systems get adopted in more critical areas, it has become increasingly crucial to address the bias that could occur… (voir plus) in these systems. Several fairness pre-processing algorithms are available to alleviate implicit biases during model training. These algorithms employ different concepts of fairness, often leading to conflicting strategies with consequential trade-offs between fairness and accuracy. In this work, we evaluate three popular fairness pre-processing algorithms and investigate the potential for combining all algorithms into a more robust pre-processing ensemble. We report on lessons learned that can help practitioners better select fairness algorithms for their models.

2022-12-31

Computer (publié)

Effective test generation using pre-trained Large Language Models and mutation testing

Arghavan Moradi Dakhel

Amin Nikanjam

Vahid Majdinasab

Michel C. Desmarais

One of the critical phases in software development is software testing. Testing helps with identifying potential bugs and reducing maintenan… (voir plus)ce costs. The goal of automated test generation tools is to ease the development of tests by suggesting efficient bug-revealing tests. Recently, researchers have leveraged Large Language Models (LLMs) of code to generate unit tests. While the code coverage of generated tests was usually assessed, the literature has acknowledged that the coverage is weakly correlated with the efficiency of tests in bug detection. To improve over this limitation, in this paper, we introduce MuTAP for improving the effectiveness of test cases generated by LLMs in terms of revealing bugs by leveraging mutation testing. Our goal is achieved by augmenting prompts with surviving mutants, as those mutants highlight the limitations of test cases in detecting bugs. MuTAP is capable of generating effective test cases in the absence of natural language descriptions of the Program Under Test (PUTs). We employ different LLMs within MuTAP and evaluate their performance on different benchmarks. Our results show that our proposed method is able to detect up to 28% more faulty human-written code snippets. Among these, 17% remained undetected by both the current state-of-the-art fully automated test generation tool (i.e., Pynguin) and zero-shot/few-shot learning approaches on LLMs. Furthermore, MuTAP achieves a Mutation Score (MS) of 93.57% on synthetic buggy code, outperforming all other approaches in our evaluation. Our findings suggest that although LLMs can serve as a useful tool to generate test cases, they require specific post-processing steps to enhance the effectiveness of the generated test cases which may suffer from syntactic or functional errors and may be ineffective in detecting certain types of bugs and testing corner cases PUTs.

2022-12-31

arXiv (prépublication)

GitHub Copilot AI pair programmer: Asset or Liability?

Arghavan Moradi Dakhel

Vahid Majdinasab

Amin Nikanjam

Michel C. Desmarais

Z. Jiang

Automatic program synthesis is a long-lasting dream in software engineering. Recently, a promising Deep Learning (DL) based solution, called… (voir plus) Copilot, has been proposed by OpenAI and Microsoft as an industrial product. Although some studies evaluate the correctness of Copilot solutions and report its issues, more empirical evaluations are necessary to understand how developers can benefit from it effectively. In this paper, we study the capabilities of Copilot in two different programming tasks: (i) generating (and reproducing) correct and efficient solutions for fundamental algorithmic problems, and (ii) comparing Copilot's proposed solutions with those of human programmers on a set of programming tasks. For the former, we assess the performance and functionality of Copilot in solving selected fundamental problems in computer science, like sorting and implementing data structures. In the latter, a dataset of programming problems with human-provided solutions is used. The results show that Copilot is capable of providing solutions for almost all fundamental algorithmic problems, however, some solutions are buggy and non-reproducible. Moreover, Copilot has some difficulties in combining multiple methods to generate a solution. Comparing Copilot to humans, our results show that the correct ratio of humans' solutions is greater than Copilot's suggestions, while the buggy solutions generated by Copilot require less effort to be repaired.

2022-12-31

J. Syst. Softw. (publié)

Impact in Software Engineering Activities After One Year of COVID-19 Restrictions for Startups and Established Companies

Hosna Hooshyar

Eduardo Guerra

Jorge Melegati

Dron Khanna

Abdullah Aldaeej

Gerardo Matturro

Luciana Zaina

Des Greer

Usman Rafiq

Rafael Chanin

Xiaofeng Wang

Juan Garbajosa

Pekka Abrahamsson

Anh Nguyen-Duc

The restrictions imposed by the COVID-19 pandemic required software development teams to adapt, being forced to work remotely and adjust the… (voir plus) software engineering activities accordingly. In the studies evaluating these effects, a few have assessed the impact on software engineering activities from a broader perspective and after a period of time when teams had time to adjust to the changes. No studies have been found comparing software startups and established companies either. This paper aims to investigate the impacts of COVID-19 on software development activities after one year of the pandemic restrictions, comparing the results between startups and established companies. Our approach was to design a cross-sectional survey and distribute it online among software development companies worldwide. The participants were asked about their perception of COVID-19’s pandemic impact on different software engineering activities: requirements engineering, software architecture, user experience design, software implementation, and software quality assurance. The survey received 170 valid answers from 29 countries, and for all the software engineering activities, we found that most respondents did not observe a significant impact. The results also showed that software startups and established companies were affected differently since, in some activities, we found a negative impact in the former and a positive impact in the latter. Regarding the time spent on each software engineering activity, most of the answers reported no change, but on those that did, the result points to an increase in time. Thus, we cannot find any relation between the change in time of effort and the reported positive or negative impact.

2022-12-31

IEEE Access (publié)

An Intentional Forgetting-Driven Self-Healing Method for Deep Reinforcement Learning Systems

Ahmed Haj Yahmed

Rached Bouchoucha

Houssem Ben Braiek

Deep reinforcement learning (DRL) is increasingly applied in large-scale productions like Netflix and Facebook. As with most data-driven sys… (voir plus)tems, DRL systems can exhibit undesirable behaviors due to environmental drifts, which often occur in constantly-changing production settings. Continual Learning (CL) is the inherent self-healing approach for adapting the DRL agent in response to the environment's conditions shifts. However, successive shifts of considerable magnitude may cause the production environment to drift from its original state. Recent studies have shown that these environmental drifts tend to drive CL into long, or even unsuccessful, healing cycles, which arise from inefficiencies such as catastrophic forgetting, warm-starting failure, and slow convergence. In this paper, we propose Dr. DRL, an effective self-healing approach for DRL systems that integrates a novel mechanism of intentional forgetting into vanilla CL (i.e., standard CL) to overcome its main issues. Dr. DRL deliberately erases the DRL system's minor behaviors to systematically prioritize the adaptation of the key problem-solving skills. Using well-established DRL algorithms, Dr. DRL is compared with vanilla CL on various drifted environments. Dr. DRL is able to reduce, on average, the healing time and fine-tuning episodes by, respectively, 18.74% and 17.72%. Dr. DRL successfully helps agents to adapt to 19.63% of drifted environments left unsolved by vanilla CL while maintaining and even enhancing by up to 45% the obtained rewards for drifted environments that are resolved by both approaches.

2022-12-31

2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE) (publié)

Maintenance Cost of Software Ecosystem Updates

Solomon Berhe

M. Maynard

2022-12-31

ANT/EDI40 (publié)