Foutse Khomh

Biography

Foutse Khomh is a full professor of software engineering at Polytechnique Montréal, a Canada CIFAR AI Chair in trustworthy machine learning software systems, and holder of an FRQ-IVADO Research Chair in software quality assurance for machine learning applications.

He obtained a PhD in software engineering from Université de Montréal in 2011, with an excellence scholarship. He also received the CS-Can/Info-Can Outstanding Young Computer Science Researcher Prize in 2019. His research covers software maintenance and evolution, machine learning systems engineering, cloud engineering, and dependable and trustworthy AI/machine learning.

His work has received four ten-year Most Influential Paper awards and six Best/Distinguished Paper awards. He has also served on the steering committees of several conferences and meetings: SANER (as chair), MSR, PROMISE, ICPC (as chair), and ICSME (as vice-chair). He initiated and co-organized the Software Engineering for Machine Learning Applications (SEMLA) symposium and the Release Engineering (RELENG) workshop series.

He is a co-founder of the NSERC CREATE SE4AI project (A Training Program on the Development, Deployment, and Servicing of Artificial Intelligence-based Software Systems) and one of the principal investigators of the Dependable Explainable Learning (DEEL) project. He is also a co-founder of Quebec's initiative on trustworthy AI (Confiance IA Québec). He serves on the editorial boards of several international software engineering journals (including IEEE Software, EMSE, and JSEP) and is a Senior Member of the Institute of Electrical and Electronics Engineers (IEEE).

Current Students

Houssem Ben Braiek

Postdoctorate - Polytechnique

PhD - Polytechnique

GitHub

Forough Majidi

PhD - Polytechnique

Website

Khouloud Oueslati

Research Master's - Polytechnique

Arian Qazvini

Research Master's - Polytechnique

Website

GitHub

Elnathan Tiokou Tiokou Fangang

Research Master's - Polytechnique

Ben Braiek Yasmine

Research Master's - Polytechnique

Publications

ReCatcher: Towards LLMs Regression Testing for Code Generation

Altaf Allah Abbassi

Leuson Da Silva

Amin Nikanjam

2025-07-01

arXiv (published)

Protecting Privacy in Software Logs: What Should Be Anonymized?

Roozbeh Aghili

Heng Li

2025-06-19

Proceedings of the ACM on Software Engineering (published)

Adversarial Attack Classification and Robustness Testing for Large Language Models for Code

Yang Liu

Armstrong Foundjem

Heng Li

Large Language Models (LLMs) have become vital tools in software development tasks such as code generation, completion, and analysis. As their integration into workflows deepens, ensuring robustness against vulnerabilities, especially those triggered by diverse or adversarial inputs, becomes increasingly important. Such vulnerabilities may lead to incorrect or insecure code generation when models encounter perturbed task descriptions, code, or comments. Prior research often overlooks the role of natural language in guiding code tasks. This study investigates how adversarial perturbations in natural language inputs, including prompts, comments, and descriptions, affect LLMs for Code (LLM4Code). It examines the effects of perturbations at the character, word, and sentence levels to identify the most impactful vulnerabilities. We analyzed multiple projects (e.g., ReCode, OpenAttack) and datasets (e.g., HumanEval, MBPP), establishing a taxonomy of adversarial attacks. The first dimension classifies the input type (code, prompts, or comments), while the second dimension focuses on granularity: character, word, or sentence-level changes. We adopted a mixed-methods approach, combining quantitative performance metrics with qualitative vulnerability analysis. LLM4Code models show varying robustness across perturbation types. Sentence-level attacks were least effective, suggesting models are resilient to broader contextual changes. In contrast, word-level perturbations posed serious challenges, exposing semantic vulnerabilities. Character-level effects varied, showing model sensitivity to subtle syntactic deviations. Our study offers a structured framework for testing LLM4Code robustness and emphasizes the critical role of natural language in adversarial evaluation. Improving model resilience to semantic-level disruptions is essential for secure and reliable code-generation systems.

2025-06-09

arXiv (preprint)

Adversarial Attack Classification and Robustness Testing for Large Language Models for Code

Yang Liu

Armstrong Foundjem

Heng Li

2025-06-01

arXiv (published)

SDLog: A Deep Learning Framework for Detecting Sensitive Information in Software Logs

Roozbeh Aghili

Xingfang Wu

Heng Li

2025-05-20

arXiv (preprint)

Mock Deep Testing: Toward Separate Development of Data and Models for Deep Learning

Ruchira Manke

Mohammad Wardat

Hridesh Rajan

While deep learning (DL) has permeated, and become an integral component of many critical software systems, today software engineering research hasn't explored how to separately test data and models that are integral for DL approaches to work effectively. The main challenge in independently testing these components arises from the tight dependency between data and models. This research explores this gap, introducing our methodology of mock deep testing for unit testing of DL applications. To enable unit testing, we introduce a design paradigm that decomposes the workflow into distinct, manageable components, minimizes sequential dependencies, and modularizes key stages of the DL. For unit testing these components, we propose modeling their dependencies using mocks. This modular approach facilitates independent development and testing of the components, ensuring comprehensive quality assurance throughout the development process. We have developed KUnit, a framework for enabling mock deep testing for the Keras library. We empirically evaluated KUnit to determine the effectiveness of mocks. Our assessment of 50 DL programs obtained from Stack Overflow and GitHub shows that mocks effectively identified 10 issues in the data preparation stage and 53 issues in the model design stage. We also conducted a user study with 36 participants using KUnit to perceive the effectiveness of our approach. Participants using KUnit successfully resolved 25 issues in the data preparation stage and 38 issues in the model design stage. Our findings highlight that mock objects provide a lightweight emulation of the dependencies for unit testing, facilitating early bug detection. Lastly, to evaluate the usability of KUnit, we conducted a post-study survey. The results reveal that KUnit is helpful to DL application developers, enabling them to independently test each component effectively in different stages.

2025-05-06

2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE) (published)

Kernel-Level Event-Based Performance Anomaly Detection in Software Systems under Varying Load Conditions

Anthonia Njoku

Heng Li

2025-05-05

Companion of the 16th ACM/SPEC International Conference on Performance Engineering (published)

SDLog: A Deep Learning Framework for Detecting Sensitive Information in Software Logs

Roozbeh Aghili

Xingfang Wu

Heng Li

2025-05-01

arXiv (published)

JPerfEvo: A Tool for Tracking Method-Level Performance Changes in Java Projects

Kaveh Shahedi

Maxime Lamothe

Heng Li

Performance regressions and improvements are common phenomena in software development, occurring periodically as software evolves and matures. When developers introduce new changes to a program’s codebase, unforeseen performance variations may arise. Identifying these changes at the method level, however, can be challenging due to the complexity and scale of modern codebases. In this work, we present JPerfEvo, a tool designed to automate the evaluation of the method-level performance impact of each code commit (i.e., the performance variations between the two versions before and after a commit). Leveraging the Java Microbenchmark Harness (JMH) module for benchmarking the modified methods, JPerfEvo instruments their execution and applies robust statistical evaluations to detect performance changes. The tool can classify these changes as performance improvements, regressions, or neutral (i.e., no change), with the change magnitude. We evaluated JPerfEvo on three popular and mature open-source Java projects, demonstrating its effectiveness in identifying performance changes throughout their development histories.

2025-04-28

IEEE Working Conference on Mining Software Repositories (published)

Logging requirement for continuous auditing of responsible machine learning-based applications

Patrick Loic Foalem

Leuson Da Silva

Heng Li

Ettore Merlo

2025-04-14

Empirical Software Engineering (published)

Leveraging Machine Learning Techniques in Intrusion Detection Systems for Internet of Things

Saeid Jamshidi

Amin Nikanjam

Nafi Kawser Wazed

2025-04-09

arXiv (preprint)

Prism: Dynamic and Flexible Benchmarking of LLMs Code Generation with Monte Carlo Tree Search

Vahid Majdinasab

Amin Nikanjam

2025-04-07

arXiv (preprint)