Foutse Khomh

Biography

Foutse Khomh is a full professor of software engineering at Polytechnique Montréal, a Canada CIFAR AI Chair – Trustworthy Machine Learning Software Systems, and an FRQ-IVADO Research Chair in Software Quality Assurance for Machine Learning Applications. Khomh completed a PhD in software engineering at Université de Montréal in 2011, for which he received an Award of Excellence. He was also awarded a CS-Can/Info-Can Outstanding Young Computer Science Researcher Prize in 2019.

His research interests include software maintenance and evolution, machine learning systems engineering, cloud engineering, and dependable and trustworthy ML/AI. His work has received four Ten-year Most Influential Paper (MIP) awards, and six Best/Distinguished Paper Awards. He has served on the steering committee of numerous organizations in software engineering, including SANER (chair), MSR, PROMISE, ICPC (chair), and ICSME (vice-chair). He initiated and co-organized Polytechnique Montréal‘s Software Engineering for Machine Learning Applications (SEMLA) symposium and the RELENG (release engineering) workshop series.

Khomh co-founded the NSERC CREATE SE4AI: A Training Program on the Development, Deployment and Servicing of Artificial Intelligence-based Software Systems, and is a principal investigator for the DEpendable Explainable Learning (DEEL) project.

He also co-founded Confiance IA, a Quebec consortium focused on building trustworthy AI, and is on the editorial board of multiple international software engineering journals, including IEEE Software, EMSE and JSEP. He is a senior member of IEEE.

Current Students

Nanda Assobjio Brice Yvan

Collaborating Alumni - Polytechnique Montréal

Gabriel Laberge

PhD - Polytechnique Montréal

Github

forough majidi

PhD - Polytechnique Montréal

Website

mohammadhossein.malekpour@gmail.com

Mo Malekpour

Master's Research - Polytechnique Montréal

Website

Github

Elnathan Tiokou Tiokou Fangang

Mohamed Amine Merzouk

Postdoctorate - Polytechnique Montréal

Co-supervisor :

Master's Research - Polytechnique Montréal

Master's Research - Polytechnique Montréal

Github

Ben Braiek Yasmine

Master's Research - Polytechnique Montréal

Publications

Health data issues in Africa: time for digitization, standardization and harmonization

Abdoelnaser Degoot

Ismaël Koné

Shakuntala Baichoo

Mercy Ngungu

Nzisa Liku

Judit Kumuthini

Joyce Nakatumba-Nabende

Bubacarr Bah

2025-07-01

Nature Communications (published)

LLMs and Stack Overflow discussions: Reliability, impact, and challenges

Leuson Da Silva

Jordan Samhi

2025-07-01

Journal of Systems and Software (published)

Protecting Privacy in Software Logs: What Should Be Anonymized?

Roozbeh Aghili

Heng Li

2025-06-19

Proceedings of the ACM on Software Engineering (published)

Adversarial Attack Classification and Robustness Testing for Large Language Models for Code

Yang Liu

Armstrong Foundjem

Heng Li

Large Language Models (LLMs) have become vital tools in software development tasks such as code generation, completion, and analysis. As the… (see more)ir integration into workflows deepens, ensuring robustness against vulnerabilities especially those triggered by diverse or adversarial inputs becomes increasingly important. Such vulnerabilities may lead to incorrect or insecure code generation when models encounter perturbed task descriptions, code, or comments. Prior research often overlooks the role of natural language in guiding code tasks. This study investigates how adversarial perturbations in natural language inputs including prompts, comments, and descriptions affect LLMs for Code (LLM4Code). It examines the effects of perturbations at the character, word, and sentence levels to identify the most impactful vulnerabilities. We analyzed multiple projects (e.g., ReCode, OpenAttack) and datasets (e.g., HumanEval, MBPP), establishing a taxonomy of adversarial attacks. The first dimension classifies the input type code, prompts, or comments while the second dimension focuses on granularity: character, word, or sentence-level changes. We adopted a mixed-methods approach, combining quantitative performance metrics with qualitative vulnerability analysis. LLM4Code models show varying robustness across perturbation types. Sentence-level attacks were least effective, suggesting models are resilient to broader contextual changes. In contrast, word-level perturbations posed serious challenges, exposing semantic vulnerabilities. Character-level effects varied, showing model sensitivity to subtle syntactic deviations.Our study offers a structured framework for testing LLM4Code robustness and emphasizes the critical role of natural language in adversarial evaluation. Improving model resilience to semantic-level disruptions is essential for secure and reliable code-generation systems.

2025-06-09

ArXiv (preprint)

Adversarial Attack Classification and Robustness Testing for Large Language Models for Code

Yang Liu

Armstrong Foundjem

Heng Li

2025-06-01

arXiv (published)

SDLog: A Deep Learning Framework for Detecting Sensitive Information in Software Logs

Roozbeh Aghili

Xingfang Wu

Heng Li

2025-05-20

ArXiv (preprint)

Mock Deep Testing: Toward Separate Development of Data and Models for Deep Learning

Ruchira Manke

Mohammad Wardat

Hridesh Rajan

While deep learning (DL) has permeated, and become an integral component of many critical software systems, today software engineering resea… (see more)rch hasn't explored how to separately test data and models that are integral for DL approaches to work effectively. The main challenge in independently testing these components arises from the tight dependency between data and models. This research explores this gap, introducing our methodology of mock deep testing for unit testing of DL applications. To enable unit testing, we introduce a design paradigm that decomposes the workflow into distinct, manageable components, minimizes sequential dependencies, and modularizes key stages of the DL. For unit testing these components, we propose modeling their dependencies using mocks. This modular approach facilitates independent development and testing of the components, ensuring comprehensive quality assurance throughout the development process. We have developed KUnit, a framework for enabling mock deep testing for the Keras library. We empirically evaluated KUnit to determine the effectiveness of mocks. Our assessment of 50 DL programs obtained from Stack Overflow and GitHub shows that mocks effectively identified 10 issues in the data preparation stage and 53 issues in the model design stage. We also conducted a user study with 36 participants using KUnit to perceive the effectiveness of our approach. Participants using KUnit successfully resolved 25 issues in the data preparation stage and 38 issues in the model design stage. Our findings highlight that mock objects provide a lightweight emulation of the dependencies for unit testing, facilitating early bug detection. Lastly, to evaluate the usability of KUnit, we conducted a post-study survey. The results reveal that KUnit is helpful to DL application developers, enabling them to independently test each component effectively in different stages.

2025-05-06

2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE) (published)

Kernel-Level Event-Based Performance Anomaly Detection in Software Systems under Varying Load Conditions

Anthonia Njoku

Heng Li

2025-05-05

Companion of the 16th ACM/SPEC International Conference on Performance Engineering (published)

SDLog: A Deep Learning Framework for Detecting Sensitive Information in Software Logs

Roozbeh Aghili

Xingfang Wu

Heng Li

2025-05-01

arXiv (published)

JPerfEvo: A Tool for Tracking Method-Level Performance Changes in Java Projects

Kaveh Shahedi

Maxime Lamothe

Heng Li

Performance regressions and improvements are common phenomena in software development, occurring periodically as software evolves and mature… (see more)s. When developers introduce new changes to a program’s codebase, unforeseen performance variations may arise. Identifying these changes at the method level, however, can be challenging due to the complexity and scale of modern codebases. In this work, we present JPerfEvo, a tool designed to automate the evaluation of the method-level performance impact of each code commit (i.e., the performance variations between the two versions before and after a commit). Leveraging the Java Microbenchmark Harness (JMH) module for benchmarking the modified methods, JPerfEvo instruments their execution and applies robust statistical evaluations to detect performance changes. The tool can classify these changes as performance improvements, regressions, or neutral (i.e., no change), with the change magnitude. We evaluated JPerfEvo on three popular and mature open-source Java projects, demonstrating its effectiveness in identifying performance changes throughout their development histories.

2025-04-28

IEEE Working Conference on Mining Software Repositories (published)

Logging requirement for continuous auditing of responsible machine learning-based applications

Patrick Loic Foalem

Leuson Da Silva

Heng Li

Ettore Merlo

2025-04-14

Empirical Software Engineering (published)

Leveraging Machine Learning Techniques in Intrusion Detection Systems for Internet of Things

Saeid Jamshidi

Amin Nikanjam

Nafi Kawser Wazed

2025-04-09

ArXiv (preprint)