Portrait of Foutse Khomh

Foutse Khomh

Associate Academic Member
Canada CIFAR AI Chair
Professor, Polytechnique Montréal, Department of Computer Engineering and Software Engineering
Research Topics
Data Mining
Deep Learning
Distributed Systems
Generative Models
Learning to Program
Natural Language Processing
Reinforcement Learning

Biography

Foutse Khomh is a full professor of software engineering at Polytechnique Montréal, a Canada CIFAR AI Chair – Trustworthy Machine Learning Software Systems, and an FRQ-IVADO Research Chair in Software Quality Assurance for Machine Learning Applications. Khomh completed a PhD in software engineering at Université de Montréal in 2011, for which he received an Award of Excellence. He was also awarded a CS-Can/Info-Can Outstanding Young Computer Science Researcher Prize in 2019.

His research interests include software maintenance and evolution, machine learning systems engineering, cloud engineering, and dependable and trustworthy ML/AI. His work has received four Ten-year Most Influential Paper (MIP) awards, and six Best/Distinguished Paper Awards. He has served on the steering committee of numerous organizations in software engineering, including SANER (chair), MSR, PROMISE, ICPC (chair), and ICSME (vice-chair). He initiated and co-organized Polytechnique Montréal‘s Software Engineering for Machine Learning Applications (SEMLA) symposium and the RELENG (release engineering) workshop series.

Khomh co-founded the NSERC CREATE SE4AI: A Training Program on the Development, Deployment and Servicing of Artificial Intelligence-based Software Systems, and is a principal investigator for the DEpendable Explainable Learning (DEEL) project.

He also co-founded Confiance IA, a Quebec consortium focused on building trustworthy AI, and is on the editorial board of multiple international software engineering journals, including IEEE Software, EMSE and JSEP. He is a senior member of IEEE.

Current Students

Collaborating Alumni - Polytechnique Montréal
PhD - Polytechnique Montréal
PhD - Polytechnique Montréal
Master's Research - Polytechnique Montréal
Postdoctorate - Polytechnique Montréal
Co-supervisor :
Master's Research - Polytechnique Montréal
Master's Research - Polytechnique Montréal
Master's Research - Polytechnique Montréal

Publications

From Technical Excellence to Practical Adoption: Lessons Learned Building an ML-Enhanced Trace Analysis Tool
Kaveh Shahedi
Matthew Khouzam
Heng Li
Maxime Lamothe
ReCatcher: Towards LLMs Regression Testing for Code Generation
Altaf Allah Abbassi
Leuson Da Silva
Amin Nikanjam
Large Language Models (LLMs) for code generation evolve rapidly through fine-tuning, merging, or new model releases. However, such updates c… (see more)an introduce regressions, not only in correctness but also in code quality and performance. To address this, we present ReCatcher, a regression testing framework for Python code generation. ReCatcher systematically compares two LLMs, typically a current model and a candidate update, across three dimensions: logical correctness, static code quality, and execution performance. We apply ReCatcher to assess regressions across three update scenarios, fine-tuning, merging, and model release, using CodeLlama, DeepSeek-Coder, and GPT-4o. Our evaluation shows that fine-tuning with cross-language datasets increases syntax errors by up to 12%. Merging with general-purpose models like Llama2 leads to regressions in correctness by up to 18%. GPT-4o introduces regressions of up to 50% in handling missing imports compared to GPT-3.5-turbo, while GPT-4o-mini suffers up to 80% performance degradation in execution time versus GPT-4o. Overall, logical correctness, performance, and error handling (e.g., syntax errors and missing imports) are the most regression-prone areas. Comparing ReCatcher with baseline solutions, it presents better and consistent accuracy across logical and performance aspects. ReCatcher highlights the importance of systematic regression evaluation before adopting new models, while assisting researchers and practitioners in making more informed update decisions.
A Dynamic Security Pattern Selection Framework Using Deep Reinforcement Learning
Saeid Jamshidi
Amin Nikanjam
Kawser Wazed Nafi
The rapid expansion of the Internet of Things (IoT) has brought transformative benefits across various domains and introduced significant se… (see more)curity challenges, especially in resource-constrained edge gateways. This paper proposes an innovative Intrusion Detection System (IDS) powered by Deep Reinforcement Learning (DRL) to dynamically detect and mitigate network threats by selecting IoT security patterns. Leveraging adaptive IoT security patterns, the system addresses diverse attack scenarios (e.g., Distributed Denial of Service (DDoS), DoS GoldenEye, DoS Hulk, and Port Scanning) with significant efficiency. The system achieves an average detection accuracy of 97% and demonstrates reduced response times and efficient resource utilization, making it well-suited for edge gateways. The experimental evaluations validate the proposed model's ability to enhance security while optimizing CPU and memory usage, reducing energy consumption, and lowering carbon emissions. Furthermore, its adaptability to evolving cyber threats and alignment with green computing principles highlight its potential to support secure and sustainable IoT networks.
Health data issues in Africa: time for digitization, standardization and harmonization
Abdoelnaser Degoot
Ismaël Koné
Shakuntala Baichoo
Mercy Ngungu
Nzisa Liku
Judit Kumuthini
Joyce Nakatumba-Nabende
Bubacarr Bah
LLMs and Stack Overflow discussions: Reliability, impact, and challenges
Leuson Da Silva
Jordan Samhi
LLMs and Stack Overflow discussions: Reliability, impact, and challenges
Leuson Da Silva
Jordan Samhi
ReCatcher: Towards LLMs Regression Testing for Code Generation
Altaf Allah Abbassi
Leuson Da Silva
Amin Nikanjam
Protecting Privacy in Software Logs: What Should Be Anonymized?
Roozbeh Aghili
Heng Li
Adversarial Attack Classification and Robustness Testing for Large Language Models for Code
Yang Liu
Armstrong Foundjem
Heng Li
Large Language Models (LLMs) have become vital tools in software development tasks such as code generation, completion, and analysis. As the… (see more)ir integration into workflows deepens, ensuring robustness against vulnerabilities especially those triggered by diverse or adversarial inputs becomes increasingly important. Such vulnerabilities may lead to incorrect or insecure code generation when models encounter perturbed task descriptions, code, or comments. Prior research often overlooks the role of natural language in guiding code tasks. This study investigates how adversarial perturbations in natural language inputs including prompts, comments, and descriptions affect LLMs for Code (LLM4Code). It examines the effects of perturbations at the character, word, and sentence levels to identify the most impactful vulnerabilities. We analyzed multiple projects (e.g., ReCode, OpenAttack) and datasets (e.g., HumanEval, MBPP), establishing a taxonomy of adversarial attacks. The first dimension classifies the input type code, prompts, or comments while the second dimension focuses on granularity: character, word, or sentence-level changes. We adopted a mixed-methods approach, combining quantitative performance metrics with qualitative vulnerability analysis. LLM4Code models show varying robustness across perturbation types. Sentence-level attacks were least effective, suggesting models are resilient to broader contextual changes. In contrast, word-level perturbations posed serious challenges, exposing semantic vulnerabilities. Character-level effects varied, showing model sensitivity to subtle syntactic deviations.Our study offers a structured framework for testing LLM4Code robustness and emphasizes the critical role of natural language in adversarial evaluation. Improving model resilience to semantic-level disruptions is essential for secure and reliable code-generation systems.
Adversarial Attack Classification and Robustness Testing for Large Language Models for Code
Yang Liu
Armstrong Foundjem
Heng Li
Large Language Models (LLMs) have become vital tools in software development tasks such as code generation, completion, and analysis. As the… (see more)ir integration into workflows deepens, ensuring robustness against vulnerabilities especially those triggered by diverse or adversarial inputs becomes increasingly important. Such vulnerabilities may lead to incorrect or insecure code generation when models encounter perturbed task descriptions, code, or comments. Prior research often overlooks the role of natural language in guiding code tasks. This study investigates how adversarial perturbations in natural language inputs including prompts, comments, and descriptions affect LLMs for Code (LLM4Code). It examines the effects of perturbations at the character, word, and sentence levels to identify the most impactful vulnerabilities. We analyzed multiple projects (e.g., ReCode, OpenAttack) and datasets (e.g., HumanEval, MBPP), establishing a taxonomy of adversarial attacks. The first dimension classifies the input type code, prompts, or comments while the second dimension focuses on granularity: character, word, or sentence-level changes. We adopted a mixed-methods approach, combining quantitative performance metrics with qualitative vulnerability analysis. LLM4Code models show varying robustness across perturbation types. Sentence-level attacks were least effective, suggesting models are resilient to broader contextual changes. In contrast, word-level perturbations posed serious challenges, exposing semantic vulnerabilities. Character-level effects varied, showing model sensitivity to subtle syntactic deviations.Our study offers a structured framework for testing LLM4Code robustness and emphasizes the critical role of natural language in adversarial evaluation. Improving model resilience to semantic-level disruptions is essential for secure and reliable code-generation systems.
SDLog: A Deep Learning Framework for Detecting Sensitive Information in Software Logs
Roozbeh Aghili
Xingfang Wu
Heng Li
Mock Deep Testing: Toward Separate Development of Data and Models for Deep Learning
Ruchira Manke
Mohammad Wardat
Hridesh Rajan
While deep learning (DL) has permeated, and become an integral component of many critical software systems, today software engineering resea… (see more)rch hasn't explored how to separately test data and models that are integral for DL approaches to work effectively. The main challenge in independently testing these components arises from the tight dependency between data and models. This research explores this gap, introducing our methodology of mock deep testing for unit testing of DL applications. To enable unit testing, we introduce a design paradigm that decomposes the workflow into distinct, manageable components, minimizes sequential dependencies, and modularizes key stages of the DL. For unit testing these components, we propose modeling their dependencies using mocks. This modular approach facilitates independent development and testing of the components, ensuring comprehensive quality assurance throughout the development process. We have developed KUnit, a framework for enabling mock deep testing for the Keras library. We empirically evaluated KUnit to determine the effectiveness of mocks. Our assessment of 50 DL programs obtained from Stack Overflow and GitHub shows that mocks effectively identified 10 issues in the data preparation stage and 53 issues in the model design stage. We also conducted a user study with 36 participants using KUnit to perceive the effectiveness of our approach. Participants using KUnit successfully resolved 25 issues in the data preparation stage and 38 issues in the model design stage. Our findings highlight that mock objects provide a lightweight emulation of the dependencies for unit testing, facilitating early bug detection. Lastly, to evaluate the usability of KUnit, we conducted a post-study survey. The results reveal that KUnit is helpful to DL application developers, enabling them to independently test each component effectively in different stages.