Foutse Khomh

Associate Academic Member
Canada CIFAR AI Chair
Professor, Polytechnique Montréal, Department of Computer Engineering and Software Engineering
Research Topics
Data Mining
Deep Learning
Distributed Systems
Generative Models
Learning to Program
Natural Language Processing
Reinforcement Learning

Biography

Foutse Khomh is a full professor of software engineering at Polytechnique Montréal, a Canada CIFAR AI Chair – Trustworthy Machine Learning Software Systems, and an FRQ-IVADO Research Chair in Software Quality Assurance for Machine Learning Applications. Khomh completed a PhD in software engineering at Université de Montréal in 2011, for which he received an Award of Excellence. He was also awarded a CS-Can/Info-Can Outstanding Young Computer Science Researcher Prize in 2019.

His research interests include software maintenance and evolution, machine learning systems engineering, cloud engineering, and dependable and trustworthy ML/AI. His work has received four Ten-year Most Influential Paper (MIP) awards and six Best/Distinguished Paper Awards. He has served on the steering committees of numerous organizations in software engineering, including SANER (chair), MSR, PROMISE, ICPC (chair), and ICSME (vice-chair). He initiated and co-organized Polytechnique Montréal's Software Engineering for Machine Learning Applications (SEMLA) symposium and the RELENG (release engineering) workshop series.

Khomh co-founded the NSERC CREATE SE4AI: A Training Program on the Development, Deployment and Servicing of Artificial Intelligence-based Software Systems, and is a principal investigator for the DEpendable Explainable Learning (DEEL) project.

He also co-founded Confiance IA, a Quebec consortium focused on building trustworthy AI, and is on the editorial board of multiple international software engineering journals, including IEEE Software, EMSE and JSEP. He is a senior member of IEEE.

Current Students

Postdoctorate - Polytechnique Montréal
PhD - Polytechnique Montréal
PhD - Polytechnique Montréal
Master's Research - Polytechnique Montréal
Master's Research - Polytechnique Montréal
Master's Research - Polytechnique Montréal
Master's Research - Polytechnique Montréal

Publications

An Empirical Study on Method-Level Performance Evolution in Open-Source Java Projects
Kaveh Shahedi
Nana Gyambrah
Heng Li
Maxime Lamothe
Performance is a critical quality attribute in software development, yet the impact of method-level code changes on performance evolution remains poorly understood. While developers often make intuitive assumptions about which types of modifications are likely to cause performance regressions or improvements, these beliefs lack empirical validation at a fine-grained level. We conducted a large-scale empirical study analyzing performance evolution in 15 mature open-source Java projects hosted on GitHub. Our analysis encompassed 739 commits containing 1,499 method-level code changes, using Java Microbenchmark Harness (JMH) for precise performance measurement and rigorous statistical analysis to quantify both the significance and magnitude of performance variations. We employed bytecode instrumentation to capture method-specific execution metrics and systematically analyzed four key aspects: temporal performance patterns, code change type correlations, developer and complexity factors, and domain-size interactions. Our findings reveal that 32.7% of method-level changes result in measurable performance impacts, with regressions occurring 1.3 times more frequently than improvements. Contrary to conventional wisdom, we found no significant differences in performance impact distributions across code change categories, challenging risk-stratified development strategies. Algorithmic changes demonstrate the highest improvement potential but carry substantial regression risk. Senior developers produce more stable changes with fewer extreme variations, while code complexity correlates with increased regression likelihood. Domain-size interactions reveal significant patterns, with web server + small projects exhibiting the highest performance instability. Our study provides empirical evidence for integrating automated performance testing into continuous integration pipelines.
Tracing Optimization for Performance Modeling and Regression Detection
Kaveh Shahedi
Heng Li
Maxime Lamothe
Software performance modeling plays a crucial role in developing and maintaining software systems. A performance model analytically describes the relationship between the performance of a system and its runtime activities. This process typically examines various aspects of a system's runtime behavior, such as the execution frequency of functions or methods, to forecast performance metrics like program execution time. By using performance models, developers can predict expected performance and thereby effectively identify and address unexpected performance regressions when actual performance deviates from the model's predictions. One common and precise method for capturing performance behavior is software tracing, which involves instrumenting the execution of a program, either at the kernel level (e.g., system calls) or application level (e.g., function calls). However, due to the nature of tracing, it can be highly resource-intensive, making it impractical for production environments where resources are limited. In this work, we propose statistical approaches to reduce tracing overhead by identifying and excluding performance-insensitive code regions, particularly application-level functions, from tracing while still building accurate performance models that can capture performance degradations. By selecting an optimal set of functions to be traced, we can construct optimized performance models that achieve an R² score of up to 99% and, sometimes, outperform full tracing models (models using non-optimized tracing data), while significantly reducing the tracing overhead by more than 80% in most cases. Our optimized performance models can also capture performance regressions in our studied programs effectively, demonstrating their usefulness in real-world scenarios. Our approach is fully automated, making it ready to be used in production environments with minimal human effort.
A Dynamic Security Pattern Selection Framework Using Deep Reinforcement Learning
Saeid Jamshidi
Amin Nikanjam
Kawser Wazed Nafi
The rapid expansion of the Internet of Things (IoT) has brought transformative benefits across various domains and introduced significant security challenges, especially in resource-constrained edge gateways. This paper proposes an innovative Intrusion Detection System (IDS) powered by Deep Reinforcement Learning (DRL) to dynamically detect and mitigate network threats by selecting IoT security patterns. Leveraging adaptive IoT security patterns, the system addresses diverse attack scenarios (e.g., Distributed Denial of Service (DDoS), DoS GoldenEye, DoS Hulk, and Port Scanning) with significant efficiency. The system achieves an average detection accuracy of 97% and demonstrates reduced response times and efficient resource utilization, making it well-suited for edge gateways. The experimental evaluations validate the proposed model's ability to enhance security while optimizing CPU and memory usage, reducing energy consumption, and lowering carbon emissions. Furthermore, its adaptability to evolving cyber threats and alignment with green computing principles highlight its potential to support secure and sustainable IoT networks.
Health data issues in Africa: time for digitization, standardization and harmonization
Abdoelnaser Degoot
Ismaël Koné
Shakuntala Baichoo
Mercy Ngungu
Nzisa Liku
Judit Kumuthini
Joyce Nakatumba‐Nabende
Bubacarr Bah
This commentary discusses health data challenges in Africa, focusing on digitization, standardization, and harmonization as key solutions. It highlights how addressing these foundational issues can enable AI and data science to transform healthcare systems across the continent.
ReCatcher: Towards LLMs Regression Testing for Code Generation
Altaf Allah Abbassi
Leuson Da Silva
Amin Nikanjam
An Empirical Study of Sensitive Information in Logs
Roozbeh Aghili
Heng Li
What Information Contributes to Log-based Anomaly Detection? Insights from a Configurable Transformer-Based Approach
Xingfang Wu
Heng Li
Log data are generated from logging statements in the source code, providing insights into the execution processes of software applications and systems. State-of-the-art log-based anomaly detection approaches typically leverage deep learning models to capture the semantic or sequential information in the log data and detect anomalous runtime behaviors. However, the impacts of these different types of information are not clear. In addition, existing approaches have not captured the timestamps in the log data, which can potentially provide more fine-grained temporal information than sequential information. In this work, we propose a configurable transformer-based anomaly detection model that can capture the semantic, sequential, and temporal information in the log data and allows us to configure the different types of information as the model's features. Additionally, we train and evaluate the proposed model using log sequences of different lengths, thus overcoming the constraint of existing methods that rely on fixed-length or time-windowed log sequences as inputs. With the proposed model, we conduct a series of experiments with different combinations of input features to evaluate the roles of different types of information in anomaly detection. When presented with log sequences of varying lengths, the model can attain competitive and consistently stable performance compared to the baselines. The results indicate that the event occurrence information plays a key role in identifying anomalies, while the impact of the sequential and temporal information is not significant for anomaly detection in the studied public datasets. On the other hand, the findings also reveal the simplicity of the studied public datasets and highlight the importance of constructing new datasets that contain different types of anomalies to better evaluate the performance of anomaly detection models.
Mock Deep Testing: Toward Separate Development of Data and Models for Deep Learning
Ruchira Manke
Mohammad Wardat
Hridesh Rajan
While deep learning (DL) has permeated and become an integral component of many critical software systems, software engineering research today has not explored how to separately test data and models that are integral for DL approaches to work effectively. The main challenge in independently testing these components arises from the tight dependency between data and models. This research explores this gap, introducing our methodology of mock deep testing for unit testing of DL applications. To enable unit testing, we introduce a design paradigm that decomposes the workflow into distinct, manageable components, minimizes sequential dependencies, and modularizes key stages of the DL workflow. For unit testing these components, we propose modeling their dependencies using mocks. This modular approach facilitates independent development and testing of the components, ensuring comprehensive quality assurance throughout the development process. We have developed KUnit, a framework for enabling mock deep testing for the Keras library. We empirically evaluated KUnit to determine the effectiveness of mocks. Our assessment of 50 DL programs obtained from Stack Overflow and GitHub shows that mocks effectively identified 10 issues in the data preparation stage and 53 issues in the model design stage. We also conducted a user study with 36 participants using KUnit to gauge the perceived effectiveness of our approach. Participants using KUnit successfully resolved 25 issues in the data preparation stage and 38 issues in the model design stage. Our findings highlight that mock objects provide a lightweight emulation of the dependencies for unit testing, facilitating early bug detection. Lastly, to evaluate the usability of KUnit, we conducted a post-study survey. The results reveal that KUnit is helpful to DL application developers, enabling them to independently test each component effectively in different stages.
Kernel-Level Event-Based Performance Anomaly Detection in Software Systems under Varying Load Conditions
Anthonia Njoku
Heng Li
SDLog: A Deep Learning Framework for Detecting Sensitive Information in Software Logs
Roozbeh Aghili
Xingfang Wu
Heng Li
JPerfEvo: A Tool for Tracking Method-Level Performance Changes in Java Projects
Kaveh Shahedi
Maxime Lamothe
Heng Li
Performance regressions and improvements are common phenomena in software development, occurring periodically as software evolves and matures. When developers introduce new changes to a program’s codebase, unforeseen performance variations may arise. Identifying these changes at the method level, however, can be challenging due to the complexity and scale of modern codebases. In this work, we present JPerfEvo, a tool designed to automate the evaluation of the method-level performance impact of each code commit (i.e., the performance variations between the two versions before and after a commit). Leveraging the Java Microbenchmark Harness (JMH) module for benchmarking the modified methods, JPerfEvo instruments their execution and applies robust statistical evaluations to detect performance changes. The tool can classify these changes as performance improvements, regressions, or neutral (i.e., no change), with the change magnitude. We evaluated JPerfEvo on three popular and mature open-source Java projects, demonstrating its effectiveness in identifying performance changes throughout their development histories.
Logging Requirement for Continuous Auditing of Responsible Machine Learning-based Applications
Patrick Loic Foalem
Leuson Da Silva
Heng Li
Ettore Merlo
Machine learning (ML) is increasingly applied across industries to automate decision-making, but concerns about ethical and legal compliance remain due to limited transparency, fairness, and accountability. Monitoring through logging, a long-standing practice in traditional software, offers a potential means for auditing ML applications, as logs provide traceable records of system behavior useful for debugging, performance analysis, and continuous auditing. [...] systematically auditing models for compliance or accountability. The findings underscore the need for enhanced logging practices and tooling that systematically integrate responsible AI metrics. Such practices would support the development of auditable, transparent, and ethically responsible ML systems, aligning with growing regulatory requirements and societal expectations. By highlighting specific deficiencies and opportunities, this work provides actionable guidance for both practitioners and tool developers seeking to strengthen the accountability and trustworthiness of ML applications.