
Foutse Khomh

Associate Academic Member
Canada CIFAR AI Chair
Professor, Polytechnique Montréal, Department of Computer Engineering and Software Engineering
Research Topics
Data Mining
Deep Learning
Distributed Systems
Generative Models
Learning to Program
Natural Language Processing
Reinforcement Learning

Biography

Foutse Khomh is a full professor of software engineering at Polytechnique Montréal, a Canada CIFAR AI Chair – Trustworthy Machine Learning Software Systems, and an FRQ-IVADO Research Chair in Software Quality Assurance for Machine Learning Applications. Khomh completed a PhD in software engineering at Université de Montréal in 2011, for which he received an Award of Excellence. He was also awarded a CS-Can/Info-Can Outstanding Young Computer Science Researcher Prize in 2019.

His research interests include software maintenance and evolution, machine learning systems engineering, cloud engineering, and dependable and trustworthy ML/AI. His work has received four Ten-Year Most Influential Paper (MIP) awards and six Best/Distinguished Paper awards. He has served on the steering committees of numerous software engineering conferences, including SANER (chair), MSR, PROMISE, ICPC (chair), and ICSME (vice-chair). He initiated and co-organized Polytechnique Montréal's Software Engineering for Machine Learning Applications (SEMLA) symposium and the RELENG (release engineering) workshop series.

Khomh co-founded the NSERC CREATE SE4AI: A Training Program on the Development, Deployment and Servicing of Artificial Intelligence-based Software Systems, and is a principal investigator for the DEpendable Explainable Learning (DEEL) project.

He also co-founded Confiance IA, a Quebec consortium focused on building trustworthy AI, and serves on the editorial boards of several international software engineering journals, including IEEE Software, EMSE, and JSEP. He is a Senior Member of the IEEE.


Publications

The role of Large Language Models in IoT security: A systematic review of advances, challenges, and opportunities
Saeid Jamshidi
Negar Shahabi
Amin Nikanjam
Kawser Wazed Nafi
Carol Fung
DeepCodeProbe: Evaluating Code Representation Quality in Models Trained on Code
Vahid Majdinasab
Amin Nikanjam
BloomAPR: A Bloom's Taxonomy-based Framework for Assessing the Capabilities of LLM-Powered APR Solutions
Yinghang Ma
Jiho Shin
Leuson Da Silva
Zhen Ming (Jack) Jiang
Song Wang
Shin Hwei Tan
Recent advances in large language models (LLMs) have accelerated the development of AI-driven automated program repair (APR) solutions. However, these solutions are typically evaluated using static benchmarks such as Defects4J and SWE-bench, which suffer from two key limitations: (1) the risk of data contamination, potentially inflating evaluation results due to overlap with LLM training data, and (2) limited ability to assess the APR capabilities in dynamic and diverse contexts. In this paper, we introduced BloomAPR, a novel dynamic evaluation framework grounded in Bloom's Taxonomy. Our framework offers a structured approach to assess the cognitive capabilities of LLM-powered APR solutions across progressively complex reasoning levels. Using Defects4J as a case study, we evaluated two state-of-the-art LLM-powered APR solutions, ChatRepair and CigaR, under three different LLMs: GPT-3.5-Turbo, Llama-3.1, and StarCoder-2. Our findings show that while these solutions exhibit basic reasoning skills and effectively memorize bug-fixing patterns (fixing up to 81.57% of bugs at the Remember layer), their performance increases with synthetically generated bugs (up to 60.66% increase at the Understand layer). However, they perform worse on minor syntactic changes (fixing up to 43.32% at the Apply layer), and they struggle to repair similar bugs when injected into real-world projects (solving only 13.46% to 41.34% bugs at the Analyze layer). These results underscore the urgent need for evolving benchmarks and provide a foundation for more trustworthy evaluation of LLM-powered software engineering solutions.
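As a rough illustration of the layered evaluation idea described in this abstract, the short Python sketch below loops over progressively harder variants of each benchmark bug and records the fix rate per layer. The Bug dataclass, the layer transforms, and the repair callable are hypothetical names invented for this sketch; they are not BloomAPR's actual interface.

# Hypothetical sketch of a Bloom's-Taxonomy-style APR evaluation loop.
# The Bug type, layer transforms, and repair callable are illustrative
# assumptions, not part of the BloomAPR release.
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Bug:
    identifier: str
    buggy_code: str
    test_suite: Callable[[str], bool]  # True if the patched code passes the tests

# Each layer transforms a benchmark bug into a progressively harder variant
# (verbatim, synthetically mutated, syntactically perturbed, or re-injected
# into a new project context).
LayerTransform = Callable[[Bug], Bug]

def evaluate_layers(bugs: List[Bug],
                    layers: Dict[str, LayerTransform],
                    repair: Callable[[str], str]) -> Dict[str, float]:
    """Return the fraction of bugs fixed at each cognitive layer."""
    results: Dict[str, float] = {}
    for layer_name, transform in layers.items():
        fixed = 0
        for bug in bugs:
            variant = transform(bug)
            patch = repair(variant.buggy_code)   # invoke the APR solution
            if variant.test_suite(patch):        # patch passes the variant's tests
                fixed += 1
        results[layer_name] = fixed / len(bugs) if bugs else 0.0
    return results

In this framing, a Remember-style layer would pass bugs through unchanged, while higher layers would apply mutation or re-injection transforms before invoking the repair tool.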
Correction to: Assessing the adoption of security policies by developers in terraform across different cloud providers
Alexandre Verdet
Mohammad Hamdaqa
Leuson Da Silva
FairFLRep: Fairness aware fault localization and repair of Deep Neural Networks
Moses Openja
Paolo Arcaini
Fuyuki Ishikawa
Deep neural networks (DNNs) are being utilized in various aspects of our daily lives, including high-stakes decision-making applications that impact individuals. However, these systems reflect and amplify bias from the data used during training and testing, potentially resulting in biased behavior and inaccurate decisions; for instance, a model may exhibit different misclassification rates between white and black sub-populations. Effectively and efficiently identifying and correcting such biased behavior in DNNs remains a challenge. This paper introduces FairFLRep, an automated fairness-aware fault localization and repair technique that identifies and corrects potentially bias-inducing neurons in DNN classifiers. FairFLRep focuses on adjusting neuron weights associated with sensitive attributes, such as race or gender, that contribute to unfair decisions. By analyzing the input-output relationships within the network, FairFLRep corrects neurons responsible for disparities in predictive quality parity. We evaluate FairFLRep on four image classification datasets using two DNN classifiers, and four tabular datasets with a DNN model. The results show that FairFLRep consistently outperforms existing methods in improving fairness while preserving accuracy. An ablation study confirms the importance of considering fairness during both fault localization and repair stages. Our findings also show that FairFLRep is more efficient than the baseline approaches in repairing the network.
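To make the neuron-level idea concrete, here is a minimal, hypothetical PyTorch sketch of fairness-aware fault localization: it scores each output neuron of a linear layer by how much the gap in group-wise misclassification rates shrinks when that neuron's incoming weights are zeroed out. The scoring heuristic and all function names are assumptions made for illustration, not FairFLRep's published algorithm.

# Hypothetical fairness-aware fault localization sketch (not FairFLRep's exact method).
import torch
import torch.nn as nn

def group_error_rates(model, x, y, group):
    """Misclassification rate for each protected group (group is a 0/1 tensor)."""
    with torch.no_grad():
        preds = model(x).argmax(dim=1)
    errors = (preds != y).float()
    return errors[group == 0].mean().item(), errors[group == 1].mean().item()

def neuron_disparity_scores(model, layer: nn.Linear, x, y, group):
    """Score each neuron in `layer` by how much zeroing its incoming weights
    narrows the gap in misclassification rates between the two groups."""
    base_a, base_b = group_error_rates(model, x, y, group)
    base_gap = abs(base_a - base_b)
    scores = []
    for i in range(layer.out_features):
        saved = layer.weight.data[i].clone()
        layer.weight.data[i].zero_()          # temporarily silence neuron i
        a, b = group_error_rates(model, x, y, group)
        scores.append(base_gap - abs(a - b))  # positive score -> neuron contributes to the disparity
        layer.weight.data[i] = saved          # restore the original weights
    return scores

A repair step could then, for example, dampen or retrain the weights of the highest-scoring neurons while monitoring accuracy, mirroring the localize-then-repair structure described in the abstract.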
An Empirical Study on Method-Level Performance Evolution in Open-Source Java Projects
Kaveh Shahedi
Nana Gyambrah
Heng Li
Maxime Lamothe
Performance is a critical quality attribute in software development, yet the impact of method-level code changes on performance evolution remains poorly understood. While developers often make intuitive assumptions about which types of modifications are likely to cause performance regressions or improvements, these beliefs lack empirical validation at a fine-grained level. We conducted a large-scale empirical study analyzing performance evolution in 15 mature open-source Java projects hosted on GitHub. Our analysis encompassed 739 commits containing 1,499 method-level code changes, using Java Microbenchmark Harness (JMH) for precise performance measurement and rigorous statistical analysis to quantify both the significance and magnitude of performance variations. We employed bytecode instrumentation to capture method-specific execution metrics and systematically analyzed four key aspects: temporal performance patterns, code change type correlations, developer and complexity factors, and domain-size interactions. Our findings reveal that 32.7% of method-level changes result in measurable performance impacts, with regressions occurring 1.3 times more frequently than improvements. Contrary to conventional wisdom, we found no significant differences in performance impact distributions across code change categories, challenging risk-stratified development strategies. Algorithmic changes demonstrate the highest improvement potential but carry substantial regression risk. Senior developers produce more stable changes with fewer extreme variations, while code complexity correlates with increased regression likelihood. Domain-size interactions reveal significant patterns, with web server + small projects exhibiting the highest performance instability. Our study provides empirical evidence for integrating automated performance testing into continuous integration pipelines.
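The measurements in the study are collected with JMH on the Java side; purely to illustrate the statistical step (significance plus effect size) in a compact form, a small Python sketch follows. The Mann-Whitney U test, the Cliff's delta implementation, and the 0.05 threshold are assumptions chosen for this sketch and are not necessarily the exact procedure used in the paper.

# Hypothetical sketch: classify a method-level change from before/after latency samples.
from scipy.stats import mannwhitneyu

def cliffs_delta(before, after):
    """Effect size: how often a 'before' sample exceeds an 'after' sample, minus the reverse."""
    gt = sum(b > a for b in before for a in after)
    lt = sum(b < a for b in before for a in after)
    return (gt - lt) / (len(before) * len(after))

def classify_change(before_ns, after_ns, alpha=0.05):
    """Label a change as a regression, an improvement, or no significant change."""
    _, p_value = mannwhitneyu(before_ns, after_ns, alternative="two-sided")
    if p_value >= alpha:
        return "no significant change"
    delta = cliffs_delta(before_ns, after_ns)
    # Lower latencies after the change mean the 'before' samples dominate, i.e. an improvement.
    return "improvement" if delta > 0 else "regression"

Applied per method across commits, labels like these would support the kind of regression/improvement rates reported in the abstract.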
From Technical Excellence to Practical Adoption: Lessons Learned Building an ML-Enhanced Trace Analysis Tool
Kaveh Shahedi
Matthew Khouzam
Heng Li
Maxime Lamothe
System tracing has become essential for understanding complex software behavior in modern systems, yet sophisticated trace analysis tools face significant adoption gaps in industrial settings. Through a year-long collaboration with Ericsson Montréal, developing TMLL (Trace-Server Machine Learning Library, now in the Eclipse Foundation), we investigated barriers to trace analysis adoption. Contrary to assumptions about complexity or automation needs, practitioners struggled with translating expert knowledge into actionable insights, integrating analysis into their workflows, and trusting automated results they could not validate. We identified what we called the Excellence Paradox: technical excellence can actively impede adoption when conflicting with usability, transparency, and practitioner trust. TMLL addresses this through adoption-focused design that embeds expert knowledge in interfaces, provides transparent explanations, and enables incremental adoption. Validation through Ericsson's experts' feedback, Eclipse Foundation's integration, and a survey of 40 industry and academic professionals revealed consistent patterns: survey results showed that 77.5% prioritize quality and trust in results over technical sophistication, while 67.5% prefer semi-automated analysis with user control, findings supported by qualitative feedback from industrial collaboration and external peer review. Results validate three core principles: cognitive compatibility, embedded expertise, and transparency-based trust. This challenges conventional capability-focused tool development, demonstrating that sustainable adoption requires reorientation toward adoption-focused design with actionable implications for automated software engineering tools.