
Foutse Khomh

Associate Academic Member
Canada CIFAR AI Chair
Professor, Polytechnique Montréal, Department of Computer Engineering and Software Engineering
Research Topics
Learning to Program
Reinforcement Learning
Deep Learning
Data Mining
Generative Models
Distributed Systems
Natural Language Processing

Biography

Foutse Khomh is a Full Professor of software engineering at Polytechnique Montréal, holder of a Canada CIFAR AI Chair in Trustworthy Machine Learning Software Systems, and holder of an FRQ-IVADO Research Chair in Software Quality Assurance for Machine Learning Applications.

He received a PhD in software engineering from Université de Montréal in 2011, with an Award of Excellence. He also received the CS-Can/Info-Can Outstanding Young Computer Science Researcher Prize in 2019. His research interests include software maintenance and evolution, machine learning systems engineering, cloud engineering, and dependable and trustworthy AI/ML.

His work has earned four ten-year Most Influential Paper awards and six Best/Distinguished Paper awards. He has served on the steering committees of several conferences: SANER (as chair), MSR, PROMISE, ICPC (as chair), and ICSME (as vice-chair). He initiated and co-organized the Software Engineering for Machine Learning Applications (SEMLA) symposium and the Release Engineering (RELENG) workshop series.

He is co-founder of the NSERC CREATE SE4AI project, A Training Program on the Development, Deployment, and Servicing of Artificial Intelligence-based Software Systems, and one of the principal investigators of the Dependable Explainable Learning (DEEL) project. He is also a co-founder of Quebec's initiative on trustworthy AI (Confiance IA Québec). He serves on the editorial boards of several international software engineering journals (including IEEE Software, EMSE, and JSEP) and is a Senior Member of the Institute of Electrical and Electronics Engineers (IEEE).

Current Students

PhD - Polytechnique
PhD - Polytechnique
Master's Research - Polytechnique
Postdoctorate - Polytechnique
Co-supervisor:
Master's Research - Polytechnique
Master's Research - Polytechnique
Master's Research - Polytechnique

Publications

The role of Large Language Models in IoT security: A systematic review of advances, challenges, and opportunities
Saeid Jamshidi
Negar Shahabi
Amin Nikanjam
Kawser Wazed Nafi
Carol Fung
Refactoring with LLMs: Bridging Human Expertise and Machine Understanding
Yonnel Chen Kuang Piao
Jean Carlors Paul
Leuson Da Silva
Mohammad Hamdaqa
Code refactoring is a fundamental software engineering practice aimed at improving code quality and maintainability. Despite its importance, developers often neglect refactoring due to the significant time, effort, and resources it requires, as well as the lack of immediate functional rewards. Although several automated refactoring tools have been proposed, they remain limited in supporting a broad spectrum of refactoring types. In this study, we explore whether instruction strategies inspired by human best-practice guidelines can enhance the ability of Large Language Models (LLMs) to perform diverse refactoring tasks automatically. Leveraging the instruction-following and code comprehension capabilities of state-of-the-art LLMs (e.g., GPT-mini and DeepSeek-V3), we draw on Martin Fowler's refactoring guidelines to design multiple instruction strategies that encode motivations, procedural steps, and transformation objectives for 61 well-known refactoring types. We evaluate these strategies on benchmark examples and real-world code snippets from GitHub projects. Our results show that instruction designs grounded in Fowler's guidelines enable LLMs to successfully perform all benchmark refactoring types and preserve program semantics in real-world settings, an essential criterion for effective refactoring. Moreover, while descriptive instructions are more interpretable to humans, our results show that rule-based instructions often lead to better performance in specific scenarios. Interestingly, allowing models to focus on the overall goal of refactoring, rather than prescribing a fixed transformation type, can yield even greater improvements in code quality.
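A minimal sketch of what such guideline-derived instruction strategies might look like in practice is given below. The RefactoringGuideline structure, the two prompt templates, and the example entry are illustrative assumptions, not the prompts or code used in the paper.

```python
# Hypothetical sketch of descriptive vs. rule-based instruction strategies
# derived from a Fowler-style refactoring guideline. Not the paper's prompts.
from dataclasses import dataclass

@dataclass
class RefactoringGuideline:
    name: str          # e.g., "Extract Method" (a Fowler catalog entry)
    motivation: str    # why the refactoring is applied
    steps: list[str]   # procedural steps, paraphrased from the guideline

def descriptive_prompt(g: RefactoringGuideline, code: str) -> str:
    """Human-readable instruction: motivation plus narrative steps."""
    steps = "\n".join(f"- {s}" for s in g.steps)
    return (
        f"Apply the '{g.name}' refactoring to the code below.\n"
        f"Motivation: {g.motivation}\n"
        f"Suggested procedure:\n{steps}\n"
        f"Preserve the program's behavior.\n\n{code}"
    )

def rule_based_prompt(g: RefactoringGuideline, code: str) -> str:
    """Terse, rule-style instruction: numbered transformation rules only."""
    rules = "\n".join(f"{i + 1}. {s}" for i, s in enumerate(g.steps))
    return (
        f"Refactoring: {g.name}\n"
        f"Rules:\n{rules}\n"
        f"Constraint: output semantically equivalent code only.\n\n{code}"
    )

if __name__ == "__main__":
    extract_method = RefactoringGuideline(
        name="Extract Method",
        motivation="A code fragment can be grouped together and given a name.",
        steps=[
            "Identify the fragment to extract",
            "Move it into a new method",
            "Replace the fragment with a call to the new method",
        ],
    )
    snippet = "def report(o):\n    print('total:', sum(i.price for i in o.items))"
    # Prompts like these would then be sent to an LLM and the output
    # checked for semantic preservation (the sending step is omitted here).
    print(descriptive_prompt(extract_method, snippet))
```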
DeepCodeProbe: Evaluating Code Representation Quality in Models Trained on Code
Vahid Majdinasab
Amin Nikanjam
BloomAPR: A Bloom's Taxonomy-based Framework for Assessing the Capabilities of LLM-Powered APR Solutions
Yinghang Ma
Jiho Shin
Leuson Da Silva
Zhen Ming (Jack) Jiang
Song Wang
Shin Hwei Tan
Recent advances in large language models (LLMs) have accelerated the development of AI-driven automated program repair (APR) solutions. However, these solutions are typically evaluated using static benchmarks such as Defects4J and SWE-bench, which suffer from two key limitations: (1) the risk of data contamination, potentially inflating evaluation results due to overlap with LLM training data, and (2) limited ability to assess the APR capabilities in dynamic and diverse contexts. In this paper, we introduced BloomAPR, a novel dynamic evaluation framework grounded in Bloom's Taxonomy. Our framework offers a structured approach to assess the cognitive capabilities of LLM-powered APR solutions across progressively complex reasoning levels. Using Defects4J as a case study, we evaluated two state-of-the-art LLM-powered APR solutions, ChatRepair and CigaR, under three different LLMs: GPT-3.5-Turbo, Llama-3.1, and StarCoder-2. Our findings show that while these solutions exhibit basic reasoning skills and effectively memorize bug-fixing patterns (fixing up to 81.57% of bugs at the Remember layer), their performance increases with synthetically generated bugs (up to 60.66% increase at the Understand layer). However, they perform worse on minor syntactic changes (fixing up to 43.32% at the Apply layer), and they struggle to repair similar bugs when injected into real-world projects (solving only 13.46% to 41.34% bugs at the Analyze layer). These results underscore the urgent need for evolving benchmarks and provide a foundation for more trustworthy evaluation of LLM-powered software engineering solutions.
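Below is an illustrative sketch of a Bloom's-taxonomy-style evaluation loop in the spirit of the layers described above (Remember, Understand, Apply, Analyze). The interfaces, layer generators, and function names are hypothetical placeholders, not the BloomAPR implementation.

```python
# Illustrative sketch: run the same APR tool over progressively harder
# variants of a bug set and report a repair rate per taxonomy layer.
from typing import Callable, Iterable

Bug = dict                           # e.g., {"project": ..., "buggy_code": ..., "tests": ...}
RepairTool = Callable[[Bug], str]    # returns a candidate patch for a bug

def repair_rate(tool: RepairTool, bugs: Iterable[Bug],
                passes_tests: Callable[[Bug, str], bool]) -> float:
    """Fraction of bugs whose generated patch passes the bug's test suite."""
    bugs = list(bugs)
    fixed = sum(passes_tests(bug, tool(bug)) for bug in bugs)
    return fixed / len(bugs) if bugs else 0.0

def evaluate_by_layer(tool: RepairTool, base_bugs: list[Bug],
                      passes_tests: Callable[[Bug, str], bool],
                      synthesize, perturb_syntax, inject_into_projects) -> dict:
    """Evaluate one APR tool across Bloom-style layers of increasing difficulty."""
    layers = {
        "Remember":   base_bugs,                       # original benchmark bugs
        "Understand": synthesize(base_bugs),           # synthetically generated variants
        "Apply":      perturb_syntax(base_bugs),       # minor syntactic changes
        "Analyze":    inject_into_projects(base_bugs), # similar bugs injected into real projects
    }
    return {name: repair_rate(tool, bugs, passes_tests)
            for name, bugs in layers.items()}

# Usage (with user-supplied tool, oracle, and layer generators):
#   scores = evaluate_by_layer(my_tool, defects, run_tests, gen_synth, gen_perturb, gen_inject)
#   print(scores)  # e.g., {"Remember": 0.81, "Understand": ..., "Apply": ..., "Analyze": ...}
```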
Correction to: Assessing the adoption of security policies by developers in terraform across different cloud providers
Alexandre Verdet
Mohammad Hamdaqa
Leuson Da Silva
FairFLRep: Fairness aware fault localization and repair of Deep Neural Networks
Moses Openja
Paolo Arcaini
Fuyuki Ishikawa