Foutse Khomh

Biography

Foutse Khomh is a full professor of software engineering at Polytechnique Montréal, a Canada CIFAR AI Chair in trustworthy machine learning software systems, and holder of an FRQ-IVADO Research Chair in software quality assurance for machine learning applications.

He received a PhD in software engineering from Université de Montréal in 2011, with an excellence scholarship. He also received the CS-Can/Info-Can Outstanding Young Computer Science Researcher Prize in 2019. His research interests include software maintenance and evolution, machine learning systems engineering, cloud engineering, and dependable and trustworthy AI/machine learning.

His work has received four ten-year Most Influential Paper awards and six Best/Distinguished Paper awards. He has also served on the steering committees of several conferences and meetings: SANER (as chair), MSR, PROMISE, ICPC (as chair), and ICSME (as vice-chair). He initiated and co-organized the Software Engineering for Machine Learning Applications (SEMLA) symposium and the Release Engineering (RELENG) workshop series.

He is a co-founder of the NSERC CREATE SE4AI project: A Training Program on the Development, Deployment, and Servicing of Artificial Intelligence-based Software Systems, and one of the principal investigators of the Dependable Explainable Learning (DEEL) project. He is also a co-founder of Quebec's initiative on trustworthy AI (Confiance IA Québec). He serves on the editorial boards of several international software engineering journals (including IEEE Software, EMSE, and JSEP) and is a Senior Member of the Institute of Electrical and Electronics Engineers (IEEE).

Current Students

Gabriel Laberge

PhD - Polytechnique

Github

Forough Majidi

PhD - Polytechnique

Website

Mo Malekpour

Research Master's - Polytechnique

Website

Github

Elnathan Tiokou Tiokou Fangang

Mohamed Amine Merzouk

Postdoc - Polytechnique

Co-supervisor:

Research Master's - Polytechnique

Research Master's - Polytechnique

Ben Braiek Yasmine

Research Master's - Polytechnique

Publications

The role of Large Language Models in IoT security: A systematic review of advances, challenges, and opportunities

Saeid Jamshidi

Negar Shahabi

Amin Nikanjam

Kawser Wazed Nafi

Carol Fung

2025-11-01

Internet of Things (published)

Refactoring with LLMs: Bridging Human Expertise and Machine Understanding

Yonnel Chen Kuang Piao

Jean Carlors Paul

Leuson Da Silva

Arghavan Moradi Dakhel

Mohammad Hamdaqa

Code refactoring is a fundamental software engineering practice aimed at improving code quality and maintainability. Despite its importance, developers often neglect refactoring due to the significant time, effort, and resources it requires, as well as the lack of immediate functional rewards. Although several automated refactoring tools have been proposed, they remain limited in supporting a broad spectrum of refactoring types. In this study, we explore whether instruction strategies inspired by human best-practice guidelines can enhance the ability of Large Language Models (LLMs) to perform diverse refactoring tasks automatically. Leveraging the instruction-following and code comprehension capabilities of state-of-the-art LLMs (e.g., GPT-mini and DeepSeek-V3), we draw on Martin Fowler's refactoring guidelines to design multiple instruction strategies that encode motivations, procedural steps, and transformation objectives for 61 well-known refactoring types. We evaluate these strategies on benchmark examples and real-world code snippets from GitHub projects. Our results show that instruction designs grounded in Fowler's guidelines enable LLMs to successfully perform all benchmark refactoring types and preserve program semantics in real-world settings, an essential criterion for effective refactoring. Moreover, while descriptive instructions are more interpretable to humans, our results show that rule-based instructions often lead to better performance in specific scenarios. Interestingly, allowing models to focus on the overall goal of refactoring, rather than prescribing a fixed transformation type, can yield even greater improvements in code quality.

2025-10-04

arXiv (preprint)

arxiv.org

DeepCodeProbe: Evaluating Code Representation Quality in Models Trained on Code

Vahid Majdinasab

Amin Nikanjam

2025-09-30

Empirical Software Engineering (published)

BloomAPR: A Bloom's Taxonomy-based Framework for Assessing the Capabilities of LLM-Powered APR Solutions

Yinghang Ma

Jiho Shin

Leuson Da Silva

Zhen Ming (Jack) Jiang

Song Wang

Shin Hwei Tan

Recent advances in large language models (LLMs) have accelerated the development of AI-driven automated program repair (APR) solutions. However, these solutions are typically evaluated using static benchmarks such as Defects4J and SWE-bench, which suffer from two key limitations: (1) the risk of data contamination, potentially inflating evaluation results due to overlap with LLM training data, and (2) limited ability to assess the APR capabilities in dynamic and diverse contexts. In this paper, we introduced BloomAPR, a novel dynamic evaluation framework grounded in Bloom's Taxonomy. Our framework offers a structured approach to assess the cognitive capabilities of LLM-powered APR solutions across progressively complex reasoning levels. Using Defects4J as a case study, we evaluated two state-of-the-art LLM-powered APR solutions, ChatRepair and CigaR, under three different LLMs: GPT-3.5-Turbo, Llama-3.1, and StarCoder-2. Our findings show that while these solutions exhibit basic reasoning skills and effectively memorize bug-fixing patterns (fixing up to 81.57% of bugs at the Remember layer), their performance increases with synthetically generated bugs (up to 60.66% increase at the Understand layer). However, they perform worse on minor syntactic changes (fixing up to 43.32% at the Apply layer), and they struggle to repair similar bugs when injected into real-world projects (solving only 13.46% to 41.34% bugs at the Analyze layer). These results underscore the urgent need for evolving benchmarks and provide a foundation for more trustworthy evaluation of LLM-powered software engineering solutions.

2025-09-29

arXiv (preprint)

arxiv.org

BloomAPR: A Bloom's Taxonomy-based Framework for Assessing the Capabilities of LLM-Powered APR Solutions

Yinghang Ma

Jiho Shin

Leuson Da Silva

Zhen Ming (Jack) Jiang

Song Wang