Foutse Khomh

Biography

Foutse Khomh is a full professor of software engineering at Polytechnique Montréal, a Canada CIFAR AI Chair – Trustworthy Machine Learning Software Systems, and an FRQ-IVADO Research Chair in Software Quality Assurance for Machine Learning Applications. Khomh completed a PhD in software engineering at Université de Montréal in 2011, for which he received an Award of Excellence. He was also awarded a CS-Can/Info-Can Outstanding Young Computer Science Researcher Prize in 2019.

His research interests include software maintenance and evolution, machine learning systems engineering, cloud engineering, and dependable and trustworthy ML/AI. His work has received four Ten-year Most Influential Paper (MIP) awards, and six Best/Distinguished Paper Awards. He has served on the steering committee of numerous organizations in software engineering, including SANER (chair), MSR, PROMISE, ICPC (chair), and ICSME (vice-chair). He initiated and co-organized Polytechnique Montréal‘s Software Engineering for Machine Learning Applications (SEMLA) symposium and the RELENG (release engineering) workshop series.

Khomh co-founded the NSERC CREATE SE4AI: A Training Program on the Development, Deployment and Servicing of Artificial Intelligence-based Software Systems, and is a principal investigator for the DEpendable Explainable Learning (DEEL) project.

He also co-founded Confiance IA, a Quebec consortium focused on building trustworthy AI, and is on the editorial board of multiple international software engineering journals, including IEEE Software, EMSE and JSEP. He is a senior member of IEEE.

Current Students

Houssem Ben Braiek

Postdoctorate - Polytechnique Montréal

PhD - Polytechnique Montréal

Github

forough majidi

PhD - Polytechnique Montréal

Website

Khouloud Oueslati

Master's Research - Polytechnique Montréal

Arian Qazvini

Master's Research - Polytechnique Montréal

Website

Github

Elnathan Tiokou Tiokou Fangang

Master's Research - Polytechnique Montréal

Ben Braiek Yasmine

Master's Research - Polytechnique Montréal

Publications

Trimming the Risk: Towards Reliable Continuous Training for Deep Learning Inspection Systems

Altaf Allah Abbassi

Houssem Ben Braiek

Thomas Reid

2024-09-13

ArXiv (preprint)

Reputation Gaming in Crowd Technical Knowledge Sharing

Iren Mazloomzadeh

Gias Uddin

Ashkan Sami

Stack Overflow incentive system awards users with reputation scores to ensure quality. The decentralized nature of the forum may make the in… (see more)centive system prone to manipulation. This paper offers, for the first time, a comprehensive study of the reported types of reputation manipulation scenarios that might be exercised in Stack Overflow and the prevalence of such reputation gamers by a qualitative study of 1,697 posts from meta Stack Exchange sites. We found four different types of reputation fraud scenarios, such as voting rings where communities form to upvote each other repeatedly on similar posts. We developed algorithms that enable platform managers to automatically identify these suspicious reputation gaming scenarios for review. The first algorithm identifies isolated/semi-isolated communities where probable reputation frauds may occur mostly by collaborating with each other. The second algorithm looks for sudden unusual big jumps in the reputation scores of users. We evaluated the performance of our algorithms by examining the reputation history dashboard of Stack Overflow users from the Stack Overflow website. We observed that around 60-80% of users flagged as suspicious by our algorithms experienced reductions in their reputation scores by Stack Overflow.

2024-09-04

ACM Transactions on Software Engineering and Methodology (published)

Assessing Programming Task Difficulty for Efficient Evaluation of Large Language Models

Florian Tambon

Amin Nikanjam

Giuliano Antoniol

2024-07-30

ArXiv (preprint)

TaskEval: Assessing Difficulty of Code Generation Tasks for Large Language Models

Florian Tambon

Amin Nikanjam

Cyrine Zid

Giuliano Antoniol

2024-07-30

ArXiv (preprint)

TaskEval: Assessing Difficulty of Code Generation Tasks for Large Language Models

Florian Tambon

Amin Nikanjam

Cyrine Zid

Giuliano Antoniol

2024-07-30

ArXiv (preprint)

Mining Action Rules for Defect Reduction Planning

Khouloud Oueslati

Gabriel Laberge

Maxime Lamothe

Defect reduction planning plays a vital role in enhancing software quality and minimizing software maintenance costs. By training a black bo… (see more)x machine learning model and"explaining"its predictions, explainable AI for software engineering aims to identify the code characteristics that impact maintenance risks. However, post-hoc explanations do not always faithfully reflect what the original model computes. In this paper, we introduce CounterACT, a Counterfactual ACTion rule mining approach that can generate defect reduction plans without black-box models. By leveraging action rules, CounterACT provides a course of action that can be considered as a counterfactual explanation for the class (e.g., buggy or not buggy) assigned to a piece of code. We compare the effectiveness of CounterACT with the original action rule mining algorithm and six established defect reduction approaches on 9 software projects. Our evaluation is based on (a) overlap scores between proposed code changes and actual developer modifications; (b) improvement scores in future releases; and (c) the precision, recall, and F1-score of the plans. Our results show that, compared to competing approaches, CounterACT's explainable plans achieve higher overlap scores at the release level (median 95%) and commit level (median 85.97%), and they offer better trade-off between precision and recall (median F1-score 88.12%). Finally, we venture beyond planning and explore leveraging Large Language models (LLM) for generating code edits from our generated plans. Our results show that suggested LLM code edits supported by our plans are actionable and are more likely to pass relevant test cases than vanilla LLM code recommendations.

2024-07-12

Proceedings of the ACM on Software Engineering (published)

DeepCodeProbe: Towards Understanding What Models Trained on Code Learn

Vahid Majdinasab

Amin Nikanjam

2024-07-11

ArXiv (preprint)

Chain of Targeted Verification Questions to Improve the Reliability of Code Generated by LLMs

Sylvain Kouemo Ngassom

Arghavan Moradi Dakhel

Florian Tambon

2024-07-10

Proceedings of the 1st ACM International Conference on AI-Powered Software (published)

Design smells in multi-language systems and bug-proneness: a survival analysis

Mouna Abidi

Md Saidur Rahman

Moses Openja

2024-07-03

Empirical Software Engineering (published)

A Context-Driven Approach for Co-Auditing Smart Contracts with The Support of GPT-4 code interpreter

Mohamed Salah Bouafif

Chen Zheng

Ilham Qasse

Ed Zulkoski

Mohammad Hamdaqa

The surge in the adoption of smart contracts necessitates rigorous auditing to ensure their security and reliability. Manual auditing, altho… (see more)ugh comprehensive, is time-consuming and heavily reliant on the auditor's expertise. With the rise of Large Language Models (LLMs), there is growing interest in leveraging them to assist auditors in the auditing process (co-auditing). However, the effectiveness of LLMs in smart contract co-auditing is contingent upon the design of the input prompts, especially in terms of context description and code length. This paper introduces a novel context-driven prompting technique for smart contract co-auditing. Our approach employs three techniques for context scoping and augmentation, encompassing code scoping to chunk long code into self-contained code segments based on code inter-dependencies, assessment scoping to enhance context description based on the target assessment goal, thereby limiting the search space, and reporting scoping to force a specific format for the generated response. Through empirical evaluations on publicly available vulnerable contracts, our method demonstrated a detection rate of 96\% for vulnerable functions, outperforming the native prompting approach, which detected only 53\%. To assess the reliability of our prompting approach, manual analysis of the results was conducted by expert auditors from our partner, Quantstamp, a world-leading smart contract auditing company. The experts' analysis indicates that, in unlabeled datasets, our proposed approach enhances the proficiency of the GPT-4 code interpreter in detecting vulnerabilities.

2024-06-26

ArXiv (preprint)

PathOCL: Path-Based Prompt Augmentation for OCL Generation with GPT-4

Seif Abukhalaf

Mohammad Hamdaqa

The rapid progress of AI-powered programming assistants, such as GitHub Copilot, has facilitated the development of software applications. T… (see more)hese assistants rely on large language models (LLMs), which are foundation models (FMs) that support a wide range of tasks related to understanding and generating language. LLMs have demonstrated their ability to express UML model specifications using formal languages like the Object Constraint Language (OCL). However, the context size of the prompt is limited by the number of tokens an LLM can process. This limitation becomes significant as the size of UML class models increases. In this study, we introduce PathOCL, a novel path-based prompt augmentation technique designed to facilitate OCL generation. PathOCL addresses the limitations of LLMs, specifically their token processing limit and the challenges posed by large UML class models. PathOCL is based on the concept of chunking, which selectively augments the prompts with a subset of UML classes relevant to the English specification. Our findings demonstrate that PathOCL, compared to augmenting the complete UML class model (UML-Augmentation), generates a higher number of valid and correct OCL constraints using the GPT-4 model. Moreover, the average prompt size crafted using PathOCL significantly decreases when scaling the size of the UML class models.

2024-06-12

Proceedings of the 2024 IEEE/ACM First International Conference on AI Foundation Models and Software Engineering (published)

Characterizing and Classifying Developer Forum Posts with their Intentions

Xingfang Wu

Eric Laufer

Heng Li

Santhosh Srinivasan

Jayden Luo

With the rapid growth of the developer community, the amount of posts on online technical forums has been growing rapidly, which poses diffi… (see more)culties for users to filter useful posts and find important information. Tags provide a concise feature dimension for users to locate their interested posts and for search engines to index the most relevant posts according to the queries. However, most tags are only focused on the technical perspective (e.g., program language, platform, tool). In most cases, forum posts in online developer communities reveal the author's intentions to solve a problem, ask for advice, share information, etc. The modeling of the intentions of posts can provide an extra dimension to the current tag taxonomy. By referencing previous studies and learning from industrial perspectives, we create a refined taxonomy for the intentions of technical forum posts. Through manual labeling and analysis on a sampled post dataset extracted from online forums, we understand the relevance between the constitution of posts (code, error messages) and their intentions. Furthermore, inspired by our manual study, we design a pre-trained transformer-based model to automatically predict post intentions. The best variant of our intention prediction framework, which achieves a Micro F1-score of 0.589, Top 1-3 accuracy of 62.6% to 87.8%, and an average AUC of 0.787, outperforms the state-of-the-art baseline approach. Our characterization and automated classification of forum posts regarding their intentions may help forum maintainers or third-party tool developers improve the organization and retrieval of posts on technical forums. We have released our annotated dataset and codes in our supplementary material package.

2024-06-05

Empirical Software Engineering (published)