Portrait de Foutse Khomh

Foutse Khomh

Membre académique associé
Chaire en IA Canada-CIFAR
Professeur, Polytechnique Montréal, Département de génie informatique et génie logiciel
Sujets de recherche
Apprentissage de la programmation
Apprentissage par renforcement
Apprentissage profond
Exploration des données
Modèles génératifs
Systèmes distribués
Traitement du langage naturel

Biographie

Foutse Khomh est professeur titulaire de génie logiciel à Polytechnique Montréal, titulaire d'une chaire en IA Canada-CIFAR dans le domaine des systèmes logiciels d'apprentissage automatique fiables, et titulaire d'une chaire de recherche FRQ-IVADO sur l'assurance qualité des logiciels pour les applications d'apprentissage automatique.

Il a obtenu un doctorat en génie logiciel de l'Université de Montréal en 2011, avec une bourse d'excellence. Il a également reçu le prix CS-Can/Info-Can du meilleur jeune chercheur en informatique en 2019. Ses recherches portent sur la maintenance et l'évolution des logiciels, l'ingénierie des systèmes d'apprentissage automatique, l'ingénierie en nuage et l’IA/apprentissage automatique fiable et digne de confiance.

Ses travaux ont été récompensés par quatre prix de l’article le plus important Most Influential Paper en dix ans et six prix du meilleur article ou de l’article exceptionnel (Best/Distinguished Paper). Il a également siégé au comité directeur de plusieurs conférences et rencontres : SANER (comme président), MSR, PROMISE, ICPC (comme président) et ICSME (en tant que vice-président). Il a initié et coorganisé le symposium Software Engineering for Machine Learning Applications (SEMLA) et la série d'ateliers Release Engineering (RELENG).

Il est cofondateur du projet CRSNG CREATE SE4AI : A Training Program on the Development, Deployment, and Servicing of Artificial Intelligence-based Software Systems et l'un des chercheurs principaux du projet Dependable Explainable Learning (DEEL). Il est également cofondateur de l'initiative québécoise sur l'IA digne de confiance (Confiance IA Québec). Il fait partie du comité de rédaction de plusieurs revues internationales de génie logiciel (dont IEEE Software, EMSE, JSEP) et est membre senior de l'Institute of Electrical and Electronics Engineers (IEEE).

Étudiants actuels

Collaborateur·rice alumni - Polytechnique
Doctorat - Polytechnique
Doctorat - Polytechnique
Postdoctorat - Polytechnique
Co-superviseur⋅e :
Maîtrise recherche - Polytechnique
Maîtrise recherche - Polytechnique
Maîtrise recherche - Polytechnique

Publications

Harnessing Predictive Modeling and Software Analytics in the Age of LLM-Powered Software Development (Invited Talk)
Bug characterization in machine learning-based systems
Mohammad Mehdi Morovati
Amin Nikanjam
Florian Tambon
Z. Jiang
A Machine Learning Based Approach to Detect Machine Learning Design Patterns
Weitao Pan
Hironori Washizaki
Nobukazu Yoshioka
Yoshiaki Fukazawa
Yann‐Gaël Guéhéneuc
As machine learning expands to various domains, the demand for reusable solutions to similar problems increases. Machine learning design pat… (voir plus)terns are reusable solutions to design problems of machine learning applications. They can significantly enhance programmers' productivity in programming that requires machine learning algorithms. Given the critical role of machine learning design patterns, the automated detection of them becomes equally vital. However, identifying design patterns can be time-consuming and error-prone. We propose an approach to detect their occurrences in Python files. Our approach uses an Abstract Syntax Tree (AST) of Python files to build a corpus of data and train a refined Text-CNN model to automatically identify machine learning design patterns. We empirically validate our approach by conducting an exploratory study to detect four common machine learning design patterns: Embedding, Multilabel, Feature Cross, and Hashed Feature. We manually label 450 Python code files containing these design patterns from repositories of projects in GitHub. Our approach achieves accuracy values ranging from 80 % to 92% for each of the four patterns.
A large-scale exploratory study of android sports apps in the google play store
Bhagya Chembakottu
Heng Li
Silent bugs in deep learning frameworks: an empirical study of Keras and TensorFlow
Florian Tambon
Amin Nikanjam
Le An
Giuliano Antoniol
An Empirical Study of Self-Admitted Technical Debt in Machine Learning Software
Aaditya Bhatia
Bram Adams
Ahmed E. Hassan
The emergence of open-source ML libraries such as TensorFlow and Google Auto ML has enabled developers to harness state-of-the-art ML algori… (voir plus)thms with minimal overhead. However, during this accelerated ML development process, said developers may often make sub-optimal design and implementation decisions, leading to the introduction of technical debt that, if not addressed promptly, can have a significant impact on the quality of the ML-based software. Developers frequently acknowledge these sub-optimal design and development choices through code comments during software development. These comments, which often highlight areas requiring additional work or refinement in the future, are known as self-admitted technical debt (SATD). This paper aims to investigate SATD in ML code by analyzing 318 open-source ML projects across five domains, along with 318 non-ML projects. We detected SATD in source code comments throughout the different project snapshots, conducted a manual analysis of the identified SATD sample to comprehend the nature of technical debt in the ML code, and performed a survival analysis of the SATD to understand the evolution of such debts. We observed: i) Machine learning projects have a median percentage of SATD that is twice the median percentage of SATD in non-machine learning projects. ii) ML pipeline components for data preprocessing and model generation logic are more susceptible to debt than model validation and deployment components. iii) SATDs appear in ML projects earlier in the development process compared to non-ML projects. iv) Long-lasting SATDs are typically introduced during extensive code changes that span multiple files exhibiting low complexity.
Assessing the Security of GitHub Copilot's Generated Code - A Targeted Replication Study
Vahid Majdinasab
Michael Joshua Bishop
Shawn Rasheed
Arghavan Moradi Dakhel
Amjed Tahir
Assessing the Security of GitHub Copilot's Generated Code - A Targeted Replication Study
Vahid Majdinasab
Michael Joshua Bishop
Shawn Rasheed
Arghavan Moradi Dakhel
Amjed Tahir
Detection and evaluation of bias-inducing features in machine learning
Studying the characteristics of AIOps projects on GitHub
Roozbeh Aghili
Heng Li
Summary of the Fourth International Workshop on Deep Learning for Testing and Testing for Deep Learning (DeepTest 2023)
Matteo Biagiola
Nicolás Cardozo
Donghwan Shin
Andrea Stocco
Vincenzo Riccio
On the effectiveness of log representation for log-based anomaly detection
Xingfang Wu
Heng Li