Foutse Khomh

Associate Academic Member
Canada CIFAR AI Chair
Professor, Polytechnique Montréal, Department of Computer Engineering and Software Engineering
Research Topics
Data Mining
Deep Learning
Distributed Systems
Generative Models
Learning to Program
Natural Language Processing
Reinforcement Learning

Biography

Foutse Khomh is a full professor of software engineering at Polytechnique Montréal, a Canada CIFAR AI Chair – Trustworthy Machine Learning Software Systems, and an FRQ-IVADO Research Chair in Software Quality Assurance for Machine Learning Applications. Khomh completed a PhD in software engineering at Université de Montréal in 2011, for which he received an Award of Excellence. He was also awarded a CS-Can/Info-Can Outstanding Young Computer Science Researcher Prize in 2019.

His research interests include software maintenance and evolution, machine learning systems engineering, cloud engineering, and dependable and trustworthy ML/AI. His work has received four Ten-year Most Influential Paper (MIP) awards and six Best/Distinguished Paper awards. He has served on the steering committees of numerous software engineering organizations, including SANER (chair), MSR, PROMISE, ICPC (chair), and ICSME (vice-chair). He initiated and co-organized Polytechnique Montréal's Software Engineering for Machine Learning Applications (SEMLA) symposium and the RELENG (release engineering) workshop series.

Khomh co-founded the NSERC CREATE SE4AI: A Training Program on the Development, Deployment and Servicing of Artificial Intelligence-based Software Systems, and is a principal investigator for the DEpendable Explainable Learning (DEEL) project.

He also co-founded Confiance IA, a Quebec consortium focused on building trustworthy AI, and is on the editorial board of multiple international software engineering journals, including IEEE Software, EMSE and JSEP. He is a senior member of IEEE.

Current Students

Master's Research - Polytechnique Montréal
PhD - Polytechnique Montréal
PhD - Polytechnique Montréal
Postdoctorate - Polytechnique Montréal
Co-supervisor:
Postdoctorate - Polytechnique Montréal
Master's Research - Polytechnique Montréal
PhD - Polytechnique Montréal
Master's Research - Polytechnique Montréal

Publications

Towards a Reliable French Speech Recognition Tool for an Automated Diagnosis of Learning Disabilities
Jihene Rezgui
Félix Jobin
Younes Kechout
Christine Turgeon
Dyslexia, characterized by severe challenges in reading and spelling acquisition, presents a substantial barrier to proficient literacy, resulting in significantly reduced reading speed (2 to 3 times slower) and diminished text comprehension. With a prevalence ranging from 5% to 10% of the population, early intervention by speech and language pathologists (SLPs) can mitigate dyslexia's effects, but the diagnosis bottleneck impedes timely support. To address this, we propose leveraging machine learning tools to expedite the diagnosis process, focusing on automating phonetic transcription, a critical step in dyslexia assessment. We investigated the practicality of two model configurations utilizing Google's speech-to-text API with children's speech in evaluation scenarios and compared their results against transcriptions crafted by experts. The first configuration relies on the Google speech-to-text API alone, while the second integrates Phonemizer, a dictionary-based text-to-phonemes tool. Analysis of the results indicates that our Google-Phonemizer model yields reading accuracies comparable to those computed from human-made transcriptions, offering promise for clinical application. These findings underscore the potential of AI-driven solutions to enhance dyslexia diagnosis efficiency, paving the way for improved accessibility to vital SLP services.
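The second configuration described above can be sketched as a dictionary-based phonemization step followed by a comparison against an expert transcription. Everything below is a hypothetical stand-in: the toy lexicon, the `phonemize` helper, and the `reading_accuracy` function are illustrative inventions, not the paper's Google speech-to-text and Phonemizer pipeline.

```python
# Hypothetical sketch of the paper's second configuration: an ASR transcript
# is mapped to phonemes via a small dictionary (standing in for Phonemizer),
# then compared to an expert phonetic transcription.

TOY_LEXICON = {  # toy French word -> phoneme dictionary (illustrative only)
    "le": "lə", "chat": "ʃa", "mange": "mɑ̃ʒ",
}

def phonemize(words):
    """Dictionary-based text-to-phonemes lookup; unknown words map to '?'."""
    return [TOY_LEXICON.get(w, "?") for w in words]

def reading_accuracy(predicted, expert):
    """Fraction of phoneme strings matching the expert transcription."""
    matches = sum(p == e for p, e in zip(predicted, expert))
    return matches / len(expert)

asr_output = ["le", "chat", "mange"]  # stand-in for Google speech-to-text output
expert = ["lə", "ʃa", "mɑ̃ʒ"]          # expert SLP phonetic transcription
acc = reading_accuracy(phonemize(asr_output), expert)
print(acc)  # 1.0 when the ASR transcript matches the expert reading
```

The real pipeline would feed audio to the speech-to-text API and use Phonemizer's full lexicon; this sketch only shows where the accuracy comparison sits.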
Mining Action Rules for Defect Reduction Planning
Khouloud Oueslati
Gabriel Laberge
Maxime Lamothe
Defect reduction planning plays a vital role in enhancing software quality and minimizing software maintenance costs. By training a black-box machine learning model and "explaining" its predictions, explainable AI for software engineering aims to identify the code characteristics that impact maintenance risks. However, post-hoc explanations do not always faithfully reflect what the original model computes. In this paper, we introduce CounterACT, a Counterfactual ACTion rule mining approach that can generate defect reduction plans without black-box models. By leveraging action rules, CounterACT provides a course of action that can be considered a counterfactual explanation for the class (e.g., buggy or not buggy) assigned to a piece of code. We compare the effectiveness of CounterACT with the original action rule mining algorithm and six established defect reduction approaches on 9 software projects. Our evaluation is based on (a) overlap scores between proposed code changes and actual developer modifications; (b) improvement scores in future releases; and (c) the precision, recall, and F1-score of the plans. Our results show that, compared to competing approaches, CounterACT's explainable plans achieve higher overlap scores at the release level (median 95%) and commit level (median 85.97%), and offer a better trade-off between precision and recall (median F1-score 88.12%). Finally, we venture beyond planning and explore leveraging Large Language Models (LLMs) to generate code edits from our plans. Our results show that LLM code edits supported by our plans are actionable and more likely to pass relevant test cases than vanilla LLM code recommendations.
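To give a feel for the kind of action rule such an approach mines, the toy sketch below encodes one hypothetical rule over two invented code metrics (`loc` and `coupling`), together with a stand-in classifier; none of this is CounterACT's actual algorithm, only an illustration of how applying a rule's plan flips a predicted class.

```python
def classify(m):
    """Toy stand-in defect predictor (not the paper's model)."""
    return "buggy" if m["loc"] > 300 and m["coupling"] >= 5 else "clean"

def action_rule(m):
    """Hypothetical mined action rule: for highly coupled files predicted
    buggy, reducing lines of code to 300 or fewer flips the class."""
    if m["coupling"] >= 5 and m["loc"] > 300:
        return {"loc": 300}  # the defect reduction plan
    return None

def apply_plan(m, plan):
    """Apply a plan as a counterfactual edit to the metric vector."""
    updated = dict(m)
    updated.update(plan)
    return updated

buggy_file = {"loc": 450, "coupling": 7}
plan = action_rule(buggy_file)
print(classify(buggy_file))                     # buggy
print(classify(apply_plan(buggy_file, plan)))   # clean
```

The appeal of such rules, as the abstract notes, is that the plan itself is the explanation: no post-hoc interpretation of a black-box model is required.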
Generative AI in Software Engineering Must Be Human-Centered: The Copenhagen Manifesto
Daniel Russo
Sebastian Baltes
Niels van Berkel
Paris Avgeriou
Fabio Calefato
Beatriz Cabrero-Daniel
Gemma Catolino
Jürgen Cito
Neil Ernst
Thomas Fritz
Hideaki Hata
Reid Holmes
Maliheh Izadi
Mikkel Baun Kjærgaard
Grischa Liebel
Alberto Lluch Lafuente
Stefano Lambiase
Walid Maalej
Gail Murphy
Nils Brede Moe
Gabrielle O'Brien
Elda Paja
Mauro Pezzè
John Stouby Persson
Rafael Prikladnicki
Paul Ralph
Martin P. Robillard
Thiago Rocha Silva
Klaas-Jan Stol
Margaret-Anne Storey
Viktoria Stray
Paolo Tell
Christoph Treude
Bogdan Vasilescu
Introducing v0.5 of the AI Safety Benchmark from MLCommons
Bertie Vidgen
Adarsh Agrawal
Ahmed M. Ahmed
Victor Akinwande
Namir Al-nuaimi
Najla Alfaraj
Elie Alhajjar
Lora Aroyo
Trupti Bavalatti
Borhane Blili-Hamelin
K. Bollacker
Rishi Bomassani
Marisa Ferrara Boston
Sim'eon Campos
Kal Chakra
Canyu Chen
Cody Coleman
Zacharie Delpierre Coudert
Leon Strømberg Derczynski
Debojyoti Dutta
Ian Eisenberg
James R. Ezick
Heather Frase
Brian Fuller
Ram Gandikota
Agasthya Gangavarapu
Ananya Gangavarapu
James Gealy
Rajat Ghosh
James Goel
Usman Gohar
Sujata Goswami
Scott A. Hale
Wiebke Hutiri
Joseph Marvin Imperial
Surgan Jandial
Nicholas C. Judd
Felix Juefei-Xu
Bhavya Kailkhura
Hannah Rose Kirk
Kevin Klyman
Chris Knotz
Michael Kuchnik
Shachi H. Kumar
Chris Lengerich
Bo Li
Zeyi Liao
Eileen Peters Long
Victor Lu
Yifan Mai
Priyanka Mary Mammen
Kelvin Manyeki
Sean McGregor
Virendra Mehta
Shafee Mohammed
Emanuel Moss
Lama Nachman
Dinesh Jinenhally Naganna
Amin Nikanjam
Besmira Nushi
Luis Oala
Iftach Orr
Alicia Parrish
Çigdem Patlak
William Pietri
Forough Poursabzi-Sangdeh
Eleonora Presani
Fabrizio Puletti
Paul Rottger
Saurav Sahay
Tim Santos
Nino Scherrer
Alice Schoenauer Sebag
Patrick Schramowski
Abolfazl Shahbazi
Vin Sharma
Xudong Shen
Vamsi Sistla
Leonard Tang
Davide Testuggine
Vithursan Thangarasa
Elizabeth A Watkins
Rebecca Weiss
Christoper A. Welty
Tyler Wilbers
Adina Williams
Carole-Jean Wu
Poonam Yadav
Xianjun Yang
Yi Zeng
Wenhui Zhang
Fedor Zhdanov
Jiacheng Zhu
Percy Liang
Peter Mattson
Joaquin Vanschoren
Tackling the XAI Disagreement Problem with Regional Explanations
Gabriel Laberge
Yann Batiste Pequignot
Mario Marchand
Machine Learning Robustness: A Primer
Houssem Ben Braiek
This chapter explores the foundational concept of robustness in Machine Learning (ML) and its integral role in establishing trustworthiness in Artificial Intelligence (AI) systems. The discussion begins with a detailed definition of robustness, portraying it as the ability of ML models to maintain stable performance across varied and unexpected environmental conditions. ML robustness is dissected through several lenses: its complementarity with generalizability; its status as a requirement for trustworthy AI; its adversarial vs non-adversarial aspects; its quantitative metrics; and its indicators such as reproducibility and explainability. The chapter delves into the factors that impede robustness, such as data bias, model complexity, and the pitfalls of underspecified ML pipelines. It surveys key techniques for robustness assessment from a broad perspective, including adversarial attacks, encompassing both digital and physical realms. It covers non-adversarial data shifts and nuances of Deep Learning (DL) software testing methodologies. The discussion progresses to explore amelioration strategies for bolstering robustness, starting with data-centric approaches like debiasing and augmentation. Further examination includes a variety of model-centric methods such as transfer learning, adversarial training, and randomized smoothing. Lastly, post-training methods are discussed, including ensemble techniques, pruning, and model repairs, emerging as cost-effective strategies to make models more resilient against the unpredictable. This chapter underscores the ongoing challenges and limitations in estimating and achieving ML robustness by existing approaches. It offers insights and directions for future research on this crucial concept, as a prerequisite for trustworthy AI systems.
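As a minimal, non-adversarial illustration of robustness as stability under perturbation (one of the lenses the chapter discusses), the sketch below probes a toy threshold classifier with Gaussian input noise. The model and the probe are illustrative inventions for this page, not techniques from the chapter itself.

```python
import random

def model(x):
    """Toy one-feature classifier standing in for a trained ML model."""
    return 1 if x > 0.5 else 0

def prediction_stability(inputs, sigma, trials=200):
    """Fraction of predictions left unchanged when each input is perturbed
    with Gaussian noise of scale sigma -- a crude robustness probe."""
    rng = random.Random(0)  # fixed seed for a reproducible estimate
    stable, total = 0, 0
    for x in inputs:
        base = model(x)
        for _ in range(trials):
            stable += model(x + rng.gauss(0.0, sigma)) == base
            total += 1
    return stable / total

# Inputs far from the 0.5 decision boundary are stable under noise;
# inputs near the boundary flip often, i.e. the model is less robust there.
s_far = prediction_stability([0.05, 0.95], sigma=0.05)
s_near = prediction_stability([0.48, 0.52], sigma=0.05)
print(s_far, s_near)
```

The same stability-under-shift framing generalizes to the data shifts and adversarial perturbations the chapter surveys, where the perturbations are structured rather than isotropic noise.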
Bugs in Large Language Models Generated Code: An Empirical Study
Florian Tambon
Arghavan Moradi Dakhel
Amin Nikanjam
Michel C. Desmarais
Giuliano Antoniol
Assessing the Security of GitHub Copilot Generated Code - A Targeted Replication Study
Vahid Majdinasab
Michael Joshua Bishop
Shawn Rasheed
Arghavan Moradi Dakhel
Amjed Tahir
Deep Learning Model Reuse in the HuggingFace Community: Challenges, Benefit and Trends
Mina Taraghi
Gianolli Dorcelus
Armstrong Foundjem
Florian Tambon
The ubiquity of large-scale Pre-Trained Models (PTMs) is on the rise, sparking interest in model hubs, dedicated platforms for hosting PTMs. Despite this trend, a comprehensive exploration of the challenges that users encounter and how the community leverages PTMs remains lacking. To address this gap, we conducted an extensive mixed-methods empirical study focusing on the discussion forums and model hub of HuggingFace, the largest public model hub. Based on our qualitative analysis, we present a taxonomy of the challenges and benefits associated with PTM reuse within this community. We then conducted a quantitative study to track model-type trends and the evolution of model documentation over time. Our findings highlight prevalent challenges such as limited guidance for beginner users, struggles with model output comprehensibility in training or inference, and a lack of model understanding. We also identified interesting trends, such as models that maintain high upload rates despite a decline in topics related to them. Additionally, we found that despite the introduction of model documentation tools, the quantity of documentation has not increased over time, leading to difficulties in model comprehension and selection among users. Our study sheds light on new challenges in reusing PTMs that were not reported before, and we provide recommendations for the various stakeholders involved in PTM reuse.
Refining GPT-3 Embeddings with a Siamese Structure for Technical Post Duplicate Detection
Xingfang Wu
Heng Li
Nobukazu Yoshioka
Hironori Washizaki
ChatGPT vs LLaMA: Impact, Reliability, and Challenges in Stack Overflow Discussions
Léuson M. P. Da Silva
Jordan Samhi
Since its release in November 2022, ChatGPT has shaken up Stack Overflow, the premier platform for developers' queries on programming and software development. Demonstrating an ability to generate instant, human-like responses to technical questions, ChatGPT has ignited debates within the developer community about the evolving role of human-driven platforms in the age of generative AI. Two months after ChatGPT's release, Meta released its answer with its own Large Language Model (LLM) called LLaMA: the race was on. We conducted an empirical study analyzing questions from Stack Overflow and using these LLMs to address them. This way, we aim to (i) measure the evolution of user engagement with Stack Overflow over time; (ii) quantify the reliability of LLMs' answers and their potential to replace Stack Overflow in the long term; (iii) identify and understand why LLMs fail; and (iv) compare the LLMs with each other. Our empirical results are unequivocal: ChatGPT and LLaMA challenge human expertise, yet do not outperform it in some domains, while a significant decline in user posting activity has been observed. Furthermore, we discuss the implications of our findings for the usage and development of new LLMs.
AITA: AI trustworthiness assessment
Bertrand Braunschweig
Stefan Buijsman
Faicel Chamroukhi
Fredrik Heintz
Juliette Mattioli
Maximilian Poretschkin