Publications

From Points to Functions: Infinite-dimensional Representations in Diffusion Models
Sarthak Mittal
Stefan Bauer
Arash Mehrjou
Diffusion-based generative models learn to iteratively transfer unstructured noise to a complex target distribution, as opposed to Generative Adversarial Networks (GANs) or the decoder of Variational Autoencoders (VAEs), which produce samples from the target distribution in a single step. Thus, in diffusion models every sample is naturally connected to a random trajectory which is a solution to a learned stochastic differential equation (SDE). Generative models are only concerned with the final state of this trajectory, which delivers samples from the desired distribution. Abstreiter et al. showed that these stochastic trajectories can be seen as continuous filters that wash out information along the way. Consequently, it is reasonable to ask if there is an intermediate time step at which the preserved information is optimal for a given downstream task. In this work, we show that a combination of information content from different time steps gives a strictly better representation for the downstream task. We introduce attention- and recurrence-based modules that ``learn to mix'' the information content of various time steps such that the resultant representation leads to superior performance in downstream tasks.
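As a minimal, hypothetical sketch of the ``learn to mix'' idea (the names and shapes here are illustrative, not the paper's implementation): given one encoding per diffusion time step, a learned query can attention-pool them into a single downstream representation.

```python
import numpy as np

def softmax(x):
    x = x - x.max()  # numerical stability
    e = np.exp(x)
    return e / e.sum()

def mix_timestep_codes(codes, query):
    """Attention-pool per-time-step codes into one representation.

    codes: (T, D) array, one encoding per diffusion time step.
    query: (D,) query vector (learned in the paper; a plain array here).
    Returns the mixed (D,) representation and the (T,) attention weights.
    """
    scores = codes @ query / np.sqrt(codes.shape[1])  # scaled dot-product
    weights = softmax(scores)                         # per-time-step mixing weights
    return weights @ codes, weights
```

In the paper the mixing module is trained end-to-end with the downstream objective; here the query is simply a fixed vector.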
Improving Source Separation by Explicitly Modeling Dependencies between Sources
Ethan Manilow
Curtis Hawthorne
Bryan A. Pardo
Jesse Engel
We propose a new method for training a supervised source separation system that aims to learn the interdependent relationships between all combinations of sources in a mixture. Rather than independently estimating each source from a mix, we reframe the source separation problem as an Orderless Neural Autoregressive Density Estimator (NADE), and estimate each source from both the mix and a random subset of the other sources. We adapt a standard source separation architecture, Demucs, with additional inputs for each individual source, in addition to the input mixture. We randomly mask these input sources during training so that the network learns the conditional dependencies between the sources. By pairing this training method with a blocked Gibbs sampling procedure at inference time, we demonstrate that the network can iteratively improve its separation performance by conditioning a source estimate on its earlier source estimates. Experiments on two source separation datasets show that training a Demucs model with an Orderless NADE approach and using Gibbs sampling (up to 512 steps) at inference time strongly outperforms a Demucs baseline that uses a standard regression loss and direct (one step) estimation of sources.
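A toy sketch of the two ingredients, random masking of conditioning sources at training time and blocked Gibbs sampling at inference, with a hypothetical `toy_separator` standing in for the conditioned Demucs network (for additive toy sources, the residual of the mix given the conditioned sources is a valid conditional estimate):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_source_mask(n_sources):
    """Training-time mask: each other-source input is revealed at random,
    so the network learns p(source_i | mix, random subset of other sources)."""
    return rng.random(n_sources) < 0.5

def toy_separator(mix, others):
    """Stand-in for the conditioned separator: with additive toy sources,
    the residual of the mix given the conditioned sources is one valid estimate."""
    return mix - sum(others)

def blocked_gibbs(mix, n_sources, steps=8):
    """Iteratively re-estimate each source conditioned on the current
    estimates of the others, as in the paper's inference procedure."""
    estimates = [np.zeros_like(mix) for _ in range(n_sources)]
    for _ in range(steps):
        for i in range(n_sources):
            others = [estimates[j] for j in range(n_sources) if j != i]
            estimates[i] = toy_separator(mix, others)
    return estimates
```

After each full sweep the estimates are mutually consistent with the mix; the real system uses a trained network in place of `toy_separator`.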
Learning What You Need from What You Did: Product Taxonomy Expansion with User Behaviors Supervision
Sijie Cheng
Zhouhong Gu
Rui Xie
Wei Wu
Yanghua Xiao
Taxonomies have been widely used in various domains to underpin numerous applications. In particular, product taxonomies serve an essential role in the e-commerce domain for recommendation, browsing, and query understanding. However, taxonomies need to constantly capture newly emerged terms or concepts in e-commerce platforms to stay up-to-date, which is expensive and labor-intensive if it relies on manual maintenance and updates. Therefore, we target the taxonomy expansion task to attach new concepts to existing taxonomies automatically. In this paper, we present a self-supervised and user behavior-oriented product taxonomy expansion framework to append new concepts to existing taxonomies. Our framework extracts hyponymy relations that conform to users' intentions and cognition. Specifically, i) to fully exploit user behavioral information, we extract candidate hyponymy relations that match user interests from query-click concepts; ii) to enhance the semantic information of new concepts and better detect hyponymy relations, we model concepts and relations through both user-generated content and structural information in existing taxonomies and user click logs, leveraging Pre-trained Language Models and Graph Neural Networks combined with Contrastive Learning; iii) to reduce the cost of dataset construction and overcome data skews, we construct a high-quality and balanced training dataset from the existing taxonomy with no supervision. Extensive experiments on real-world product taxonomies from Meituan, a leading Chinese vertical e-commerce platform for ordering take-out with more than 70 million daily active users, demonstrate the superiority of our proposed framework over state-of-the-art methods. Notably, our method enlarges real-world product taxonomies from 39,263 to 94,698 relations with 88% precision. Our implementation is available at: https://github.com/AdaCheng/Product_Taxonomy_Expansion.
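Step i), mining candidate hyponymy relations from query-click behavior, can be caricatured as simple support counting over (query concept, clicked node) pairs; the function and threshold below are illustrative assumptions, not the paper's actual extraction rule.

```python
from collections import Counter

def candidate_hyponymy_pairs(click_log, min_support=2):
    """click_log: iterable of (query_concept, clicked_taxonomy_node) pairs.
    Returns (child, parent) candidates whose click support clears a threshold,
    approximating 'relations that match user interests from query-click concepts'.
    """
    support = Counter(click_log)
    return [(q, node) for (q, node), c in support.items() if c >= min_support]
```

The framework then scores such candidates with its language-model and graph-based relation detector; this sketch covers only the behavioral filtering.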
Forgetting Enhances Episodic Control With Structured Memories
Annik Yalnizyan-Carson
Forgetting is a normal process in healthy brains, and evidence suggests that the mammalian brain forgets more than is required based on limitations of mnemonic capacity. Episodic memories, in particular, are liable to be forgotten over time. Researchers have hypothesized that it may be beneficial for decision making to forget episodic memories over time. Reinforcement learning offers a normative framework in which to test such hypotheses. Here, we show that a reinforcement learning agent that uses an episodic memory cache to find rewards in maze environments can forget a large percentage of older memories without any performance impairments, if it utilizes mnemonic representations that contain structural information about space. Moreover, we show that some forgetting can actually provide a benefit in performance compared to agents with unbounded memories. Our analyses of the agents show that forgetting reduces the influence of outdated information and of infrequently visited states on the policies produced by the episodic control system. These results support the hypothesis that some degree of forgetting can be beneficial for decision making, which can help to explain why the brain forgets more than is required by capacity limitations.
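A minimal sketch of an episodic control cache with forgetting, assuming a FIFO forgetting rule and the usual max-over-returns write of episodic control; the class and its details are illustrative, not the agents' exact memory model.

```python
from collections import OrderedDict

class EpisodicCache:
    """Bounded episodic memory: when full, the oldest memory is forgotten,
    which prunes outdated values for rarely revisited states."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()  # state -> best return observed so far

    def write(self, state, value):
        if state in self.store:
            self.store.move_to_end(state)  # refresh recency on revisit
        self.store[state] = max(value, self.store.get(state, value))
        while len(self.store) > self.capacity:
            self.store.popitem(last=False)  # forget the oldest memory

    def read(self, state, default=0.0):
        return self.store.get(state, default)
```

The paper's key point is what the cache keys look like: forgetting is harmless, or even helpful, when states are represented with structural information about space.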
Inductive Biases for Relational Tasks
Current deep learning approaches have shown good in-distribution performance but struggle in out-of-distribution settings. This is especially true in the case of tasks involving abstract relations like recognizing rules in sequences, as required in many intelligence tests. In contrast, our brains are remarkably flexible at such tasks, an attribute that is likely linked to anatomical constraints on computations. Inspired by this, recent work has explored how enforcing that relational representations remain distinct from sensory representations can help artificial systems. Building on this work, we further explore and formalize the advantages afforded by ``partitioned'' representations of relations and sensory details. We investigate inductive biases that ensure abstract relations are learned and represented distinctly from sensory data across several neural network architectures and show that they outperform existing architectures on out-of-distribution generalization for various relational tasks. These results show that partitioning relational representations from other information streams may be a simple way to augment existing network architectures' robustness when performing relational computations.
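The partitioning idea can be sketched as follows: the rule detector is given only the pairwise similarities between item encodings, never the encodings themselves, so an abstract rule such as ABA transfers to unseen items. The functions and the margin below are hypothetical, not one of the paper's architectures.

```python
import numpy as np

def pairwise_relations(items):
    """Return only the pairwise cosine similarities of a sequence of item
    encodings; sensory content stays confined to the encodings themselves."""
    x = items / np.linalg.norm(items, axis=1, keepdims=True)
    return x @ x.T

def matches_aba(items, margin=0.5):
    """A toy abstract rule ('first and third items are the same') read off
    the relation matrix alone, so it applies to items never seen before."""
    r = pairwise_relations(items)
    return r[0, 2] - max(r[0, 1], r[1, 2]) > margin
```

Because `matches_aba` never touches raw features, swapping in entirely new item encodings leaves the rule check unchanged, which is the out-of-distribution benefit the paper formalizes.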
Neurobiological Correlates of Change in Adaptive Behavior in Autism.
Charlotte M. Pretzsch
Tim Schäfer
Michael V. Lombardo
Varun Warrier
Caroline Mann
Anke Bletsch
Chris H. Chatham
Dorothea L. Floris
Julian Tillmann
Afsheen Yousaf
Emily J. H. Jones
Tony Charman
Sara Ambrosino
Thomas Bourgeron
Eva Loth
Beth Oakley
Jan K. Buitelaar
Freddy Cliquet
Claire Leblond …
Simon Baron-Cohen
Christian Beckmann
Tobias Banaschewski
Sarah Durston
Christine M. Freitag
Declan Murphy
Christine Ecker
CLIP-Mesh: Generating textured meshes from text using pretrained image-text models
Nasir M. Khalid
Tianhao Xie
Tiberiu S. Popa
We present a technique for zero-shot generation of a 3D model using only a target text prompt. Without any 3D supervision, our method deforms the control shape of a limit subdivided surface along with its texture map and normal map to obtain a 3D asset that corresponds to the input text prompt and can be easily deployed into games or modeling applications. We rely only on a pre-trained CLIP model that compares the input text prompt with differentiably rendered images of our 3D model. While previous works have focused on stylization or required training of generative models, we perform optimization on mesh parameters directly to generate shape, texture, or both. To constrain the optimization to produce plausible meshes and textures we introduce a number of techniques using image augmentations and the use of a pretrained prior that generates CLIP image embeddings given a text embedding.
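The overall loop, optimizing asset parameters directly against a CLIP-style similarity averaged over random augmentations, can be caricatured with stand-in functions; everything below (the "embedding", the score, the analytic gradient) is a toy assumption, not CLIP or a differentiable renderer.

```python
import numpy as np

rng = np.random.default_rng(1)

TARGET = np.array([0.8, -0.2])  # stand-in for the text prompt's CLIP embedding

def render_embed(params, jitter):
    """Stand-in for differentiable rendering + CLIP image encoding:
    here the 'image embedding' is just the parameters plus augmentation noise."""
    return params + jitter

def clip_score(emb):
    """Toy similarity: negative squared distance to the target embedding."""
    return -np.sum((emb - TARGET) ** 2)

def optimize(params, steps=200, lr=0.1, n_aug=4):
    """Gradient ascent on the augmentation-averaged similarity,
    mirroring the structure (not the content) of the CLIP-Mesh loop."""
    for _ in range(steps):
        grad = np.zeros_like(params)
        for _ in range(n_aug):  # average over random image augmentations
            jitter = 0.01 * rng.standard_normal(params.shape)
            grad += -2.0 * (render_embed(params, jitter) - TARGET)
        params = params + lr * grad / n_aug
    return params
```

In the real system `params` are the subdivision control shape, texture map, and normal map, and the gradient flows through a differentiable renderer and the frozen CLIP encoder.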
Monoallelic Heb/Tcf12 Deletion Reduces the Requirement for NOTCH1 Hyperactivation in T-Cell Acute Lymphoblastic Leukemia
Diogo F. T. Veiga
Mathieu Tremblay
Bastien Gerby
Sabine Herblot
André Haman
Patrick Gendron
Juan Carlos Zúñiga-Pflücker
Josée Hébert
Joseph Paul Cohen
Trang Hoang
Early T-cell development is precisely controlled by E proteins, which indistinguishably include the HEB/TCF12 and E2A/TCF3 transcription factors, together with NOTCH1 and pre-T cell receptor (TCR) signalling. Importantly, perturbations of early T-cell regulatory networks are implicated in leukemogenesis. NOTCH1 gain of function mutations invariably lead to T-cell acute lymphoblastic leukemia (T-ALL), whereas inhibition of E proteins accelerates leukemogenesis. Thus, NOTCH1, pre-TCR, E2A and HEB functions are intertwined, but how these pathways contribute individually or synergistically to leukemogenesis remains to be documented. To directly address these questions, we leveraged Cd3e-deficient mice in which pre-TCR signaling and progression through β-selection is abrogated to dissect and decouple the roles of pre-TCR, NOTCH1, E2A and HEB in SCL/TAL1-induced T-ALL, via the use of Notch1 gain of function transgenic (Notch1ICtg) and Tcf12+/- or Tcf3+/- heterozygote mice. As a result, we now provide evidence that both HEB and E2A restrain cell proliferation at the β-selection checkpoint while the clonal expansion of SCL-LMO1-induced pre-leukemic stem cells in T-ALL is uniquely dependent on Tcf12 gene dosage. At the molecular level, HEB protein levels are decreased via proteasomal degradation at the leukemic stage, pointing to a reversible loss of function mechanism. Moreover, in SCL-LMO1-induced T-ALL, loss of one Tcf12 allele is sufficient to bypass pre-TCR signaling which is required for Notch1 gain of function mutations and for progression to T-ALL. In contrast, Tcf12 monoallelic deletion does not accelerate Notch1IC-induced T-ALL, indicating that Tcf12 and Notch1 operate in the same pathway. Finally, we identify a tumor suppressor gene set downstream of HEB, exhibiting significantly lower expression levels in pediatric T-ALL compared to B-ALL and brain cancer samples, the three most frequent pediatric cancers.
In summary, our results indicate a tumor suppressor function of HEB/TCF12 in T-ALL to mitigate cell proliferation controlled by NOTCH1 in pre-leukemic stem cells and prevent NOTCH1-driven progression to T-ALL.
Probing Representation Forgetting in Supervised and Unsupervised Continual Learning
MohammadReza Davari
Nader Asadi
Sudhir Mudur
Rahaf Aljundi
Continual Learning (CL) research typically focuses on tackling the phenomenon of catastrophic forgetting in neural networks. Catastrophic forgetting is associated with an abrupt loss of knowledge previously learned by a model when the task, or more broadly the data distribution, being trained on changes. In supervised learning problems this forgetting, resulting from a change in the model's representation, is typically measured or observed by evaluating the decrease in old task performance. However, a model's representation can change without losing knowledge about prior tasks. In this work we consider the concept of representation forgetting, observed by using the difference in performance of an optimal linear classifier before and after a new task is introduced. Using this tool we revisit a number of standard continual learning benchmarks and observe that, through this lens, model representations trained without any explicit control for forgetting often experience small representation forgetting and can sometimes be comparable to methods which explicitly control for forgetting, especially in longer task sequences. We also show that representation forgetting can lead to new insights on the effect of model capacity and loss function used in continual learning. Based on our results, we show that a simple yet competitive approach is to learn representations continually with standard supervised contrastive learning while constructing prototypes of class samples when queried on old samples. The code to reproduce our results is publicly available at: https://github.com/rezazzr/Probing-Representation-Forgetting
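The probing tool can be sketched directly: fit an optimal linear classifier on frozen features before and after the representation changes, and take the accuracy drop as representation forgetting. The data and the feature corruption below are synthetic illustrations, not the paper's benchmarks.

```python
import numpy as np

def linear_probe_accuracy(feats, labels):
    """Accuracy of the optimal least-squares linear classifier on frozen
    features -- the probe used to measure representation forgetting."""
    X = np.hstack([feats, np.ones((len(feats), 1))])
    w, *_ = np.linalg.lstsq(X, 2.0 * labels - 1.0, rcond=None)
    return float(np.mean((X @ w > 0) == labels))

# Toy old task: feature 0 is an uninformative index, feature 1 separates classes.
idx = np.arange(10, dtype=float)
before = np.column_stack([np.concatenate([idx, idx]),
                          np.concatenate([-2 * np.ones(10), 2 * np.ones(10)])])
labels = np.array([0] * 10 + [1] * 10)

# After 'training on a new task', the discriminative feature is corrupted
# for a fifth of the samples: old-task knowledge degrades without vanishing.
after = before.copy()
after[8:10, 1] *= -1
after[18:20, 1] *= -1

forgetting = (linear_probe_accuracy(before, labels)
              - linear_probe_accuracy(after, labels))
```

Unlike raw old-task accuracy of the full model, this probe isolates how much task-relevant information the representation itself still carries.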
Mapping parallelism in a functional IR through constraint satisfaction: a case study on convolution for mobile GPUs
Naums Mogers
Lu Li
Valentin Radu
Graphics Processing Units (GPUs) are notoriously hard to optimize for manually. What is needed are good automatic code generators and optimizers. Accelerate, Futhark and Lift demonstrated that a functional approach is well suited for this challenge. Lift, for instance, uses a system of rewrite rules with a multi-stage approach. Algorithmic optimizations are first explored, followed by hardware-specific optimizations such as using shared memory and mapping parallelism. While the algorithmic exploration leads to correct transformed programs by construction, it is not necessarily true for the latter phase. Exploiting shared memory and mapping parallelism while ensuring correct synchronization is a delicate balancing act, and is hard to encode in a rewrite system. Currently, Lift relies on heuristics with ad-hoc mechanisms to check for correctness. Although this practical approach eventually produces high-performance code, it is not an ideal state of affairs. This paper proposes to extract parallelization constraints automatically from a functional IR and use a solver to identify valid rewrites. Using a convolutional neural network on a mobile GPU as a use case, this approach matches the performance of the ARM Compute Library GEMM convolution and the TVM-generated kernel consuming between 2.7x and 3.6x less memory on average. Furthermore, a speedup of 12x is achieved over the ARM Compute Library direct convolution implementation.
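A miniature of the proposed approach: treat the mapping of parallel loops to GPU execution levels as a constraint-satisfaction problem and keep only assignments that satisfy every extracted constraint. The loop names, levels, and constraints below are illustrative, not Lift's actual IR or constraint language.

```python
from itertools import product

LEVELS = ["workgroup", "local", "sequential"]

def solve(loops, constraints):
    """Enumerate level assignments for each parallel loop and return those
    satisfying every constraint -- legal mappings come from the solver,
    not from heuristics with ad-hoc correctness checks."""
    solutions = []
    for combo in product(LEVELS, repeat=len(loops)):
        assignment = dict(zip(loops, combo))
        if all(c(assignment) for c in constraints):
            solutions.append(assignment)
    return solutions

# Example constraints extracted (hypothetically) from a convolution kernel:
order = {"workgroup": 0, "local": 1, "sequential": 2}
constraints = [
    lambda a: a["reduce"] == "sequential",          # reductions need an ordering
    lambda a: order[a["rows"]] < order[a["cols"]],  # outer loop maps coarser
]
sols = solve(["rows", "cols", "reduce"], constraints)
```

A real solver would also encode shared-memory and barrier-placement constraints; the point is that every returned assignment is valid by construction.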