We use cookies to analyze the browsing and usage of our website and to personalize your experience. You can disable these technologies at any time, but this may limit certain functionalities of the site. Read our Privacy Policy for more information.
Setting cookies
You can enable and disable the types of cookies you wish to accept. However certain choices you make could affect the services offered on our sites (e.g. suggestions, personalised ads, etc.).
Essential cookies
These cookies are necessary for the operation of the site and cannot be deactivated. (Still active)
Analytics cookies
Do you accept the use of cookies to measure the audience of our sites?
Multimedia Player
Do you accept the use of cookies to display and allow you to watch the video content hosted by our partners (YouTube, etc.)?
Publications
An Empirical Study of Self-Admitted Technical Debt in Machine Learning Software
The emergence of open-source ML libraries such as TensorFlow and Google Auto ML has enabled developers to harness state-of-the-art ML algori… (see more)thms with minimal overhead. However, during this accelerated ML development process, said developers may often make sub-optimal design and implementation decisions, leading to the introduction of technical debt that, if not addressed promptly, can have a significant impact on the quality of the ML-based software. Developers frequently acknowledge these sub-optimal design and development choices through code comments during software development. These comments, which often highlight areas requiring additional work or refinement in the future, are known as self-admitted technical debt (SATD). This paper aims to investigate SATD in ML code by analyzing 318 open-source ML projects across five domains, along with 318 non-ML projects. We detected SATD in source code comments throughout the different project snapshots, conducted a manual analysis of the identified SATD sample to comprehend the nature of technical debt in the ML code, and performed a survival analysis of the SATD to understand the evolution of such debts. We observed: i) Machine learning projects have a median percentage of SATD that is twice the median percentage of SATD in non-machine learning projects. ii) ML pipeline components for data preprocessing and model generation logic are more susceptible to debt than model validation and deployment components. iii) SATDs appear in ML projects earlier in the development process compared to non-ML projects. iv) Long-lasting SATDs are typically introduced during extensive code changes that span multiple files exhibiting low complexity.
All types of research, development, and policy work can have unintended, adverse consequences - work in responsible artificial intelligence … (see more)(RAI), ethical AI, or ethics in AI is no exception.
Multi-label text classification (MLTC) is the task of assigning multiple labels to a given text, and has a wide range of application domains… (see more). Most existing approaches require an enormous amount of annotated data to learn a classifier and/or a set of well-defined constraints on the label space structure, such as hierarchical relations which may be complicated to provide as the number of labels increases. In this paper, we study the MLTC problem in annotation-free and scarce-annotation settings in which the magnitude of available supervision signals is linear to the number of labels. Our method follows three steps, (1) mapping input text into a set of preliminary label likelihoods by natural language inference using a pre-trained language model, (2) calculating a signed label dependency graph by label descriptions, and (3) updating the preliminary label likelihoods with message passing along the label dependency graph, driven with a collective loss function that injects the information of expected label frequency and average multi-label cardinality of predictions. The experiments show that the proposed framework achieves effective performance under low supervision settings with almost imperceptible computational and memory overheads added to the usage of pre-trained language model outperforming its initial performance by 70% in terms of example-based F1 score.
2023-11-20
Proceedings of The 2nd Conference on Lifelong Learning Agents (published)
From physics to sentience: Deciphering the semantics of the free-energy principle and evaluating its claims: Comment on "Path integrals, particular kinds, and strange things" by Karl Friston et al.