Publications

Assessing the Security of GitHub Copilot Generated Code - A Targeted Replication Study

Vahid Majdinasab

Michael Joshua Bishop

Shawn Rasheed

Arghavan Moradi Dakhel

Amjed Tahir

Foutse Khomh

2024-03-12

2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER) (published)

Deep Learning Model Reuse in the HuggingFace Community: Challenges, Benefit and Trends

Mina Taraghi

Gianolli Dorcelus

Armstrong Foundjem

Florian Tambon

Foutse Khomh

The ubiquity of large-scale Pre-Trained Models (PTMs) is on the rise, sparking interest in model hubs, and dedicated platforms for hosting P… (see more)TMs. Despite this trend, a comprehensive exploration of the challenges that users encounter and how the community leverages PTMs remains lacking. To address this gap, we conducted an extensive mixed-methods empirical study by focusing on discussion forums and the model hub of HuggingFace, the largest public model hub. Based on our qualitative analysis, we present a taxonomy of the challenges and benefits associated with PTM reuse within this community. We then conduct a quantitative study to track model-type trends and model documentation evolution over time. Our findings highlight prevalent challenges such as limited guidance for beginner users, struggles with model output comprehensibility in training or inference, and a lack of model understanding. We also identified interesting trends among models where some models maintain high upload rates despite a decline in topics related to them. Additionally, we found that despite the introduction of model documentation tools, its quantity has not increased over time, leading to difficulties in model comprehension and selection among users. Our study sheds light on new challenges in reusing PTMs that were not reported before and we provide recommendations for various stakeholders involved in PTM reuse.

2024-03-12

2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER) (published)

Refining GPT-3 Embeddings with a Siamese Structure for Technical Post Duplicate Detection

Xingfang Wu

Heng Li

Nobukazu Yoshioka

Hironori Washizaki

Foutse Khomh

2024-03-12

2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER) (published)

Reproducible Spinal Cord Quantitative MRI Analysis with the Spinal Cord Toolbox

Jan Valošek

Julien Cohen-Adad

The spinal cord plays a pivotal role in the central nervous system, providing communication between the brain and the body and containing cr… (see more)itical motor and sensory networks. Recent advancements in spinal cord MRI data acquisition and image analysis have shown a potential to improve the diagnostics, prognosis, and management of a variety of pathological conditions. In this review, we first discuss the significance of standardized spinal cord MRI acquisition protocol in multi-center and multi-manufacturer studies. Then, we cover open-access spinal cord MRI datasets, which are important for reproducible science and validation of new methods. Finally, we elaborate on the recent advances in spinal cord MRI data analysis techniques implemented in the open-source software package Spinal Cord Toolbox (SCT).

2024-03-12

Magnetic Resonance in Medical Sciences (published)

Rethinking Machine Learning Benchmarks in the Context of Professional Codes of Conduct

Peter Henderson

Jieru Hu

Mona Diab

Joelle Pineau

2024-03-12

Proceedings of the Symposium on Computer Science and Law (published)

Simulating Weighted Automata over Sequences and Trees with Transformers

Michael Rizvi

Maude Lizaire

Clara Lacroce

Guillaume Rabusseau

2024-03-12

ArXiv (preprint)

WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks?

Massimo Caccia

Issam Hadj Laradji

Manuel Del Verme

Tom Marty

Léo Boisvert

Megh Thakkar

David Vazquez

Alexandre Lacoste

We study the use of large language model-based agents for interacting with software via web browsers. Unlike prior work, we focus on measuri… (see more)ng the agents' ability to perform tasks that span the typical daily work of knowledge workers utilizing enterprise software systems. To this end, we propose WorkArena, a remote-hosted benchmark of 33 tasks based on the widely-used ServiceNow platform. We also introduce BrowserGym, an environment for the design and evaluation of such agents, offering a rich set of actions as well as multimodal observations. Our empirical evaluation reveals that while current agents show promise on WorkArena, there remains a considerable gap towards achieving full task automation. Notably, our analysis uncovers a significant performance disparity between open and closed-source LLMs, highlighting a critical area for future exploration and development in the field.

2024-03-12

ArXiv (preprint)

WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks?

Massimo Caccia

Issam Hadj Laradji

Manuel Del Verme

Tom Marty

Léo Boisvert

Megh Thakkar

David Vazquez

Alexandre Lacoste

We study the use of large language model-based agents for interacting with software via web browsers. Unlike prior work, we focus on measuri… (see more)ng the agents' ability to perform tasks that span the typical daily work of knowledge workers utilizing enterprise software systems. To this end, we propose WorkArena, a remote-hosted benchmark of 29 tasks based on the widely-used ServiceNow platform. We also introduce BrowserGym, an environment for the design and evaluation of such agents, offering a rich set of actions as well as multimodal observations. Our empirical evaluation reveals that while current agents show promise on WorkArena, there remains a considerable gap towards achieving full task automation. Notably, our analysis uncovers a significant performance disparity between open and closed-source LLMs, highlighting a critical area for future exploration and development in the field.

2024-03-12

ArXiv (preprint)

WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks?

Massimo Caccia

Issam Hadj Laradji

Manuel Del Verme

Tom Marty

Léo Boisvert

Megh Thakkar

David Vazquez

Alexandre Lacoste

We study the use of large language model-based agents for interacting with software via web browsers. Unlike prior work, we focus on measuri… (see more)ng the agents' ability to perform tasks that span the typical daily work of knowledge workers utilizing enterprise software systems. To this end, we propose WorkArena, a remote-hosted benchmark of 33 tasks based on the widely-used ServiceNow platform. We also introduce BrowserGym, an environment for the design and evaluation of such agents, offering a rich set of actions as well as multimodal observations. Our empirical evaluation reveals that while current agents show promise on WorkArena, there remains a considerable gap towards achieving full task automation. Notably, our analysis uncovers a significant performance disparity between open and closed-source LLMs, highlighting a critical area for future exploration and development in the field.

2024-03-12

ArXiv (preprint)

WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks?

Massimo Caccia

Issam Hadj Laradji

Manuel Del Verme

Tom Marty

Léo Boisvert

Megh Thakkar

David Vazquez

Alexandre Lacoste

We study the use of large language model-based agents for interacting with software via web browsers. Unlike prior work, we focus on measuri… (see more)ng the agents' ability to perform tasks that span the typical daily work of knowledge workers utilizing enterprise software systems. To this end, we propose WorkArena, a remote-hosted benchmark of 33 tasks based on the widely-used ServiceNow platform. We also introduce BrowserGym, an environment for the design and evaluation of such agents, offering a rich set of actions as well as multimodal observations. Our empirical evaluation reveals that while current agents show promise on WorkArena, there remains a considerable gap towards achieving full task automation. Notably, our analysis uncovers a significant performance disparity between open and closed-source LLMs, highlighting a critical area for future exploration and development in the field.

2024-03-12

ArXiv (preprint)

WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks?

Massimo Caccia

Issam Hadj Laradji

Manuel Del Verme

Tom Marty

Léo Boisvert

Megh Thakkar

David Vazquez

Alexandre Lacoste

We study the use of large language model-based agents for interacting with software via web browsers. Unlike prior work, we focus on measuri… (see more)ng the agents' ability to perform tasks that span the typical daily work of knowledge workers utilizing enterprise software systems. To this end, we propose WorkArena, a remote-hosted benchmark of 33 tasks based on the widely-used ServiceNow platform. We also introduce BrowserGym, an environment for the design and evaluation of such agents, offering a rich set of actions as well as multimodal observations. Our empirical evaluation reveals that while current agents show promise on WorkArena, there remains a considerable gap towards achieving full task automation. Notably, our analysis uncovers a significant performance disparity between open and closed-source LLMs, highlighting a critical area for future exploration and development in the field.

2024-03-12

ArXiv (preprint)