Foutse Khomh

Samira Keivanpour

2025-03-01

Journal of Network and Computer Applications (published)

Unveiling Inefficiencies in LLM-Generated Code: Toward a Comprehensive Taxonomy

Altaf Allah Abbassi

Leuson Da Silva

Amin Nikanjam

2025-03-01

arXiv (published)

Assessing the adoption of security policies by developers in terraform across different cloud providers

Alexandre Verdet

Mohammad Hamdaqa

Leuson Da Silva

2025-02-27

Empirical Software Engineering (published)

AILuminate: Introducing v1.0 of the AI Risk and Reliability Benchmark from MLCommons

Shaona Ghosh

Heather Frase

Adina Williams

Sarah Luger

Paul Rottger

Fazl Barez

Sean McGregor

Kenneth Fricklas

Mala Kumar

Quentin Feuillade-Montixi

Kurt Bollacker

Felix Friedrich

Ryan Tsang

Bertie Vidgen

Alicia Parrish

Chris Knotz

Eleonora Presani

Jonathan Bennion

Marisa Ferrara Boston

Mike Kuniavsky

Wiebke Hutiri

James Ezick

Malek Ben Salem

Rajat Sahay

Sujata Goswami

Usman Gohar

Ben Huang

Supheakmungkol Sarin

Elie Alhajjar

Canyu Chen

Roman Eng

K. Manjusha

Virendra Mehta

Eileen Peters Long

Murali Krishna Emani

Natan Vidra

Benjamin Rukundo

Abolfazl Shahbazi

Kongtao Chen

Rajat Ghosh

Vithursan Thangarasa

Pierre Peigné

Abhinav Singh

Max Bartolo

Satyapriya Krishna

Mubashara Akhtar

Rafael Gold

Cody Coleman

Luis Oala

Vassil Tashev

Joseph Marvin Imperial

Amy Russ

Sasidhar Kunapuli

Nicolas Miailhe

Julien Delaunay

Bhaktipriya Radharapu

Rajat Shinde

Tuesday

Debojyoti Dutta

D. Grabb

Ananya Gangavarapu

Saurav Sahay

Agasthya Gangavarapu

Patrick Schramowski

Stephen Singam

Tom David

Xudong Han

Priyanka Mary Mammen

Tarunima Prabhakar

Venelin Kovatchev

Ahmed M. Ahmed

Kelvin Manyeki

Sandeep Madireddy

Fedor Zhdanov

Joachim Baumann

N. Vasan

Xianjun Yang

Carlos Mougan

Jibin Rajan Varghese

Hussain Chinoy

Seshakrishna Jitendar

Manil Maskey

Claire V. Hardgrove

Tianhao Li

Aakash Gupta

Emil Joswin

Yifan Mai

Shachi H. Kumar

Çigdem Patlak

Kevin Lu

Vincent Alessi

Sree Bhargavi Balija

Chenhe Gu

Robert Sullivan

James Gealy

Matt Lavrisa

James Goel

Peter Mattson

Percy Liang

Joaquin Vanschoren

2025-02-19

ArXiv (preprint)

Bugs in Large Language Models Generated Code: An Empirical Study

Florian Tambon

Arghavan Moradi Dakhel

Amin Nikanjam

Michel C. Desmarais

Giuliano Antoniol

2025-02-13

Empirical Software Engineering (published)

Mock Deep Testing: Toward Separate Development of Data and Models for Deep Learning

Ruchira Manke

Mohammad Wardat

Hridesh Rajan

While deep learning (DL) has permeated, and become an integral component of, many critical software systems, software engineering research has not yet explored how to separately test the data and models that DL approaches depend on. The main challenge in independently testing these components arises from the tight dependency between data and models. This research explores this gap, introducing our methodology of mock deep testing for unit testing of DL applications. To enable unit testing, we introduce a design paradigm that decomposes the workflow into distinct, manageable components, minimizes sequential dependencies, and modularizes key stages of the DL workflow. For unit testing these components, we propose modeling their dependencies using mocks. This modular approach facilitates independent development and testing of the components, ensuring comprehensive quality assurance throughout the development process. We have developed KUnit, a framework for enabling mock deep testing for the Keras library. We empirically evaluated KUnit to determine the effectiveness of mocks. Our assessment of 50 DL programs obtained from Stack Overflow and GitHub shows that mocks effectively identified 10 issues in the data preparation stage and 53 issues in the model design stage. We also conducted a user study with 36 participants using KUnit to assess the effectiveness of our approach. Participants using KUnit successfully resolved 25 issues in the data preparation stage and 38 issues in the model design stage. Our findings highlight that mock objects provide a lightweight emulation of the dependencies for unit testing, facilitating early bug detection. Lastly, to evaluate the usability of KUnit, we conducted a post-study survey. The results reveal that KUnit is helpful to DL application developers, enabling them to independently test each component effectively in different stages.

2025-02-11

ArXiv (preprint)
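
The general idea in the Mock Deep Testing abstract above, unit testing the data-preparation stage by replacing the model with a mock, can be sketched with Python's standard `unittest.mock`. This is only an illustrative sketch and not KUnit's API: the functions `prepare_batch`, `train_one_batch`, and `check_data_against_mock_model`, the row layout, and the checks themselves are all hypothetical names and assumptions introduced here.

```python
from unittest.mock import Mock


def prepare_batch(rows, num_classes):
    """Hypothetical data-preparation component: scale pixels to [0, 1], one-hot labels."""
    features = [[value / 255.0 for value in row["pixels"]] for row in rows]
    labels = []
    for row in rows:
        one_hot = [0.0] * num_classes
        one_hot[row["label"]] = 1.0
        labels.append(one_hot)
    return features, labels


def train_one_batch(model, rows, num_classes):
    """Hypothetical workflow step that depends on both the data stage and a model."""
    features, labels = prepare_batch(rows, num_classes)
    return model.fit(features, labels)


def check_data_against_mock_model():
    # The mock stands in for a real model, so nothing is built or trained;
    # it only records what the data stage hands over.
    mock_model = Mock()
    mock_model.fit.return_value = {"loss": 0.0}
    rows = [{"pixels": [0, 255], "label": 1}, {"pixels": [51, 102], "label": 0}]

    result = train_one_batch(mock_model, rows, num_classes=2)

    # Inspect the arguments the mock received to validate the data stage alone.
    features, labels = mock_model.fit.call_args.args
    assert all(0.0 <= v <= 1.0 for row in features for v in row)
    assert all(sum(label) == 1.0 for label in labels)
    assert len(features) == len(labels)
    return result


if __name__ == "__main__":
    check_data_against_mock_model()
    print("data-preparation checks passed against the mock model")
```

Because the mock records its call arguments, a scaling or label-encoding bug surfaces in the data stage before any real model exists. The same pattern could be reversed, feeding mock data into a model-design check, though the checks a real framework performs may differ.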