Publications

Chain of Targeted Verification Questions to Improve the Reliability of Code Generated by LLMs

Sylvain Kouemo Ngassom

Arghavan Moradi Dakhel

Florian Tambon

Foutse Khomh

LLM-based assistants, such as GitHub Copilot and ChatGPT, have the potential to generate code that fulfills a programming task described in … (voir plus)a natural language description, referred to as a prompt. The widespread accessibility of these assistants enables users with diverse backgrounds to generate code and integrate it into software projects. However, studies show that code generated by LLMs is prone to bugs and may miss various corner cases in task specifications. Presenting such buggy code to users can impact their reliability and trust in LLM-based assistants. Moreover, significant efforts are required by the user to detect and repair any bug present in the code, especially if no test cases are available. In this study, we propose a self-refinement method aimed at improving the reliability of code generated by LLMs by minimizing the number of bugs before execution, without human intervention, and in the absence of test cases. Our approach is based on targeted Verification Questions (VQs) to identify potential bugs within the initial code. These VQs target various nodes within the Abstract Syntax Tree (AST) of the initial code, which have the potential to trigger specific types of bug patterns commonly found in LLM-generated code. Finally, our method attempts to repair these potential bugs by re-prompting the LLM with the targeted VQs and the initial code. Our evaluation, based on programming tasks in the CoderEval dataset, demonstrates that our proposed method outperforms state-of-the-art methods by decreasing the number of targeted errors in the code between 21% to 62% and improving the number of executable code instances to 13%.

2024-05-22

ArXiv (prépublication)

Mining Action Rules for Defect Reduction Planning

Khouloud Oueslati

Gabriel Laberge

Maxime Lamothe

Foutse Khomh

Defect reduction planning plays a vital role in enhancing software quality and minimizing software maintenance costs. By training a black bo… (voir plus)x machine learning model and “explaining” its predictions, explainable AI for software engineering aims to identify the code characteristics that impact maintenance risks. However, post-hoc explanations do not always faithfully reflect what the original model computes. In this paper, we introduce CounterACT, a Counterfactual ACTion rule mining approach that can generate defect reduction plans without black-box models. By leveraging action rules, CounterACT provides a course of action that can be considered as a counterfactual explanation for the class (e.g., buggy or not buggy) assigned to a piece of code. We compare the effectiveness of CounterACT with the original action rule mining algorithm and six established defect reduction approaches on 9 software projects. Our evaluation is based on (a) overlap scores between proposed code changes and actual developer modifications; (b) improvement scores in future releases; and (c) the precision, recall, and F1-score of the plans. Our results show that, compared to competing approaches, CounterACT’s explainable plans achieve higher overlap scores at the release level (median 95%) and commit level (median 85.97%), and they offer better trade-off between precision and recall (median F1-score 88.12%). Finally, we venture beyond planning and explore leveraging Large Language models (LLM) for generating code edits from our generated plans. Our results show that suggested LLM code edits supported by our plans are actionable and are more likely to pass relevant test cases than vanilla LLM code recommendations.

2024-05-22

ArXiv (prépublication)

Mining Action Rules for Defect Reduction Planning

Khouloud Oueslati

Gabriel Laberge

Maxime Lamothe

Foutse Khomh

Defect reduction planning plays a vital role in enhancing software quality and minimizing software maintenance costs. By training a black bo… (voir plus)x machine learning model and “explaining” its predictions, explainable AI for software engineering aims to identify the code characteristics that impact maintenance risks. However, post-hoc explanations do not always faithfully reflect what the original model computes. In this paper, we introduce CounterACT, a Counterfactual ACTion rule mining approach that can generate defect reduction plans without black-box models. By leveraging action rules, CounterACT provides a course of action that can be considered as a counterfactual explanation for the class (e.g., buggy or not buggy) assigned to a piece of code. We compare the effectiveness of CounterACT with the original action rule mining algorithm and six established defect reduction approaches on 9 software projects. Our evaluation is based on (a) overlap scores between proposed code changes and actual developer modifications; (b) improvement scores in future releases; and (c) the precision, recall, and F1-score of the plans. Our results show that, compared to competing approaches, CounterACT’s explainable plans achieve higher overlap scores at the release level (median 95%) and commit level (median 85.97%), and they offer better trade-off between precision and recall (median F1-score 88.12%). Finally, we venture beyond planning and explore leveraging Large Language models (LLM) for generating code edits from our generated plans. Our results show that suggested LLM code edits supported by our plans are actionable and are more likely to pass relevant test cases than vanilla LLM code recommendations.

2024-05-22

ArXiv (prépublication)

Static graph approximations of dynamic contact networks for epidemic forecasting

Razieh Shirzadkhani

Shenyang Huang

Abby Leung

Reihaneh Rabbany

2024-05-22

Scientific Reports (publié)

Investigation of dosimetric characteristics of radiochromic film in response to alpha particles emitted from Americium-241.

Victor D. Diaz‐Martinez

Mélodie Cyr

S. Devic

Nada Tomic

David F. Lewis

Shirin A. Enger

BACKGROUND In radiotherapy, it is essential to deliver prescribed doses to tumors while minimizing damage to surrounding healthy tissue. Acc… (voir plus)urate measurements of absorbed dose are required for this purpose. Gafchromic® external beam therapy (EBT) radiochromic films have been widely used in radiotherapy. While the dosimetric characteristics of the EBT3 model film have been extensively studied for photon and charged particle beams (protons, electrons, and carbon ions), little research has been done on α

2024-05-20

Medical Physics (Lancaster) (publié)

Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving

Aniket Rajiv Didolkar

Nan Rosemary Ke

Siyuan Guo

Michal Valko

Timothy P Lillicrap

Danilo Jimenez Rezende

Michael Curtis Mozer

Sanjeev Arora

Metacognitive knowledge refers to humans' intuitive knowledge of their own thinking and reasoning processes. Today's best LLMs clearly posse… (voir plus)ss some reasoning processes. The paper gives evidence that they also have metacognitive knowledge, including ability to name skills and procedures to apply given a task. We explore this primarily in context of math reasoning, developing a prompt-guided interaction procedure to get a powerful LLM to assign sensible skill labels to math questions, followed by having it perform semantic clustering to obtain coarser families of skill labels. These coarse skill labels look interpretable to humans. To validate that these skill labels are meaningful and relevant to the LLM's reasoning processes we perform the following experiments. (a) We ask GPT-4 to assign skill labels to training questions in math datasets GSM8K and MATH. (b) When using an LLM to solve the test questions, we present it with the full list of skill labels and ask it to identify the skill needed. Then it is presented with randomly selected exemplar solved questions associated with that skill label. This improves accuracy on GSM8k and MATH for several strong LLMs, including code-assisted models. The methodology presented is domain-agnostic, even though this article applies it to math problems.

2024-05-20

ArXiv (prépublication)

Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving

Aniket Rajiv Didolkar

Nan Rosemary Ke

Siyuan Guo

Michal Valko

Timothy P Lillicrap

Danilo Jimenez Rezende

Michael Curtis Mozer

Sanjeev Arora

Metacognitive knowledge refers to humans' intuitive knowledge of their own thinking and reasoning processes. Today's best LLMs clearly posse… (voir plus)ss some reasoning processes. The paper gives evidence that they also have metacognitive knowledge, including ability to name skills and procedures to apply given a task. We explore this primarily in context of math reasoning, developing a prompt-guided interaction procedure to get a powerful LLM to assign sensible skill labels to math questions, followed by having it perform semantic clustering to obtain coarser families of skill labels. These coarse skill labels look interpretable to humans. To validate that these skill labels are meaningful and relevant to the LLM's reasoning processes we perform the following experiments. (a) We ask GPT-4 to assign skill labels to training questions in math datasets GSM8K and MATH. (b) When using an LLM to solve the test questions, we present it with the full list of skill labels and ask it to identify the skill needed. Then it is presented with randomly selected exemplar solved questions associated with that skill label. This improves accuracy on GSM8k and MATH for several strong LLMs, including code-assisted models. The methodology presented is domain-agnostic, even though this article applies it to math problems.

2024-05-20

ArXiv (prépublication)

Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving

Aniket Rajiv Didolkar

Nan Rosemary Ke

Siyuan Guo

Michal Valko

Timothy P Lillicrap

Danilo Jimenez Rezende

Michael Curtis Mozer

Sanjeev Arora

Metacognitive knowledge refers to humans' intuitive knowledge of their own thinking and reasoning processes. Today's best LLMs clearly posse… (voir plus)ss some reasoning processes. The paper gives evidence that they also have metacognitive knowledge, including ability to name skills and procedures to apply given a task. We explore this primarily in context of math reasoning, developing a prompt-guided interaction procedure to get a powerful LLM to assign sensible skill labels to math questions, followed by having it perform semantic clustering to obtain coarser families of skill labels. These coarse skill labels look interpretable to humans. To validate that these skill labels are meaningful and relevant to the LLM's reasoning processes we perform the following experiments. (a) We ask GPT-4 to assign skill labels to training questions in math datasets GSM8K and MATH. (b) When using an LLM to solve the test questions, we present it with the full list of skill labels and ask it to identify the skill needed. Then it is presented with randomly selected exemplar solved questions associated with that skill label. This improves accuracy on GSM8k and MATH for several strong LLMs, including code-assisted models. The methodology presented is domain-agnostic, even though this article applies it to math problems.

2024-05-20

ArXiv (prépublication)

Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving

Aniket Rajiv Didolkar

Nan Rosemary Ke

Siyuan Guo

Michal Valko

Timothy P Lillicrap

Danilo Jimenez Rezende

Michael Curtis Mozer

Sanjeev Arora

Metacognitive knowledge refers to humans' intuitive knowledge of their own thinking and reasoning processes. Today's best LLMs clearly posse… (voir plus)ss some reasoning processes. The paper gives evidence that they also have metacognitive knowledge, including ability to name skills and procedures to apply given a task. We explore this primarily in context of math reasoning, developing a prompt-guided interaction procedure to get a powerful LLM to assign sensible skill labels to math questions, followed by having it perform semantic clustering to obtain coarser families of skill labels. These coarse skill labels look interpretable to humans. To validate that these skill labels are meaningful and relevant to the LLM's reasoning processes we perform the following experiments. (a) We ask GPT-4 to assign skill labels to training questions in math datasets GSM8K and MATH. (b) When using an LLM to solve the test questions, we present it with the full list of skill labels and ask it to identify the skill needed. Then it is presented with randomly selected exemplar solved questions associated with that skill label. This improves accuracy on GSM8k and MATH for several strong LLMs, including code-assisted models. The methodology presented is domain-agnostic, even though this article applies it to math problems.

2024-05-20

ArXiv (prépublication)

Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving

Aniket Rajiv Didolkar

Nan Rosemary Ke

Siyuan Guo

Michal Valko

Timothy P Lillicrap

Danilo Jimenez Rezende

Michael Curtis Mozer

Sanjeev Arora

Metacognitive knowledge refers to humans' intuitive knowledge of their own thinking and reasoning processes. Today's best LLMs clearly posse… (voir plus)ss some reasoning processes. The paper gives evidence that they also have metacognitive knowledge, including ability to name skills and procedures to apply given a task. We explore this primarily in context of math reasoning, developing a prompt-guided interaction procedure to get a powerful LLM to assign sensible skill labels to math questions, followed by having it perform semantic clustering to obtain coarser families of skill labels. These coarse skill labels look interpretable to humans. To validate that these skill labels are meaningful and relevant to the LLM's reasoning processes we perform the following experiments. (a) We ask GPT-4 to assign skill labels to training questions in math datasets GSM8K and MATH. (b) When using an LLM to solve the test questions, we present it with the full list of skill labels and ask it to identify the skill needed. Then it is presented with randomly selected exemplar solved questions associated with that skill label. This improves accuracy on GSM8k and MATH for several strong LLMs, including code-assisted models. The methodology presented is domain-agnostic, even though this article applies it to math problems.

2024-05-20

ArXiv (prépublication)

BitPruning: Learning Bitlengths for Aggressive and Accurate Quantization

Milos Nikolic

Ghouthi Boukli hacene

Ciaran Bannon

Alberto Delmas Lascorz

Matthieu Courbariaux

Omar Mohamed Awad

Isak Edo Vivancos

Vincent Gripon

Andreas Moshovos

Neural networks have demonstrably achieved state-of-the art accuracy using low-bitlength integer quantization, yielding both execution time … (voir plus)and energy benefits on existing hardware designs that support short bitlengths. However, the question of finding the minimum bitlength for a desired accuracy remains open. We introduce a training method for minimizing inference bitlength at any granularity while maintaining accuracy. Namely, we propose a regularizer that penalizes large bitlength representations throughout the architecture and show how it can be modified to minimize other quantifiable criteria, such as number of operations or memory footprint. We demonstrate that our method learns thrifty representations while maintaining accuracy. With ImageNet, the method produces an average per layer bitlength of 4.13, 3.76 and 4.36 bits on AlexNet, ResNet18 and MobileNet V2 respectively, remaining within 2.0%, 0.5% and 0.5% of the base TOP-1 accuracy.

2024-05-19

2024 IEEE International Symposium on Circuits and Systems (ISCAS) (publié)