IIIT-Delhi Institutional Repository

Unit test generation using LLMs: a comparative performance analysis of autogeneration tools


dc.contributor.author Gandhi, Tarushi
dc.contributor.author Jalote, Pankaj (Advisor)
dc.date.accessioned 2024-05-20T07:17:35Z
dc.date.available 2024-05-20T07:17:35Z
dc.date.issued 2023-11-29
dc.identifier.uri http://repository.iiitd.edu.in/xmlui/handle/123456789/1531
dc.description.abstract Generating unit tests is a crucial undertaking in software development, demanding substantial time and effort from programmers. The advent of Large Language Models (LLMs) introduces a novel avenue for unit test script generation. This research experimentally investigates the effectiveness of LLMs, exemplified by ChatGPT, at generating unit test scripts for Python programs, and compares the generated test cases with those produced by an existing test generator (Pynguin). For the experiments, we consider three types of code units: 1) procedural scripts, 2) function-based modular code, and 3) class-based code. The generated test cases are evaluated on criteria such as coverage, correctness, and readability. Through our experiments, we observed that the assertions generated by ChatGPT were not always correct, exhibited issues such as compilation errors, and sometimes did not comprehensively test the core logic. For small code units (approximately 100 lines of code (LOC)), ChatGPT-produced tests exhibit coverage on par with Pynguin's. For larger units of 100 to 300 LOC, ChatGPT's ability to generate tests is superior to Pynguin's, as the latter sometimes failed to generate test cases at all. The minimal overlap we observed in the statements missed by ChatGPT and Pynguin suggests that a synergistic combination of both tools could enhance unit test generation performance. We also study how the performance of ChatGPT can be improved by prompt engineering, i.e., by repeatedly asking it to improve the test cases. We observed that iteratively prompting ChatGPT improves coverage, which saturates after about four iterations. en_US
dc.language.iso en_US en_US
dc.publisher IIIT-Delhi en_US
dc.subject Large Language Models en_US
dc.subject ChatGPT en_US
dc.subject Unit Test Generation en_US
dc.subject Coverage en_US
dc.title Unit test generation using LLMs: a comparative performance analysis of autogeneration tools en_US
dc.type Other en_US
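
Illustrative sketch. To make the kinds of code units and generated test scripts described in the abstract concrete, the following is a minimal, hypothetical example: a small function-based code unit and a pytest-style test script of the sort an LLM such as ChatGPT might produce for it. The function apply_discount and the tests are illustrative assumptions, not material from the thesis.

    # example.py -- a small function-based code unit (hypothetical)
    def apply_discount(price: float, percent: float) -> float:
        """Return the price after applying a percentage discount."""
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        return price * (1 - percent / 100)

    # test_example.py -- pytest-style tests of the kind an LLM might generate
    import pytest
    from example import apply_discount

    def test_apply_discount_basic():
        # Core logic: a 10% discount on 100.0 should yield 90.0.
        assert apply_discount(100.0, 10) == pytest.approx(90.0)

    def test_apply_discount_boundaries():
        # Boundary values: 0% and 100% discounts.
        assert apply_discount(50.0, 0) == pytest.approx(50.0)
        assert apply_discount(50.0, 100) == pytest.approx(0.0)

    def test_apply_discount_invalid_percent():
        # Error path: out-of-range percentages should raise.
        with pytest.raises(ValueError):
            apply_discount(100.0, 150)

Statement coverage of such a test script can be measured with coverage.py (e.g., coverage run -m pytest followed by coverage report), and Pynguin can be pointed at the same module for comparison, roughly: pynguin --project-path . --module-name example --output-path tests (exact flags depend on the Pynguin version).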

