dc.description.abstract |
This paper presents a comprehensive comparative analysis of Large Language Models (LLMs) for code documentation generation. Code documentation is an essential part of the software development process, as it enables new users to learn and build on an existing code base with relative ease. The paper evaluates the models GPT-3.5, GPT-4, Bard, Llama 2, and StarChat. Our evaluation employs a checklist-based system to minimize subjectivity, providing a more objective assessment. We find that, barring StarChat, all LLMs consistently outperform the original documentation. Notably, the closed-source models GPT-3.5, GPT-4, and Bard exhibit superior performance across various parameters compared to the open-source alternatives, namely Llama 2 and StarChat. Additionally, in terms of generation time, GPT-4 leads, followed by Llama 2 and Bard, with ChatGPT and StarChat exhibiting comparable generation times. This study contributes insights into the nuanced challenges of industry-level code documentation generation and establishes benchmarks for future research in this evolving domain. |
en_US |