ConcurBench: a benchmark framework for evaluating LLM-generated concurrent code

Gupta, Kshitij; Chaterjee, Bapi (Advisor)

Please use this identifier to cite or link to this item: http://repository.iiitd.edu.in/xmlui/handle/123456789/1980

Title:	ConcurBench: a benchmark framework for evaluating LLM-generated concurrent code
Authors:	Gupta, Kshitij; Chaterjee, Bapi (Advisor)
Keywords:	Concurrent Programming; Large Language Models; Benchmark Framework; Thread Safety; Software Testing
Issue Date:	17-Jul-2025
Publisher:	IIIT-Delhi
Abstract:	This thesis presents ConcurBench, a novel benchmark framework designed to evaluate the capabilities of Large Language Models (LLMs) in generating concurrent code. Concurrent programming remains one of the most challenging domains in software development, requiring careful attention to thread safety, synchronization, and race conditions. As LLMs increasingly become part of software development workflows, understanding their ability to generate correct concurrent code is crucial. ConcurBench addresses this need by providing a comprehensive evaluation framework that extracts high-quality concurrent functions from popular open-source repositories, annotates them with natural language requirements, and tests LLMs’ ability to regenerate these functions with varying levels of context. The framework implements a multi-level context evaluation approach, testing LLMs with no context (function signature only), local context (surrounding functions/imports), and full context (entire file context). The thesis details the design and implementation of ConcurBench’s pipeline architecture, including repository discovery and collection, function extraction, test discovery, LLM annotation, function generation, and evaluation. Key innovations include a dynamic test harness generation system that can compile and test LLM-generated code against original implementations without modification, and an orchestration wrapper script that enables scalable, automated evaluation across multiple functions and LLMs. Experimental results demonstrate that context significantly impacts LLMs’ ability to generate correct concurrent code, with full context providing substantial improvements in functional correctness. The benchmark provides valuable insights into the strengths and limitations of current LLMs in handling concurrent programming tasks and establishes a methodology for evaluating future advancements in this domain.
URI:	http://repository.iiitd.edu.in/xmlui/handle/123456789/1980
Appears in Collections:	Year-2024
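
The three context levels named in the abstract (signature only, surrounding functions and imports, entire file) can be illustrated with a small sketch. This is not the thesis's implementation: the abstract does not state the target language or how context is assembled, so the use of Python's ast module, the helper name build_context, the level names "none"/"local"/"full", and the sample SOURCE are all assumptions made for illustration. The sketch also assumes the target function has a single-line signature.

```python
import ast

# Hypothetical sample file containing one concurrent function to regenerate.
SOURCE = '''\
import threading

_lock = threading.Lock()
_count = 0

def helper():
    return 1

def increment():
    """Add one to the shared counter under the lock."""
    global _count
    with _lock:
        _count += helper()
'''


def build_context(source: str, func_name: str, level: str) -> str:
    """Return the text shown to the model for one context level (illustrative)."""
    tree = ast.parse(source)
    target = next(
        node for node in tree.body
        if isinstance(node, ast.FunctionDef) and node.name == func_name
    )
    lines = source.splitlines()
    signature = lines[target.lineno - 1]  # assumes a single-line signature

    if level == "none":
        # No context: the function signature only.
        return signature

    if level == "local":
        # Local context: imports and sibling functions, plus the bare signature.
        parts = []
        for node in tree.body:
            if node is target:
                parts.append(signature)
            elif isinstance(node, (ast.Import, ast.ImportFrom, ast.FunctionDef)):
                parts.append(ast.get_source_segment(source, node))
        return "\n".join(parts)

    if level == "full":
        # Full context: the entire file with only the target body masked out.
        before = lines[: target.lineno]
        after = lines[target.end_lineno:]
        return "\n".join(before + ["    ..."] + after)

    raise ValueError(f"unknown context level: {level}")


if __name__ == "__main__":
    for level in ("none", "local", "full"):
        print(f"--- {level} ---")
        print(build_context(SOURCE, "increment", level))
```

In this sketch the module-level lock appears only at the "full" level, which is one plausible reason a model might produce a correctly synchronized body more often when given the whole file.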

Files in This Item:

File	Description	Size	Format
BTP_Report - Kshitij Gupta.pdf (Restricted Access)		174.82 kB	Adobe PDF
