ConcurBench: a benchmark framework for evaluating LLM-generated concurrent code

Gupta, Kshitij; Chaterjee, Bapi (Advisor)

dc.contributor.author	Gupta, Kshitij
dc.contributor.author	Chaterjee, Bapi (Advisor)
dc.date.accessioned	2026-05-26T05:12:23Z
dc.date.available	2026-05-26T05:12:23Z
dc.date.issued	2025-07-17
dc.identifier.uri	http://repository.iiitd.edu.in/xmlui/handle/123456789/1980
dc.description.abstract	This thesis presents ConcurBench, a novel benchmark framework designed to evaluate the capabilities of Large Language Models (LLMs) in generating concurrent code. Concurrent programming remains one of the most challenging domains in software development, requiring careful attention to thread safety, synchronization, and race conditions. As LLMs increasingly become part of software development workflows, understanding their ability to generate correct concurrent code is crucial. ConcurBench addresses this need by providing a comprehensive evaluation framework that extracts high-quality concurrent functions from popular open-source repositories, annotates them with natural language requirements, and tests LLMs’ ability to regenerate these functions with varying levels of context. The framework implements a multi-level context evaluation approach, testing LLMs with no context (function signature only), local context (surrounding functions/imports), and full context (entire file context). The thesis details the design and implementation of ConcurBench’s pipeline architecture, including repository discovery and collection, function extraction, test discovery, LLM annotation, function generation, and evaluation. Key innovations include a dynamic test harness generation system that can compile and test LLM-generated code against original implementations without modification, and an orchestration wrapper script that enables scalable, automated evaluation across multiple functions and LLMs. Experimental results demonstrate that context significantly impacts LLMs’ ability to generate correct concurrent code, with full context providing substantial improvements in functional correctness. The benchmark provides valuable insights into the strengths and limitations of current LLMs in handling concurrent programming tasks and establishes a methodology for evaluating future advancements in this domain.	en_US
dc.language.iso	en_US	en_US
dc.publisher	IIIT-Delhi	en_US
dc.subject	Concurrent Programming	en_US
dc.subject	Large Language Models	en_US
dc.subject	Benchmark Framework	en_US
dc.subject	Thread Safety	en_US
dc.subject	Software Testing	en_US
dc.title	ConcurBench: a benchmark framework for evaluating LLM-generated concurrent code	en_US
dc.type	Other	en_US