IIIT-Delhi Institutional Repository

ConcurBench: a benchmark framework for evaluating LLM-generated concurrent code

dc.contributor.author Gupta, Kshitij
dc.contributor.author Chatterjee, Bapi (Advisor)
dc.date.accessioned 2026-05-26T05:12:23Z
dc.date.available 2026-05-26T05:12:23Z
dc.date.issued 2025-07-17
dc.identifier.uri http://repository.iiitd.edu.in/xmlui/handle/123456789/1980
dc.description.abstract This thesis presents ConcurBench, a novel benchmark framework designed to evaluate the capabilities of Large Language Models (LLMs) in generating concurrent code. Concurrent programming remains one of the most challenging domains in software development, requiring careful attention to thread safety, synchronization, and race conditions. As LLMs increasingly become part of software development workflows, understanding their ability to generate correct concurrent code is crucial. ConcurBench addresses this need by providing a comprehensive evaluation framework that extracts high-quality concurrent functions from popular open-source repositories, annotates them with natural language requirements, and tests LLMs’ ability to regenerate these functions with varying levels of context. The framework implements a multi-level context evaluation approach, testing LLMs with no context (function signature only), local context (surrounding functions/imports), and full context (entire file context). The thesis details the design and implementation of ConcurBench’s pipeline architecture, including repository discovery and collection, function extraction, test discovery, LLM annotation, function generation, and evaluation. Key innovations include a dynamic test harness generation system that can compile and test LLM-generated code against original implementations without modification, and an orchestration wrapper script that enables scalable, automated evaluation across multiple functions and LLMs. Experimental results demonstrate that context significantly impacts LLMs’ ability to generate correct concurrent code, with full context providing substantial improvements in functional correctness. The benchmark provides valuable insights into the strengths and limitations of current LLMs in handling concurrent programming tasks and establishes a methodology for evaluating future advancements in this domain. en_US
dc.language.iso en_US en_US
dc.publisher IIIT-Delhi en_US
dc.subject Concurrent Programming en_US
dc.subject Large Language Models en_US
dc.subject Benchmark Framework en_US
dc.subject Thread Safety en_US
dc.subject Software Testing en_US
dc.title ConcurBench: a benchmark framework for evaluating LLM-generated concurrent code en_US
dc.type Other en_US
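
The three context levels named in the abstract (signature only, local context, full file) can be illustrated with a small prompt builder. This is only a rough sketch, not the thesis's implementation: the `Task` fields, the `ContextLevel` names, the prompt wording, and the sample signature are all hypothetical, and it assumes the target function's body has already been stripped from any context passed in.

```python
from dataclasses import dataclass
from enum import Enum


class ContextLevel(Enum):
    NONE = "none"    # function signature only
    LOCAL = "local"  # surrounding functions/imports
    FULL = "full"    # entire file context


@dataclass
class Task:
    # Hypothetical record layout; field names are not taken from ConcurBench.
    signature: str      # signature of the function to regenerate
    requirement: str    # natural language requirement annotation
    local_context: str  # nearby functions and imports
    full_file: str      # whole file, assumed to have the target body removed


def build_prompt(task: Task, level: ContextLevel) -> str:
    """Assemble a generation prompt containing only the requested context."""
    parts = [f"Requirement: {task.requirement}"]
    if level is ContextLevel.LOCAL:
        parts.append(f"Surrounding code:\n{task.local_context}")
    elif level is ContextLevel.FULL:
        parts.append(f"File contents:\n{task.full_file}")
    # The signature is always included; at level NONE it is the only code shown.
    parts.append(f"Implement this function:\n{task.signature}")
    return "\n\n".join(parts)


# Hypothetical sample task, used only to show how the builder is called.
example = Task(
    signature="void push(int value)",
    requirement="Append a value to the queue in a thread-safe way.",
    local_context="#include <mutex>",
    full_file="#include <mutex>\nclass Queue { /* ... */ };",
)
prompts = {level: build_prompt(example, level) for level in ContextLevel}
```

Generating one prompt per level for the same task keeps the requirement and signature fixed, so any difference in correctness between levels can be attributed to the added context.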

