Abstract:
Despite the growing capabilities of Large Language Models (LLMs) across many domains, their ability to answer high-school physics questions remains largely unexplored. In this study, we present a dataset curated from NCERT Exemplar solutions and designed to support the use of LLMs for solving school-level physics questions. The dataset originally comprised 766 questions with LaTeX representations; data augmentation expanded it to 7,983 questions and broadened its coverage. The dataset focuses on text-based questions and is formatted as JSON objects, each specifying an instruction, an input, and an output. In our evaluation, we obtained a METEOR score of 0.282 and a BERTScore F1 of 0.833, indicating close alignment between the generated and reference texts.
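To make the instruction/input/output format concrete, here is a minimal sketch of one record serialized as JSON. The key names (`instruction`, `input`, `output`), the sample kinetic-energy question, and the `validate` helper are illustrative assumptions, not entries or code from the released dataset.

```python
import json

# Hypothetical record illustrating the instruction/input/output layout.
# Key names and the sample question are assumptions for illustration only.
record = {
    "instruction": "Solve the following physics problem and show your working.",
    "input": "A body of mass $m = 2\\,\\mathrm{kg}$ moves with speed "
             "$v = 3\\,\\mathrm{m/s}$. Find its kinetic energy.",
    "output": "$K = \\frac{1}{2} m v^2 = \\frac{1}{2}(2)(3)^2 = 9\\,\\mathrm{J}$",
}


def validate(rec):
    """Return True if rec has exactly the three expected string fields."""
    expected = {"instruction", "input", "output"}
    return set(rec) == expected and all(isinstance(v, str) for v in rec.values())


line = json.dumps(record)    # serialize the record to a JSON string
restored = json.loads(line)  # parse it back into a dict
print(validate(restored), restored == record)  # → True True
```

Keeping the LaTeX inside ordinary JSON strings (with escaped backslashes) lets each question be stored as one JSON object per line, which standard JSON tooling can read without any custom parser.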