IIIT-Delhi Institutional Repository

VertexVQA4k: enhancing large language model proficiency: advanced datasets for solving complex geometric problems

dc.contributor.author Popat, Harsh Parimal
dc.contributor.author Mital, Harshil
dc.contributor.author Shah, Rajiv Ratn (Advisor)
dc.date.accessioned 2026-04-15T14:21:40Z
dc.date.available 2026-04-15T14:21:40Z
dc.date.issued 2024-11-27
dc.identifier.uri http://repository.iiitd.edu.in/xmlui/handle/123456789/1890
dc.description.abstract Over the course of the semester, we worked on VertexVQA4k, a comprehensive multimodal dataset for secondary-level geometry education drawn from Indian curricula. The dataset contains approximately 4,000 geometric image-caption and question-answer pairs and emphasizes Numerical Answer Questions and Theorem Proving Questions, broadening the scope and educational value of multimodal numerical reasoning in Large Language Models (LLMs). VertexVQA4k differs from existing geometry datasets in providing two solution approaches for each problem, with the aim of strengthening problem-solving skills and model comprehension. The paper details the dataset extraction and augmentation processes, including diagram description generation and solution regeneration, that are intended to improve the geometric problem-solving capabilities of multimodal LLMs. The paper also examines hallucination in Large Vision Language Models (LVLMs) and proposes mitigation strategies. It further addresses image captioning, stressing the importance of generating meaningful visual representations and coherent captions. The study concludes with an evaluation of the dataset and models, underscoring the efficacy of VertexVQA4k in advancing multimodal learning and reasoning in LLMs. en_US
dc.language.iso en_US en_US
dc.publisher IIIT-Delhi en_US
dc.subject Maths Reasoning en_US
dc.subject Multimodal Dataset en_US
dc.subject Large Vision Language Models en_US
dc.title VertexVQA4k: enhancing large language model proficiency: advanced datasets for solving complex geometric problems en_US
dc.type Other en_US
