Please use this identifier to cite or link to this item: http://repository.iiitd.edu.in/xmlui/handle/123456789/1464
Full metadata record
DC Field | Value | Language
dc.contributor.author | Garg, Prakrit | -
dc.contributor.author | Shah, Rajiv Ratn (Advisor) | -
dc.date.accessioned | 2024-05-15T11:00:57Z | -
dc.date.available | 2024-05-15T11:00:57Z | -
dc.date.issued | 2023-11-29 | -
dc.identifier.uri | http://repository.iiitd.edu.in/xmlui/handle/123456789/1464 | -
dc.description.abstract | Visual Question Answering (VQA) has applications in a range of emerging areas, from medical imagery to video surveillance and assistance [1]. The VQA problem is a central component of the Artificial General Intelligence problem, i.e., creating a machine that can understand or learn any intellectual task that a human being can [2]. Geman et al. (2015) have also suggested using the VQA problem as a Visual Turing Test [2]. Our research work explores the application of VQA in public transport. VQA can provide an interactive and accessible way for individuals with visual impairments or other disabilities to receive information about public transport. By using image recognition and natural language processing, VQA systems can describe surroundings, read signage, and provide real-time updates, making public transportation more accessible. The research work started by capturing images of public transport, mainly DTC buses. It utilizes advanced image captioning models such as BLIP-2 ViT-G FlanT5 XL and robust OCR models like EasyOCR, PaddleOCR, and MMOCR for capturing information and extracting text precisely from images. The primary objective is to create an integrative system that combines the strengths of both image captioning and OCR technologies to enhance the understanding and accessibility of public transport information. | en_US
dc.language.iso | en_US | en_US
dc.publisher | IIIT-Delhi | en_US
dc.subject | Visual Question Answering | en_US
dc.subject | Public Transport | en_US
dc.subject | DTC Buses | en_US
dc.subject | Image Captioning | en_US
dc.subject | OCR models | en_US
dc.subject | BLIP-2 | en_US
dc.subject | Optical Character Recognition | en_US
dc.subject | EasyOCR | en_US
dc.subject | MMOCR | en_US
dc.title | Visual question answering | en_US
dc.type | Other | en_US
Appears in Collections: Year-2023

Files in This Item:
File | Description | Size | Format
BTP_Report_Research__Version_1937_ - Prakrit Garg.pdf (Restricted Access) | | 2.59 MB | Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.