Visual question answering

Garg, Prakrit; Shah, Rajiv Ratn (Advisor)

Please use this identifier to cite or link to this item: http://repository.iiitd.edu.in/xmlui/handle/123456789/1464

Full metadata record

DC Field	Value	Language
dc.contributor.author	Garg, Prakrit	-
dc.contributor.author	Shah, Rajiv Ratn (Advisor)	-
dc.date.accessioned	2024-05-15T11:00:57Z	-
dc.date.available	2024-05-15T11:00:57Z	-
dc.date.issued	2023-11-29	-
dc.identifier.uri	http://repository.iiitd.edu.in/xmlui/handle/123456789/1464	-
dc.description.abstract	Visual Question Answering (VQA) has applications in various emerging areas, ranging from medical imagery to video surveillance and assistance [1]. The VQA problem is a central component of the Artificial General Intelligence problem, i.e., creating a machine that can understand or learn any intellectual task that a human being can [2]. Geman et al. (2015) have also suggested using the VQA problem as a Visual Turing Test [2]. Our research work explores the application of VQA in public transport. VQA can provide an interactive and accessible way for individuals with visual impairments or other disabilities to receive information about public transport. By using image recognition and natural language processing, VQA systems can describe surroundings, read signage, and provide real-time updates, making public transportation more accessible. The research work started by capturing images of public transport, mainly DTC buses. It utilizes advanced image captioning models such as BLIP-2 ViT-G FlanT5 XL and robust OCR models like EasyOCR, PaddleOCR, and MMOCR for capturing information and extracting text precisely from images. The primary objective is to create an integrative system that combines the strengths of both image captioning and OCR technologies to enhance the understanding and accessibility of public transport information.	en_US
dc.language.iso	en_US	en_US
dc.publisher	IIIT-Delhi	en_US
dc.subject	Visual Question Answering	en_US
dc.subject	Public Transport	en_US
dc.subject	DTC Buses	en_US
dc.subject	Image Captioning	en_US
dc.subject	OCR models	en_US
dc.subject	BLIP-2	en_US
dc.subject	Optical Character Recognition	en_US
dc.subject	EasyOCR	en_US
dc.subject	MMOCR	en_US
dc.title	Visual question answering	en_US
dc.type	Other	en_US
Appears in Collections:	Year-2023

Files in This Item:

File	Description	Size	Format
BTP_Report_Research__Version_1937_ - Prakrit Garg.pdf Restricted Access		2.59 MB	Adobe PDF
