Please use this identifier to cite or link to this item:
http://repository.iiitd.edu.in/xmlui/handle/123456789/1464

| Title: | Visual question answering |
| Authors: | Garg, Prakrit; Shah, Rajiv Ratn (Advisor) |
| Keywords: | Visual Question Answering; Public Transport; DTC Buses; Image Captioning; OCR models; BLIP-2; Optical Character Recognition; EasyOCR; MMOCR |
| Issue Date: | 29-Nov-2023 |
| Publisher: | IIIT-Delhi |
| Abstract: | Visual Question Answering (VQA) has applications in various emerging areas, ranging from medical imagery to video surveillance and assistance [1]. The VQA problem is a central component of the Artificial General Intelligence problem, i.e., creating a machine that can understand or learn any intellectual task that a human being can [2]. Geman et al. (2015) have also suggested using the VQA problem as a Visual Turing Test [2]. Our research explores the application of VQA in public transport. VQA can provide an interactive and accessible way for individuals with visual impairments or other disabilities to receive information about public transport. By using image recognition and natural language processing, VQA systems can describe surroundings, read signage, and provide real-time updates, making public transportation more accessible. The work began by capturing images of public transport, mainly DTC buses. It uses advanced image captioning models such as BLIP-2 ViT-G FlanT5 XL and robust OCR models such as EasyOCR, PaddleOCR, and MMOCR to capture information and extract text precisely from images. The primary objective is to create an integrative system that combines the strengths of both image captioning and OCR technologies to enhance the understanding and accessibility of public transport information. |
| URI: | http://repository.iiitd.edu.in/xmlui/handle/123456789/1464 |
| Appears in Collections: | Year-2023 |
Files in This Item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| BTP_Report_Research__Version_1937_ - Prakrit Garg.pdf (Restricted Access) | | 2.59 MB | Adobe PDF | View/Open, Request a copy |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.