BTech Projects

http://repository.iiitd.edu.in/xmlui/handle/123456789/45
2026-07-14T19:34:11Z

Visual voice activity detection using multimodal foundation models
http://repository.iiitd.edu.in/xmlui/handle/123456789/1990
Shubham; Buduru, Arun Balaji (Advisor)
This project explores Visual Voice Activity Detection (VVAD) using only facial video data, without access to audio. We evaluate pretrained models, including VideoMAE, ViViT, TimeSformer, and ResNet50, as well as multimodal models such as ImageBind, LanguageBind, and Video-LLaVA. The goal is to classify whether a person is speaking in a given video segment using only visual cues. The models are tested on the VVAD-LRS3 dataset, and the results show strong promise for multimodal models even in vision-only setups. We hypothesize that large vision-language models can be adapted for explainable VVAD using prompt-based querying.
2025-07-01T00:00:00Z

AI/ML in healthcare: leveraging embeddings for patient diagnosis and treatment optimization
http://repository.iiitd.edu.in/xmlui/handle/123456789/1989
Malhotra, Chehak; Gopal, Mehak; Sethi, Tavpritesh (Advisor)
This study summarizes our progress in integrating advanced AI models into healthcare contexts. Applying state-of-the-art models to new tasks, we examine their efficacy in tasks such as cancer classification and shock prediction, using data from clinical notes and prescriptions. Our study underscores the potential of AI to revolutionize healthcare practices and improve patient outcomes.
2024-01-01T00:00:00Z

Bridging intermittent learning in rural education
http://repository.iiitd.edu.in/xmlui/handle/123456789/1988
Kumar, Rahul Ajith; Patel, Dhyan Vimalkumar; Singh, Pushpendra (Advisor)
This BTech project initially set out to explore the development of an online platform aimed at empowering Indian homemakers through digital entrepreneurship. Recognizing the socio-economic barriers (including limited digital literacy, cultural norms, and educational gaps) that hinder women’s participation in the digital economy, the project sought to provide tools and training to facilitate their engagement. However, through extensive research and collaboration with the NGO Sampooran Saksharta, the project’s focus pivoted towards addressing the educational needs of rural youth, particularly the challenges posed by intermittent learning. Intermittent learning, characterized by discontinuous educational experiences due to factors like inconsistent instructor availability and socio-economic constraints, significantly hampers the academic progress of rural children. The revised project centers on creating a comprehensive educational platform designed to support instructors and NGOs in managing and enhancing the learning experiences of these children. The platform aims to overcome challenges such as diverse learning levels among students and the irregularity of educational resources by developing tools for progress tracking, attendance management, and lesson planning tailored to varying student needs. By leveraging data-driven insights and user-friendly interfaces, the platform will enable instructors and NGOs to deliver more targeted and effective interventions for rural youth facing intermittent learning. This report details the transition from the initial research to the current development phase, highlighting the identification of a critical gap in rural education support systems.
It also outlines the technological approach and the collaborative efforts with Sampooran Saksharta to empower rural youth through education.
2024-11-27T00:00:00Z

Aligning large language models (LLMs) using curriculum learning in multilingual settings in education domain
http://repository.iiitd.edu.in/xmlui/handle/123456789/1987
Dulloo, Sushane; Shah, Rajiv Ratn (Advisor)
Large Language Models (LLMs) have revolutionized natural language processing (NLP) with exceptional capabilities in reasoning and computational tasks, enabled by extensive pretraining on large datasets dominated by high-resource languages such as English and French. However, this language-specific bias significantly limits their generalizability to low-resource languages like Hindi and Bengali, which lack sufficient digital corpora and contextual representation. Consequently, these models struggle with scientific reasoning tasks in low-resource languages. Despite advancements in multilingual models like mBERT and XLM-R, their performance in reasoning-intensive tasks remains inadequate for these underserved languages. Addressing this disparity necessitates effective cross-lingual transfer of reasoning capabilities, augmented by data enhancement techniques to simulate reasoning tasks in low-resource linguistic contexts. This research aims to evaluate the reasoning performance of LLMs in low-resource language settings such as Hindi and Bengali, develop adaptive transfer strategies, and construct LLM agent frameworks with open- and closed-source LLMs to better understand reasoning steps and iteratively refine them for improved accuracy.
2024-11-27T00:00:00Z