<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
<channel>
<title>Year-2024</title>
<link>http://repository.iiitd.edu.in/xmlui/handle/123456789/1382</link>
<description/>
<pubDate>Fri, 10 Apr 2026 22:47:54 GMT</pubDate>
<dc:date>2026-04-10T22:47:54Z</dc:date>
<item>
<title>Multimodal systems for scientific and educational applications</title>
<link>http://repository.iiitd.edu.in/xmlui/handle/123456789/1770</link>
<description>Multimodal systems for scientific and educational applications
Anand, Avinash; Shah, Rajiv Ratn (Advisor)
Large Language Models (LLMs) have transformative capabilities but limited application in specialized educational and research contexts that require multimodal reasoning, context-aware processing, and domain-specific understanding. Education and research need tools that handle the nuanced interplay of text and visuals as well as context-rich language. This thesis advances LLMs in high school physics reasoning, multimodal problem solving, mathematical reasoning with bilingual understanding, student engagement analysis, grammar correction, and citation generation. The first contribution enhances multimodal reasoning in physics education, where problems combine text and diagrams. Introducing the MM-PhyQA dataset and using retrieval-augmented methods with Multi-Image Chain-of-Thought (MI-CoT), the study achieves 71.60% accuracy on complex physics tasks, improving LLM support for physics education. Next, mathematical problem solving, especially geometry, is addressed. The GeoVQA and GPSM4K datasets enable training of LLaVA-v1.5 and G-LLaVA models, which outperform larger LLMs on geometric reasoning benchmarks, showing the benefit of tailored LLMs for visually and linguistically challenging math tasks. The thesis also tackles student engagement prediction in online learning, where in-person cues are absent. Using the ECLIPSE dataset to capture virtual attention dynamics, fine-tuning CG-ViT and NeuralGaze models yields a 21.45% improvement in engagement accuracy, supporting adaptive, personalized remote education. For grammatical error correction (GEC), traditional neural machine translation methods struggle with long context. The Dynamic Context Learner (DCL) enables LLMs to integrate relevant context dynamically, improving accuracy on the CoNLL-2014 and BEA-Dev datasets with F1-score gains, enhancing grammar correction for academic writing. In academic writing, accurate citation generation is vital, yet existing models lack the depth to capture complex citation relationships. 
The multi-source citation text generation (M-CTG) framework combines knowledge graphs and keyphrase embeddings with fine-tuned Vicuna and Alpaca models, achieving a 36.98% ROUGE-1 improvement, facilitating better citation and source attribution. Collectively, this thesis demonstrates the potential of multimodal LLMs fine-tuned for domain-specific educational and scientific tasks. By introducing new datasets, refining architectures, and applying innovative methods, it bridges AI application gaps across fields. In physics education, bilingual mathematical reasoning, and engagement analysis, tailored multimodal LLMs enhance reasoning and context processing. These advances show how domain-specific multimodal AI tools benefit both education and science, paving the way for precise, context-aware, impactful LLM applications across complex, cross-domain challenges.
</description>
<pubDate>Fri, 01 Nov 2024 00:00:00 GMT</pubDate>
<guid isPermaLink="false">http://repository.iiitd.edu.in/xmlui/handle/123456789/1770</guid>
<dc:date>2024-11-01T00:00:00Z</dc:date>
</item>
<item>
<title>Improving content quality for online professional activities using domain specific learning and knowledge</title>
<link>http://repository.iiitd.edu.in/xmlui/handle/123456789/1701</link>
<description>Improving content quality for online professional activities using domain specific learning and knowledge
Goyal, Nidhi; Kumaraguru, Ponnurangam (Advisor); Mutharaju, V. Raghava (Advisor); Sachdeva, Niharika (Advisor)
Online professional platforms such as Indeed, LinkedIn, Naukri, Stack Overflow, and Blind serve as digital ecosystems that connect professionals, employers, and job seekers. These platforms witness online user activities, including finding jobs, matching candidates, and posting jobs, which result in a voluminous amount of User Generated Content (UGC). UGC primarily includes rich information about job postings, CVs, candidate posts, recruiter profiles, etc. The quality of this content varies from meaningful information to misleading content, depending on the expertise, reliability, and intention of the users. Even though this information helps job seekers find the right jobs, the unmonitored nature of the content (including ambiguous, redundant, missing, off-topic, scam-related, misleading, or irrelevant information) makes it difficult to assess content quality, thereby affecting the platform’s trustworthiness, reducing the value to its customers, and, in turn, hampering the user experience. For instance, the content comes with multiple variations of each entity name (e.g., ‘economictimes.com’; ‘eco. times’; ‘the economic times’; ‘economic times’; ‘ET’). These multiple non-standardized variations (noisy, redundant, and ambiguous), when directly incorporated into downstream applications such as semantic search, question answering, and recommender systems, result in poor system performance. Similarly, statistics from a Singaporean recruitment platform show that 65% of job descriptions (JDs) do not include relevant and popular skills, while 40% of JDs omit 20% or more of the skills explicitly stated in the prose description. This reduces the number of relevant applications for a job posting and affects the performance of significant recruitment tasks such as job-to-resume matching. 
With millions of job seekers per month, candidates often come across dishonest, money-seeking job postings containing intentionally and verifiably false information, such as inflated wages, flexible working hours, and appealing career-growth opportunities. The proliferation of such jobs not only hampers candidates’ experience but also damages an enterprise’s reputation. Given that these platforms are open and anyone, from novices to experts, can upload content to them, low-quality questions (lacking clarity, off-topic, primarily opinion-based, or too broad) also appear. Therefore, it is crucial to maintain the quality of content posted on these platforms. The thesis adopts a four-fold approach to address content-quality issues on online professional platforms. The first phase of this work centres around normalizing content on online professional platforms. The second phase aims to predict missing skills to enhance job quality on these platforms. The third phase involves modeling a framework to detect misleading content on recruitment platforms. This requires mining unstructured recruitment data from various sources to obtain structured information and creating domain-specific knowledge graphs. We also delve into understanding employment scam complaints to help platforms continuously refine their advisories based on user complaints and feedback, ensuring they stay updated with the dynamically evolving tactics used by scammers. The fourth phase focuses on identifying low-quality information for question-answering services. In conclusion, we contribute by building automated solutions to improve content quality for online professional activities using domain-specific learning and knowledge.
</description>
<pubDate>Sun, 01 Sep 2024 00:00:00 GMT</pubDate>
<guid isPermaLink="false">http://repository.iiitd.edu.in/xmlui/handle/123456789/1701</guid>
<dc:date>2024-09-01T00:00:00Z</dc:date>
</item>
<item>
<title>Modeling online user interactions and their offline effects on socio-technical platforms</title>
<link>http://repository.iiitd.edu.in/xmlui/handle/123456789/1651</link>
<description>Modeling online user interactions and their offline effects on socio-technical platforms
Hitkul; Shah, Rajiv Ratn (Advisor); Kumaraguru, Ponnurangam (Advisor)
Do online interactions trigger reactions back in the offline world? How can these reactions be detected and quantified? Specifically, what insights can be extracted for users, platform owners, and policymakers to minimize the potential harm of such reactions? Society functions based on the complex interactions between individuals, communities, and organizations. We communicate with each other to build family, friendship, and romantic relationships; to seek or provide advice and education; to execute trade and commerce. People unite to form organizations that drive economic activity, govern states, and provide social benefits. The advent of the Internet has enabled these interactions to move online. A website or an application that facilitates the digitization of social interactions is called a socio-technical platform. For instance, individuals converse with each other via direct messaging applications (e.g., WhatsApp, Telegram) and share thoughts and gather feedback from communities (e.g., Reddit, Twitter, YouTube). Trade of goods occurs via e-commerce (e.g., Flipkart, Amazon) and online marketplaces (e.g., Google Play Store). At times, interactions happening in the online world trigger reactions in the offline world, which we call overflow. Such overflows can have either a positive or negative impact. Socio-technical platforms save every interaction and associated metadata, providing a unique opportunity to analyze rich data at scale: to discover interaction patterns, detect and quantify the overflow of interactions, and extract insights for users and policymakers. This thesis aims to study these interactions with the individual as the focal point. 
We focus on three broad forms of interactions: i) the effect online community feedback can have on an individual’s offline actions, ii) organizations leveraging individual customers’ online presence to optimize business processes, and iii) how data from tracking platforms can be used to uncover the strategies behind successful users. In the first part, we work on three scenarios: (a) How does community feedback affect an individual’s future drug-consumption frequency in a drug community forum? (b) What changes does an individual undergo immediately after gaining sudden popularity on online social media, and what actions help maintain popularity for longer? (c) What are the dynamics of interactions in an online COVID-19 support group, and what affects a user’s longevity in the community? In the second part, we leverage online information about a user to improve the prediction of Return-to-Origin orders on an e-commerce platform. Finally, in the third part, we leverage data from a habit-tracking platform to unveil which user actions lead to success in habit-building pursuits.
</description>
<pubDate>Fri, 01 Mar 2024 00:00:00 GMT</pubDate>
<guid isPermaLink="false">http://repository.iiitd.edu.in/xmlui/handle/123456789/1651</guid>
<dc:date>2024-03-01T00:00:00Z</dc:date>
</item>
<item>
<title>Overcoming biases and injuries : tackling challenges in facial analytic systems</title>
<link>http://repository.iiitd.edu.in/xmlui/handle/123456789/1650</link>
<description>Overcoming biases and injuries : tackling challenges in facial analytic systems
Majumdar, Puspita; Singh, Richa (Advisor); Vatsa, Mayank (Advisor)
The remarkable achievements and the robust performance of deep models have been instrumental in the evolution of facial analysis systems. Employed across a wide array of applications, these systems assist in making crucial decisions and predicting important results. However, the reliability of the outcomes produced by these systems continues to be a subject of uncertainty and debate. For instance, there have been numerous instances where deep models have demonstrated varying performance levels across different demographic groups. Models that exhibit high accuracy with individuals with lighter skin tones have shown diminished performance on those with darker skin tones. In addition to this, deep models have also displayed inconsistency across various settings. Facial recognition models that perform exceptionally well on unaltered faces tend to falter when dealing with manipulated faces. We observe that the anomalous and inconsistent performance of facial analytic systems can be attributed to two major problems: (i) bias and (ii) robustness. Bias in predictions emerges from issues like domain shift and imbalanced data distribution, while challenges in robustness stem from both unintentional and intentional variations. This dissertation takes on the challenge of addressing bias resulting from skewed data distribution and offers solutions to enhance the robustness of deep models against unintentional variations due to injuries. We begin our research with an examination of the bias resulting from traditional methods of model training, specifically when training is conducted on datasets characterized by imbalanced distributions. Typically, the conventional training methods prioritize optimizing the model to attain elevated levels of classification accuracy, yet they tend to neglect the performance variances across less represented demographic subcategories (such as male and female under the broader gender category). This oversight can culminate in biased outcomes. 
To address this, we introduce a novel loss function, termed the Uniform Misclassification Loss (UML), aiming for equitable results when training deep models. The UML function directs the model’s focus towards the subgroup that is performing the worst during training, striving to minimize and balance the misclassification rate across all subgroups. This approach not only mitigates bias but also enhances the overall performance of the model. The UML function relies on prior knowledge of demographic subgroups (referred to as protected attributes). However, there are instances where information on protected attributes is unavailable due to privacy or legal constraints, making bias mitigation challenging. To address this issue, we propose a novel algorithm, Non-Protected Attribute-based Debiasing (NPAD). This algorithm leverages auxiliary information from non-protected attributes to counteract bias, intelligently selecting non-protected attributes to align the model with fairness objectives. To optimize the model, we introduce the Debiasing via Attribute Cluster Loss (DACL) and Filter Redundancy Loss (FRL) functions. DACL guides the model to assimilate class-specific information for reducing bias, while FRL enhances model performance by encouraging the learning of non-redundant features, resulting in unbiased predictions. Next, we shift our focus to mitigating bias in the predictions made by pre-trained models. A variety of highly efficient pre-trained models, widely applied in numerous tasks, have exhibited biased tendencies towards specific groups. In order to maintain the utility of these models while ensuring equitable results, it is crucial to neutralize the bias in their predictions. To address this challenge, we present a novel algorithm aimed at learning a consistent perturbation, referred to as a Subgroup Invariant Perturbation (SIP), tailored for a particular dataset. 
The addition of the learned SIP to the input dataset results in a transformed dataset, which, when fed into a pre-trained model, yields unbiased results. This algorithm is grounded in adversarial perturbations and negates the need for updating model parameters, rendering it computationally efficient. Beyond bias mitigation, we also address the robustness challenges faced by face recognition models due to unintentional variations. Despite significant strides in face recognition technology, challenges persist, particularly when dealing with input images that include facial injuries. Face recognition models, typically trained on images of uninjured faces, exhibit a marked decrease in performance when applied to images of injured faces. Injuries alter facial features and overall appearance, complicating recognition by automated systems. To address this issue, we first compiled an Injured Face (IF) dataset, consisting of 150 subjects, each represented by images in both injured and non-injured states. Following this, we introduced a novel loss function, the Subclass Injured Face Identification (SCIFI) loss, specifically designed for recognizing injured faces. This loss function categorizes injured and non-injured images into two separate subclasses, operating in a 2-dimensional score space derived from both injured and non-injured images. The goal is to optimize this subclass space to maximize inter-class separation while maintaining a uniform distance between the feature representations of samples from different subjects, as well as a consistent distance between samples from the same subject. Extensive evaluations across multiple scenarios verify the efficacy of the SCIFI loss, which consistently outperforms existing algorithms and showcases enhanced performance.
</description>
<pubDate>Sat, 01 Jun 2024 00:00:00 GMT</pubDate>
<guid isPermaLink="false">http://repository.iiitd.edu.in/xmlui/handle/123456789/1650</guid>
<dc:date>2024-06-01T00:00:00Z</dc:date>
</item>
</channel>
</rss>
