Computer Science and Engineering

Computer Science and Engineering CSE http://repository.iiitd.edu.in/xmlui/handle/123456789/1 2026-05-05T12:41:33Z 2026-05-05T12:41:33Z Ship marine strategy database access using natural language: an application of LLM-based text-to-SQL model Ghorai, Arunoday Goyal, Vikram (Advisor) http://repository.iiitd.edu.in/xmlui/handle/123456789/1965 2026-05-04T22:09:42Z 2024-12-01T00:00:00Z

Ship marine strategy database access using natural language: an application of LLM-based text-to-SQL model Ghorai, Arunoday; Goyal, Vikram (Advisor) With the growing reliance on relational databases across industries, the ability to efficiently query and extract information from a structured database has become a crucial skill. However, the complexity of SQL syntax creates a barrier for non-technical users, limiting their ability to interact with databases effectively. Natural Language to SQL (NL-to-SQL) query generation plays a critical role in bridging the gap between non-technical users and relational databases, enabling intuitive data interaction without any need for SQL expertise. This thesis first explores various Text-to-SQL approaches, leveraging both proprietary models like OpenAI's GPT-4 and open-source models like RESDSQL, focusing on their performance across benchmark datasets like Spider, CoSQL and SParC. Additionally, two datasets, MORD and CMEC, are prepared from real-world use cases to highlight unique challenges such as hierarchical data structures, string matching operations, and privacy issues. The MORD dataset was queried using GPT-4 integrated with LangChain to showcase natural language interaction with data and the usability of proprietary models without any tuning to a domain-specific dataset. The CMEC dataset, by contrast, is privately curated and access to it must remain confidential, so we use open-source models like RESDSQL that run on a local server in order to minimize leakage. The dataset is pre-processed into a relational schema, and RESDSQL is fine-tuned on curated NL-to-SQL pairs to improve performance. String matching techniques are applied to prepare better prompts in order to further enhance the results generated by the model.
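The abstract does not say which string matching technique is used, so the following is only a minimal sketch of the general idea, assuming fuzzy matching of question words against stored column values with Python's standard `difflib`. The function names (`match_values`, `build_prompt`), the `vessel` table, its values, and the prompt layout are all hypothetical and are not taken from the thesis or from the CMEC dataset.

```python
import difflib
import re


def match_values(question, column_values, cutoff=0.8):
    """Map each column to the stored values that fuzzily match the question.

    column_values is a hypothetical {column_name: [stored values]} mapping.
    """
    # Candidate phrases: single words and adjacent word pairs from the question.
    words = re.findall(r"[A-Za-z0-9]+", question.lower())
    phrases = words + [" ".join(pair) for pair in zip(words, words[1:])]

    hits = {}
    for column, values in column_values.items():
        # Compare case-insensitively but report the value as stored.
        lowered = {value.lower(): value for value in values}
        for phrase in phrases:
            close = difflib.get_close_matches(
                phrase, list(lowered), n=1, cutoff=cutoff
            )
            for match in close:
                found = hits.setdefault(column, [])
                if lowered[match] not in found:
                    found.append(lowered[match])
    return hits


def build_prompt(question, schema, column_values):
    """Assemble an illustrative prompt: schema, matched-value hints, question."""
    hints = match_values(question, column_values)
    hint_lines = [
        f"-- {column} contains: {', '.join(values)}"
        for column, values in sorted(hints.items())
    ]
    return "\n".join([schema, *hint_lines, f"-- Question: {question}", "SELECT"])


if __name__ == "__main__":
    # Hypothetical schema and values, for illustration only.
    schema = "CREATE TABLE vessel (name TEXT, port TEXT);"
    values = {"vessel.port": ["Mumbai", "Kochi", "Chennai"]}
    print(build_prompt("List vessels docked at Mumbay", schema, values))
```

In this sketch the misspelled "Mumbay" is resolved to the stored value "Mumbai" and surfaced as a hint line, so a model would see the exact literal to use in a `WHERE` clause. The 0.8 cutoff is an arbitrary assumption and would need tuning on real data.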

2024-12-01T00:00:00Z Knowledge graph distillation Narotam, N Kaif, Mohammad Akhter, Md. Shad (Advisor) Mutharaju, Raghava (Advisor) http://repository.iiitd.edu.in/xmlui/handle/123456789/1963 2026-04-23T22:00:08Z 2024-11-27T00:00:00Z

Knowledge graph distillation Narotam, N; Kaif, Mohammad; Akhter, Md. Shad (Advisor); Mutharaju, Raghava (Advisor) We combine domain-specific knowledge graphs with general knowledge graphs to enrich a language model. The objective of this semester was to implement baselines, test approaches from the surveyed literature, and formulate what works. We develop a test hypothesis and plan to evaluate our proposed solutions. We investigate various tasks across domains to test our solution and its generalizability. Keywords: Knowledge

2024-11-27T00:00:00Z Applications of NLP in recipe texts Neelu Vaikundam, Gurupriya Upadhyay, Rituj Bagler, Ganesh (Advisor) http://repository.iiitd.edu.in/xmlui/handle/123456789/1962 2026-04-23T22:00:23Z 2025-07-27T00:00:00Z

Applications of NLP in recipe texts Neelu; Vaikundam, Gurupriya; Upadhyay, Rituj; Bagler, Ganesh (Advisor) This study addresses the challenge of large-scale, multi-label recipe classification using a real-world dataset of over 600,000 recipes collected from heterogeneous sources. The raw data exhibited significant noise, duplication, and label imbalance, motivating a comprehensive, multi-stage cleaning and preprocessing framework. Key steps included ingredient normalization, instruction standardization, multi-label parsing, deduplication, and semantic category mapping into hierarchical supercategories. For modeling, we implemented a modular pipeline combining TF-IDF feature extraction, classical classifiers, XGBoost, and fine-tuned BERT models to capture both statistical and contextual signals. By adopting a per-supercategory strategy, we minimized cross-domain interference and achieved strong performance, with the fine-tuned BERT classifier attaining a weighted F1-score of 0.7996 and high accuracy on dominant labels. This work demonstrates how rigorous data preparation and modular modeling can enable fine-grained, interpretable recipe classification at scale, providing a robust foundation for downstream culinary applications such as personalized meal planning and intelligent search.
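For readers unfamiliar with the metric, the sketch below shows one common way a support-weighted F1-score is computed for multi-label predictions: a per-label F1 is averaged with weights proportional to how often each label truly occurs. It assumes label sets per recipe and uses made-up labels and predictions; it is not the authors' evaluation code and does not reproduce the 0.7996 figure.

```python
def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-label F1 for multi-label data.

    y_true and y_pred are equal-length lists of label sets, one per sample.
    """
    # Labels that never occur in y_true have zero support and zero weight,
    # so iterating over the true labels is sufficient.
    labels = sorted(set().union(*y_true))
    weighted_sum = 0.0
    total_support = 0
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(label in t and label in p for t, p in pairs)
        fp = sum(label not in t and label in p for t, p in pairs)
        fn = sum(label in t and label not in p for t, p in pairs)
        support = tp + fn  # number of samples that truly carry this label
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        weighted_sum += support * f1
        total_support += support
    return weighted_sum / total_support if total_support else 0.0


if __name__ == "__main__":
    # Hypothetical labels for four recipes, for illustration only.
    truth = [{"dessert", "vegan"}, {"dessert"}, {"soup"}, {"dessert"}]
    preds = [{"dessert"}, {"dessert"}, {"soup", "vegan"}, {"soup"}]
    print(round(weighted_f1(truth, preds), 4))
```

Here "dessert" has F1 0.8 with support 3, "soup" has F1 2/3 with support 1, and "vegan" has F1 0 with support 1, giving (3 × 0.8 + 2/3 + 0) / 5 ≈ 0.6133. Because frequent labels dominate the weights, a high weighted F1 is compatible with weak performance on rare labels, which is consistent with the abstract reporting accuracy on dominant labels separately.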

2025-07-27T00:00:00Z Consistent vision: exploring multi-domain applications of consistency models Bhagat, Amil Jain, Milind Subramanyam, A V (Advisor) http://repository.iiitd.edu.in/xmlui/handle/123456789/1961 2026-04-23T22:00:23Z 2024-05-01T00:00:00Z

Consistent vision: exploring multi-domain applications of consistency models Bhagat, Amil; Jain, Milind; Subramanyam, A V (Advisor) This project focuses on leveraging consistency models for downstream tasks involving the translation and mapping between different modalities, such as converting visible images to their corresponding infrared representations. By utilizing paired data for training, the model learns a robust mapping that preserves essential features across modalities. The ultimate goal is to build a model capable of generating accurate outputs in the target domain (e.g., infrared) from inputs in the source domain (e.g., visible), enabling practical applications in domains like imaging, vision enhancement, and modality transformation while showcasing the potential of consistency models for cross-domain learning tasks.

2024-05-01T00:00:00Z