<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
<title>MTech Theses</title>
<link href="http://repository.iiitd.edu.in/xmlui/handle/123456789/1249" rel="alternate"/>
<subtitle/>
<id>http://repository.iiitd.edu.in/xmlui/handle/123456789/1249</id>
<updated>2026-04-10T22:01:08Z</updated>
<dc:date>2026-04-10T22:01:08Z</dc:date>
<entry>
<title>Inferring crucial pathways needed for differentiation of stem cells to required lineage</title>
<link href="http://repository.iiitd.edu.in/xmlui/handle/123456789/1863" rel="alternate"/>
<author>
<name>A, Haseena</name>
</author>
<author>
<name>Kumar, Vibhor (Advisor)</name>
</author>
<id>http://repository.iiitd.edu.in/xmlui/handle/123456789/1863</id>
<updated>2026-04-09T22:26:41Z</updated>
<published>2025-06-18T00:00:00Z</published>
<summary type="text">Inferring crucial pathways needed for differentiation of stem cells to required lineage
A, Haseena; Kumar, Vibhor (Advisor)
Regenerative medicine relies on the precise control of stem cell differentiation. While mesenchymal stem cells (MSCs) and human embryonic stem cells (hESCs) hold great promise, current differentiation methods struggle with efficiency, reproducibility, and a limited understanding of complex regulatory networks. Traditional genetic modification often yields unpredictable outcomes, and wet-lab methods are time- and resource-intensive. This thesis presents a novel computational framework that systematically guides stem cell differentiation towards specific lineages without genetic modification. By integrating single-cell RNA sequencing (scRNA-seq) and RNA velocity, the framework estimates the "poising levels" of MSCs and hESCs by capturing gene expression dynamics. Pathway enrichment scores from UniPath (a normalization-free gene-set enrichment tool) are combined with probabilistic graphical models to identify key signaling pathways influencing lineage decisions. A distinctive feature is the modeling of bifurcations using relative RNA velocities of marker genes, enabling a pathway-centric view that accounts for cell variability. We applied this framework to analyze human gastrulation using public scRNA-seq datasets, mapping developmental trajectories and identifying critical pathways (e.g., Wnt, BMP, TGFβ, FGF, Retinoic Acid) and transcription factors (e.g., ZSCAN10, STAT3, OTX2, SOX5, RUNX2) involved in ectoderm, mesoderm, and endoderm differentiation. The framework also revealed regulatory networks in endoderm-derived liver/pancreas and MSC-derived adipocyte, cartilage, and osteocyte differentiation. Bayesian Network inference and Random Forest analysis uncovered causal links between pathway activities and cell fates. Consistency with established developmental biology supports the validity of our computational predictions. This work offers a scalable and reproducible approach for stem cell engineering, advancing regenerative medicine.
</summary>
<dc:date>2025-06-18T00:00:00Z</dc:date>
</entry>
<entry>
<title>Deciphering putatively transmembrane and secreted human gut microbial prolyl endopeptidases: therapeutic implications for celiac disease</title>
<link href="http://repository.iiitd.edu.in/xmlui/handle/123456789/1851" rel="alternate"/>
<author>
<name>Bai O P, Kavya</name>
</author>
<author>
<name>Ghosh, Tarini Shankar (Advisor)</name>
</author>
<id>http://repository.iiitd.edu.in/xmlui/handle/123456789/1851</id>
<updated>2026-04-07T23:11:43Z</updated>
<published>2025-07-24T00:00:00Z</published>
<summary type="text">Deciphering putatively transmembrane and secreted human gut microbial prolyl endopeptidases: therapeutic implications for celiac disease
Bai O P, Kavya; Ghosh, Tarini Shankar (Advisor)
Celiac disease (CeD) is a chronic immune-mediated enteropathy triggered by the ingestion of gluten, a storage protein complex found in wheat, rye, and barley. Gluten comprises glutenins and gliadins, which are rich in proline and glutamine residues. The high proline content makes these proteins resistant to degradation by human gastrointestinal and brush-border proteases. While a strict gluten-free diet is currently the primary treatment for CeD, maintaining such a diet is often challenging and can significantly impact patients' quality of life. Therefore, alternative or supplementary therapeutic approaches are urgently needed. Certain gut-associated microbes have been reported to produce Prolyl Endopeptidases (PEPs), which can hydrolyze gluten peptides. In this study, we screened 10,903 bacterial PEPs from diverse taxa including Firmicutes, Bacteroidetes, Proteobacteria, Cyanobacteria, Fungi, and Animals. Using HMMER, these sequences were compared against curated reference databases comprising 115,692 transmembrane proteins and 40,344 secreted proteins from 169 gut-associated species. This analysis identified 372 transmembrane and 1,250 secreted PEP homologs. Among them, 31 transmembrane and 36 secreted PEPs from animal-associated species were selected for further analysis due to their closer similarity to human enzymes. The PEP from Sphingomonas capsulata, a component of the oral enzyme therapy latiglutenase, was used as a reference. Through comparative sequence and domain analysis, structural modeling, and molecular docking, ten PEPs (4 transmembrane and 6 secreted) were identified with high structural and functional similarity to this reference. Docking studies showed strong binding affinity to gluten peptides, with scores ranging from −7.2 to −10.2 kcal/mol.
Furthermore, a prevalence analysis using five publicly available gut microbiome datasets comprising 414 samples (CeD and controls) revealed that the significant species harboring these enzymes were notably less abundant in CeD patients. This suggests a potential association between disease progression and the depletion of gluten-degrading microbial species. Together, our findings highlight a set of microbial enzymes with substantial gluten-degrading potential, which are underrepresented in individuals with celiac disease. These enzymes present promising candidates for the development of next-generation enzyme-based therapies for CeD management.
</summary>
<dc:date>2025-07-24T00:00:00Z</dc:date>
</entry>
<entry>
<title>Computational methods for cell-free DNA based diagnostics</title>
<link href="http://repository.iiitd.edu.in/xmlui/handle/123456789/1850" rel="alternate"/>
<author>
<name>Siraj, Mariyam</name>
</author>
<author>
<name>Kumar, Vibhor (Advisor)</name>
</author>
<id>http://repository.iiitd.edu.in/xmlui/handle/123456789/1850</id>
<updated>2026-04-07T22:34:57Z</updated>
<published>2025-07-01T00:00:00Z</published>
<summary type="text">Computational methods for cell-free DNA based diagnostics
Siraj, Mariyam; Kumar, Vibhor (Advisor)
Cell-free DNA (cfDNA) has emerged as a promising biomarker for non-invasive cancer diagnostics, offering a window into tumor-derived genetic and epigenetic information through simple blood sampling. However, the biological signals embedded in cfDNA, such as end motifs and nucleosome positioning, remain underutilized compared to conventional mutation-based assays. This thesis develops and validates computational frameworks that harness fragmentomic and chromatin features of cfDNA to enhance cancer detection, tissue-of-origin classification, and cross-platform model generalizability. Firstly, the study demonstrates that short sequence motifs at cfDNA fragment ends, particularly 6-mer patterns, encode robust cancer-specific signatures. By applying machine learning classifiers to these motifs, the models achieved high accuracy in distinguishing cancer patients from healthy individuals and exhibited strong performance in differentiating among multiple cancer types. Secondly, to address platform variability and limited sample sizes, a model-based transfer learning strategy was implemented. Using decision tree adaptation techniques (SER and STRUT, with class imbalance-aware variants), models trained on data from the Illumina sequencing platform were successfully transferred to data generated by alternative sequencing platforms, such as Nanopore sequencing, improving cross-domain prediction without retraining from scratch. Thirdly, this work investigates nucleosome occupancy patterns around transcription factor binding sites as an informative layer for tissue-of-origin inference. A custom computational pipeline quantified nucleosome positioning and chromatin accessibility signatures, which, when integrated with machine learning and feature reduction, provided accurate tumor-type classification and revealed biologically interpretable chromatin features relevant to cancer progression.
Collectively, this thesis advances the field of cfDNA diagnostics by demonstrating that shallow, cost-effective sequencing combined with robust computational pipelines can deliver high diagnostic accuracy, interpretability, and adaptability across sequencing technologies. The developed approaches lay the groundwork for scalable, minimally invasive multi-cancer early detection and tissue-specific monitoring, supporting future implementation of cfDNA-based liquid biopsies in precision oncology.
</summary>
<dc:date>2025-07-01T00:00:00Z</dc:date>
</entry>
<entry>
<title>Adaptive biomedical knowledge graph querying: orchestrating multi-agent AI for complex data retrieval</title>
<link href="http://repository.iiitd.edu.in/xmlui/handle/123456789/1849" rel="alternate"/>
<author>
<name>Kharkwal, Rahul</name>
</author>
<author>
<name>Sengupta, Debarka (Advisor)</name>
</author>
<id>http://repository.iiitd.edu.in/xmlui/handle/123456789/1849</id>
<updated>2026-04-07T22:34:45Z</updated>
<published>2025-08-20T00:00:00Z</published>
<summary type="text">Adaptive biomedical knowledge graph querying: orchestrating multi-agent AI for complex data retrieval
Kharkwal, Rahul; Sengupta, Debarka (Advisor)
This thesis explores the development and evaluation of a smart, multi-agent system designed to retrieve information efficiently from a diverse biomedical knowledge graph. The graph was carefully built using BioKG data in TSV format and structured in Neo4j, with special attention given to properly representing multi-valued properties as lists. Early attempts to use embedding-based models—such as Nomic Atlas v1, BioBERT, and PubMedBERT—for semantic search presented several obstacles. The main issues stemmed from the highly varied nature of the data, repeated terms that introduced bias, and the models’ limited ability to process structured key-value information effectively. Due to these limitations, a BM25 retriever was initially used for keyword-based node extraction. While it served as a practical starting point, its dependency on exact keyword matches proved restrictive. To address these shortcomings and enhance retrieval accuracy, a layered multi-agent system was built using the LangChain supervisor agent framework, with GPT-4o Mini at its core. This system includes several specialized agents: one for handling query expansion and rewriting (including web search via Tavily), another for initial node retrieval using BM25, and a graph traversal agent that navigates the graph using Cypher queries and generates comprehensive responses. Together, these components form a robust solution for querying complex biomedical datasets. The system not only improves over basic retrieval methods but also illustrates the potential of agent-based architectures in exploring large, heterogeneous knowledge sources.
</summary>
<dc:date>2025-08-20T00:00:00Z</dc:date>
</entry>
</feed>
