DSpace Collection: Year-2025

http://repository.iiitd.edu.in/xmlui/handle/123456789/1714
2026-07-24T07:29:30Z

Title: In silico approaches for biomolecule-driven disease diagnosis and therapy
http://repository.iiitd.edu.in/xmlui/handle/123456789/1834
Authors: Tomer, Ritu; Raghava, Gajendra Pal Singh (Advisor)
Abstract: Developments in computational biology have facilitated the systematic study of biomolecules, including peptides, proteins, or nucleic acids. Such advancements have opened opportunities for disease-oriented research and therapeutic discovery. The thesis addresses two broad areas in this field, disease diagnosis and biomolecule-based therapeutics, with a particular focus on integrating curated data resources and machine-learning-based predictive systems. The first part aims to enhance molecular knowledge and diagnostic studies of mucormycosis, a serious and rapidly spreading fungal infection also referred to as “Black Fungus”. Here, we have developed a web repository titled “MucormyDB”. The repository is a compilation of genomic, proteomic, virulence, and therapeutic information. It enables researchers to easily perform comparative analyses and supports the identification of possible genetic biomarkers. With these capabilities, MucormyDB is a valuable resource for studying the molecular basis of mucormycosis. The second part of the thesis provides computational models for predicting peptide-based therapeutics. These include IL4pred2, a machine-learning model that predicts peptides that can induce interleukin-4, and AntiCP4, which predicts anticancer peptides with enhanced predictive capability. In addition to therapeutic prediction, the thesis also considers the safety of peptide candidates. To address this aspect, we have designed RAIpred to detect peptides that could trigger rheumatoid arthritis.
We have also developed CDpred to predict peptides related to celiac disease. These tools provide a valuable adjunct for immunological risk determination. They help researchers select safer peptide candidates in the early phase of drug discovery. Together, the resources developed in this thesis present high-quality molecular data, powerful predictive algorithms, and easily accessible computing platforms that can be used to study mucormycosis and, complementarily, to improve the systematic evaluation of therapeutic peptides and their potential immunological consequences.
2025-10-01T00:00:00Z

Title: Development of in silico tools for advancing non-invasive diagnostics and therapeutics
http://repository.iiitd.edu.in/xmlui/handle/123456789/1782
Authors: Akanksha; Raghava, Gajendra Pal Singh (Advisor)
Abstract: Non-invasive diagnostics and therapeutics have the potential to revolutionize clinical practices by providing patient-friendly, accessible, and pain-free alternatives to traditional invasive methods. Among various biofluids, saliva stands out due to its easy collection and abundance of biomarkers that are reflective of systemic health. To accelerate the research on saliva-based non-invasive diagnostics and therapeutics, we built SalivaDB, a comprehensive database compiling 15,821 biomarker entries from various sources such as research articles and databases. It contains information on about 201 diseases for biomarker categories such as proteins, metabolites, microbes, miRNAs, and genes. SalivaDB is a user-friendly web-based platform that helps researchers by reducing the effort and time required for the discovery and validation of clinically relevant salivary biomarkers. One significant pathway through which these biomarkers enter saliva is via exosomes, which are secreted by nearly all cell types.
Exosomes carry biomolecules such as RNAs, proteins, and lipids, making them highly promising for early disease detection and targeted therapeutic delivery. To understand the role of exosomal biomarkers in diagnostics and therapeutics, we developed computational tools to predict major exosomal molecules. We first developed ExoProPred, a tool for predicting exosomal proteins. We initially applied a similarity-search-based method using BLAST, which was ineffective due to low sequence similarity among exosomal proteins. Subsequently, we applied a motif-based approach, which revealed recurrent motifs that were unique to exosomal proteins. Although this method showed high accuracy, it had limited coverage. We then employed machine learning models using compositional and evolutionary features, achieving an AUROC of 0.73. To further enhance prediction performance, we developed a hybrid approach that integrated machine learning (ML) with motif analysis, resulting in a significantly improved AUROC of 0.85. After predicting exosomal proteins, we focused on miRNAs, another of the most commonly found biomarkers in exosomes. We developed EmiRPred, a computational tool to predict exosomal miRNAs. We built AI-based models (ML, DL, LLM) using a number of features, including composition features, binary features, structural features, and structural images. We integrated these AI-based models with alignment-based approaches (motif-search and similarity-search) to effectively predict exosomal miRNAs. This integrated ensemble model achieved an AUC of 0.73 on an independent validation set. ExoProPred and EmiRPred are accessible as web servers, downloadable standalone software, and Python packages, supporting broad usability within the research community. In addition to predicting exosome-associated biomolecules, we also aimed to predict the molecules that are highly expressed in exosomes.
It is observed that expression levels of miRNA vary significantly among different subcellular locations and also change notably in the presence of disease. The identification of abundant miRNAs in exosomes is crucial for exploring their physiological roles and potential implications in disease diagnostics and therapeutics. Therefore, to provide deeper insights into baseline miRNA profiles in exosomes, we developed AdmirePred, a prediction tool to identify highly abundant miRNAs within blood exosomes. In this study, we used alignment-based methods such as motif-search and similarity-search, and alignment-free methods such as machine learning algorithms. We combined both approaches to develop a hybrid method that achieved an AUC of 0.854 on an independent validation set. AdmirePred is available as standalone software, a web server, and a Python package. To demonstrate the practical utility of extracellular saliva-based biomarkers, we investigated extracellular RNA biomarkers in saliva for diagnosing Gastric Cancer (GC). Extracellular RNA (exRNA) originates from cells and is released into body fluids through active secretion via extracellular vesicles (exosomes) or passive release during cell death. We identified different sets of biomarkers, including primary and secondary biomarkers, in this study. The best-performing set of features was an eight-gene biomarker panel with a high validation AUC of 0.905 and an MCC of 0.770. The biomarker panel identified in our study performed better than biomarkers previously discussed in the literature. These findings underscore saliva’s significant potential as a reliable and efficient diagnostic fluid for early and accurate GC detection. Overall, we present a wide-ranging collection of curated resources, computational tools, and methods that collectively aim to advance non-invasive diagnostics and therapeutics.
These tools and findings are designed to support the transition from traditional invasive procedures to more accessible, efficient, and patient-friendly diagnostic approaches.
2025-09-01T00:00:00Z

Title: Integrative computational frameworks for GPCR biology: from receptor logic to functional modulation
http://repository.iiitd.edu.in/xmlui/handle/123456789/1779
Authors: Mohanty, Sanjay Kumar; Ahuja, Gaurav (Advisor)
Abstract: G-protein-coupled receptors (GPCRs) are vital pharmaceutical targets, with more than one-third of FDA-approved drugs influencing their function. While central to cellular signaling and drug development, GPCR research is hindered by various challenges. This thesis introduces several innovative tools and algorithms designed to deepen our understanding of GPCR biology. First, we present Reverse Cell Tracking (RCT), a novel computational framework that leverages RNA velocity embeddings to trace gene expression trajectories during cellular differentiation. Applying RCT to investigate odorant receptor (OR) gene expression during neuronal development, we uncovered insights into OR gene choice mechanisms. ORs, a subset of GPCRs traditionally associated with smell, are also expressed in non-olfactory tissues, including cancers, implicating them in processes such as migration, proliferation, and immune modulation. Their expression follows a unique "one neuron, one receptor" rule, driven by mutual exclusivity and monoallelic expression. However, recent single-cell studies have revealed co-expression of multiple ORs in immature neurons, suggesting alternative models such as winner-takes-all or stochastic selection.
RCT analysis revealed a bias toward the most highly expressed OR during differentiation, offering potential breakthroughs in understanding OR expression patterns. This could open up new avenues for diagnostics and therapeutic targeting outside the nose, especially in diseases like cancer, where altered GPCR signaling plays a critical role. Second, we present Machine-OlF-Action (MOA), a user-friendly, open-source computational framework designed to support GPCR researchers with minimal programming experience. As GPCR signaling gains prominence, there is a growing demand for accessible tools to efficiently explore and model GPCR-ligand interactions. Machine learning-based techniques are emerging as state-of-the-art approaches in chemoinformatics, enabling selective, effective, and rapid identification of biologically relevant molecules from vast chemical databases. However, their broader adoption in GPCR research has been limited by their reliance on advanced computational skills and by the technical complexity of existing tools. MOA bridges this gap by allowing users to input SMILES strings and known activation statuses of compounds to build reliable classification models. By simplifying complex machine learning workflows into an accessible platform, MOA enables even researchers without a deep computational background to uncover meaningful GPCR-ligand relationships and advance the field of chemosensory biology. Third, we present Gcoupler, an AI-driven computational toolkit that combines de novo ligand design, advanced statistical approaches, Graph Neural Networks, and bioactivity-based prioritization to facilitate the unbiased identification of druggable surface cavities and the rational prediction of high-affinity ligands. While conventional GPCR-targeted therapies predominantly focus on orthosteric sites, emerging research highlights the therapeutic potential of allosteric sites.
Despite the development of synthetic allosteric modulators, endogenous intracellular modulators remain largely unexplored due to a lack of comprehensive binding and phenotypic data. This data scarcity limits the applicability of traditional machine learning approaches. Gcoupler addresses this challenge by enabling cavity-specific predictions and ligand identification even in data-scarce GPCR contexts, paving the way for more targeted and effective drug discovery. This research introduces a suite of computational frameworks tailored to advance GPCR-targeted drug discovery by addressing key bottlenecks in modeling, data scarcity, and accessibility. These findings challenge the conventional view of OR expression and provide fresh insights into their functional roles beyond the olfactory system. By simplifying complex workflows and integrating AI-driven methods, these tools democratize computational biology for researchers with limited coding expertise. Collectively, they enhance the understanding of chemosensory GPCRs, enable unbiased ligand prioritization, and offer new strategies to tackle data-scarce targets, ultimately accelerating the development of selective and effective therapeutics.
2025-05-01T00:00:00Z

Title: Unveiling the structural determinants for enhanced specificity in CRISPR/Cas9 for genome editing
http://repository.iiitd.edu.in/xmlui/handle/123456789/1777
Authors: Panda, Gayatri; Ray, Arjun (Advisor)
Abstract: CRISPR/Cas, a recently discovered genome-editing method, depends on a single protein (Cas9) and a non-coding RNA for gene editing, which makes it simple, rapid, versatile, efficient, and easy to manipulate.
Despite the high efficiency and user-friendliness of CRISPR/Cas9, its applications are limited by various factors, including the large size of Cas9 (which complicates delivery), the requirement for a specific PAM (which limits its effectiveness), the differential specificity and sensitivity of Cas variants, and, as a major concern, the introduction of random off-target mutations at sequences similar to those of target genes. An ortholog of Cas9 from Francisella novicida (FnCas9) was shown to have very low non-specific editing compared to SpCas9. The cleavage and recognition mechanism of FnCas9, as well as the reasons for its increased specificity, have not yet been fully investigated, whereas SpCas9 (derived from Streptococcus pyogenes) has been the subject of much research. Addressing questions about the molecular interplay behind these interactions, and comparing the two orthologs, can provide insights for designing customized Cas9 molecules in the future. This study provides a comprehensive examination of the structural and dynamic characteristics of SpCas9 and FnCas9, two prominent CRISPR/Cas9 orthologs, to elucidate the molecular mechanisms that govern their specificity, efficiency, and substrate versatility in genome editing. Utilizing molecular dynamics (MD) simulations, we compared the apo and gRNA-bound states of these proteins, revealing distinct conformational shifts and interactions that significantly influence gRNA binding and cleavage activity. FnCas9 demonstrated superior stability in gRNA binding, attributed to dynamic domain rearrangements and specific residue interactions, particularly in the bridge-helix and REC3 domain. To further investigate specificity, accelerated MD simulations combined with machine learning techniques were employed to analyze RNA:DNA hybrid mismatches, revealing that FnCas9 maintains structural integrity in the presence of PAM-distal mismatches, unlike SpCas9.
Enhanced interactions within the REC3 domain and allosteric communication pathways bypassing the REC2 domain were identified as critical factors in base-pair mismatch recognition and efficient gene editing. Additionally, structural analyses of engineered FnCas9 variants (en1, en15, en31) highlighted how specific mutations and domain modifications impact cleavage efficiency and specificity. The en31 variant exhibited distinct domain dynamics supporting better base-pair mismatch discrimination and adaptability for broader genome-editing targets due to its. Finally, the study compared FnCas9’s interaction with target RNA (tRNA) and target DNA (tDNA) substrates, revealing unique interaction networks that suggest differing binding dynamics. While tDNA showed stronger binding affinity overall, tRNA binding induced greater conformational flexibility in key domains. These findings advance our understanding of the structural determinants of Cas9 function and provide actionable insights for engineering high-fidelity Cas9 variants with enhanced precision and substrate compatibility. Future directions include mechanistic studies on DNA-bound ternary complexes and the development of optimized Cas9 systems for RNA editing applications, contributing to the ongoing evolution of genome editing technologies.
2025-08-01T00:00:00Z