PhD Theses

http://repository.iiitd.edu.in/xmlui/handle/123456789/1244
2026-05-20T03:19:34Z

Mechanism-informed, AI-driven frameworks for discovery and validation of aging-associated chemical space
http://repository.iiitd.edu.in/xmlui/handle/123456789/1964
Arora, Sakshi; Ahuja, Gaurav (Advisor)

Aging is a progressive, multifactorial biological process that drives the risk of nearly all major chronic diseases, including cancer, neurodegeneration, metabolic disorders, frailty, and cardiovascular dysfunction. Although the past two decades have established a molecular framework through the Hallmarks of Aging, translating this knowledge into actionable, small-molecule interventions that enhance healthspan remains a central challenge in geroscience. Experimental discovery pipelines are slow, resource-intensive, and typically explore only a minute fraction of chemical space. Conversely, computational drug discovery approaches, while high-throughput, often rely on chemistry-centric descriptors, exhibit black-box behavior, lack mechanistic interpretability, and rarely generalize to biologically novel molecules. This thesis addresses these long-standing limitations by developing two complementary artificial intelligence systems, AgeXtend and AgeXtend::Mimetics, designed to accelerate mechanism-informed discovery of geroprotective molecules and caloric restriction mimetics (CRMs). The first objective introduces AgeXtend, a multimodal, bioactivity-driven, and fully explainable AI framework. AgeXtend integrates curated datasets of experimentally validated geroprotectors and neutral compounds with bioactivity-based descriptors, hallmark-specific classification models, toxicity prediction, and target inference modules.
By combining mechanistic knowledge with machine learning, AgeXtend achieves robust predictive accuracy across cross-validation, leave-one-out validation, and independent external datasets. Importantly, the explainability module maps predictions onto nine aging pathways, allowing a mechanistic interpretation of each compound’s mode of action. Large-scale screening of ~1.1 billion compounds yielded diverse chemical classes with strong geroprotective potential. Experimental validation confirmed these predictions across three biological systems: Saccharomyces cerevisiae chronological lifespan assays, human fibroblast senescence assays, and Caenorhabditis elegans lifespan assays. Endogenous metabolites and repurposed drugs predicted by AgeXtend demonstrated lifespan-extending or senomodulatory activity, underscoring the biological fidelity of its predictions. Building upon this foundation, the second objective presents AgeXtend::Mimetics, a novel computational framework designed to identify caloric restriction mimetics (CRMs), compounds capable of reproducing caloric restriction (CR)-like physiological responses without structural similarity to known CRMs. Unlike existing approaches that rely on transcriptomic signatures alone or on structural matching, AgeXtend::Mimetics explicitly decouples biological convergence from chemical divergence. Using dual similarity modeling, ridge regression residuals, supervised contrastive learning, and composite CRM fingerprinting, the framework identifies molecules whose biological signatures align strongly with known CRMs despite having distinct chemical architectures. Large-scale application across thousands of compounds revealed chemically novel, mechanistically plausible CRM candidates that align with pathways such as nutrient sensing, autophagy, mitochondrial remodeling, and metabolic regulation. This framework substantially broadens the chemical landscape of CRM discovery and provides mechanistic clarity on CRM-like effects.
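One way to picture the idea of decoupling biological convergence from chemical divergence with regression residuals is sketched below. This is an illustrative reading only, not the thesis's implementation: the function name `ridge_residuals`, the similarity scores, and the penalty `alpha` are all assumptions. Biological similarity to known CRMs is regressed on chemical similarity with a one-dimensional ridge fit, and a large positive residual flags a compound that is more CRM-like biologically than its chemistry alone would predict.

```python
from statistics import mean

def ridge_residuals(chem_sim, bio_sim, alpha=1.0):
    """Residuals of a 1-D ridge fit of bio_sim on chem_sim (intercept unpenalized)."""
    mx, my = mean(chem_sim), mean(bio_sim)
    sxy = sum((x - mx) * (y - my) for x, y in zip(chem_sim, bio_sim))
    sxx = sum((x - mx) ** 2 for x in chem_sim)
    slope = sxy / (sxx + alpha)  # the ridge penalty shrinks the slope toward zero
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(chem_sim, bio_sim)]

# Hypothetical similarity scores of four candidate compounds to known CRMs.
chem = [0.9, 0.8, 0.2, 0.1]
bio = [0.9, 0.7, 0.8, 0.1]
res = ridge_residuals(chem, bio, alpha=0.1)

# Index of the compound with the largest positive residual.
best = max(range(len(res)), key=res.__getitem__)
```

In this toy data the third compound (index 2) has low chemical similarity but high biological similarity, so it receives the largest residual.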
Together, the approaches developed in this thesis demonstrate that explainable, mechanism-oriented AI models can successfully bridge the gap between large-scale chemical exploration and biological relevance. AgeXtend and AgeXtend::Mimetics collectively advance the field of computational geroscience by enabling scalable, interpretable, and experimentally validated discovery of geroprotectors and CRMs. These contributions lay the groundwork for future translational studies, the development of generative design for longevity therapeutics, and the integration of multi-omic datasets to refine mechanism-based discovery pipelines. The thesis highlights both the promise and current limitations of AI in aging biology, providing a roadmap for next-generation computational frameworks that target healthspan extension.
2026-03-01T00:00:00Z

In silico approaches for biomolecule-driven disease diagnosis and therapy
http://repository.iiitd.edu.in/xmlui/handle/123456789/1834
Tomer, Ritu; Raghava, Gajendra Pal Singh (Advisor)

Developments in computational biology have facilitated the systematic study of biomolecules, including peptides, proteins, and nucleic acids. Such advancements have opened opportunities for disease-oriented research and therapeutic discovery. The thesis deals with two broad areas in this field, disease diagnosis and biomolecule-based therapeutics, with a particular focus on integrating curated data resources and machine-learning-based predictive systems. The first part aims to enhance molecular knowledge and diagnostic studies of mucormycosis, a serious and rapidly spreading fungal infection also referred to as “Black Fungus”. Here, we developed a web repository named “MucormyDB”. The repository is a compilation of genomic, proteomic, virulence, and therapeutic information.
This repository enables researchers to easily perform comparative analyses and to identify possible genetic biomarkers. With such capabilities, MucormyDB is a valuable resource for studying the molecular basis of mucormycosis. The second part of the thesis provides computational models for predicting peptide-based therapeutics. These include IL4pred2, a machine-learning model that predicts peptides that can induce interleukin-4, and AntiCP4, which predicts anticancer peptides with enhanced predictive capability. In addition to therapeutic prediction, the thesis also addresses the safety of peptide candidates. To this end, we designed RAIpred to detect peptides that could trigger rheumatoid arthritis. We also developed CDpred to predict peptides related to celiac disease. These tools provide a valuable adjunct for immunological risk assessment. They help researchers select safer peptide candidates in the early phase of drug discovery. Together, the resources developed in this thesis provide high-quality molecular data, powerful predictive algorithms, and readily accessible computing platforms that can be used to study mucormycosis and to support the systematic evaluation of therapeutic peptides and their potential immunological consequences.
2025-10-01T00:00:00Z

Elucidating the mechanism of lipid exchange by cholesteryl ester transfer protein (CETP)
http://repository.iiitd.edu.in/xmlui/handle/123456789/1819
Sacher, Sukriti; Ray, Arjun (Advisor)

Cholesteryl Ester Transfer Protein (CETP) exchanges cholesteryl esters (CEs) and triglycerides (TGs) between lipoproteins, modulating their composition. This process regulates plasma CE and TG content and therefore strongly correlates with cardiovascular disease (CVD) risk factors.
While CETP is a significant therapeutic target, its mechanism of lipid transfer, particularly the role of structural components such as its cavity and associated phospholipids, remains incompletely understood. Identifying the molecular determinants that influence CETP activity is paramount for designing and evaluating next-generation therapeutics. Therefore, in this work, we employed advanced molecular dynamics (MD) simulations to track the movement of lipids through CETP, highlighting the regulatory interactions that influence lipid transfer efficiency and specificity. CETP encloses a tunnel that serves as a conduit for lipid exchange; the movement of lipids can be understood only after this tunnel has been identified and characterized. Although several methods exist for protein cavity identification, they are generally limited by the types of cavity they can identify, as well as by the automation and resolution of their cavity detection. We addressed these limitations by developing CICLOP (Characterization of Inner Cavity Lining of Proteins), a new method that uses a hybrid grid- and tessellation-based approach to identify internal cavities (tunnels, channels, pores, and voids). CICLOP offers superior performance and automation, accurately identifying the protein cavity and characterizing it functionally (diameter, volume, hydrophobicity, charge, and conservation). Applying CICLOP, we identified a single, continuous hydrophobic tunnel within CETP, contradicting earlier models that hypothesized diffuse, smaller cavities. Next, we determined the lipid transfer mechanism using steered MD simulations. In addition to its terminal openings, the CETP cavity has two further openings, which are plugged by two phospholipids (PLs); however, the role of these openings in CETP structure and function has remained elusive to date.
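To give a feel for what a grid-based cavity search does, the toy sketch below marks empty grid cells that cannot be reached from the outside of a two-dimensional "protein" by flood fill. It is not CICLOP's algorithm, which is only summarized above: the function name `enclosed_voids` and the grid are assumptions, and this simplification finds only fully enclosed voids, whereas tunnels and channels that open to the exterior need additional buriedness criteria.

```python
from collections import deque

def enclosed_voids(grid):
    """Return the set of empty cells ('.') not reachable from the grid border."""
    rows, cols = len(grid), len(grid[0])
    # Seed the flood fill with every empty cell on the border (the bulk solvent).
    queue = deque(
        (r, c)
        for r in range(rows)
        for c in range(cols)
        if (r in (0, rows - 1) or c in (0, cols - 1)) and grid[r][c] == "."
    )
    outside = set(queue)
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside_grid = 0 <= nr < rows and 0 <= nc < cols
            if inside_grid and grid[nr][nc] == "." and (nr, nc) not in outside:
                outside.add((nr, nc))
                queue.append((nr, nc))
    # Any empty cell never reached from outside is an enclosed void.
    return {
        (r, c)
        for r in range(rows)
        for c in range(cols)
        if grid[r][c] == "." and (r, c) not in outside
    }

# '#' marks cells occupied by protein atoms, '.' marks empty space.
grid = [
    ".......",
    ".#####.",
    ".#...#.",
    ".#####.",
    ".......",
]
voids = enclosed_voids(grid)
```

Here the three empty cells in the middle row are sealed off by protein cells, so they are the only cells reported as a void.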
Our structural and functional analyses revealed that lipid traversal through CETP’s central tunnel is facilitated by hydrophobic interaction-mediated diffusion. Moreover, the tunnel’s function is dictated by its dynamic plasticity, which allows lipid movement through a peristaltic, wave-like motion. Further, the PLs are indispensable for establishing the optimal architecture of the CETP tunnel and accelerate lipid traversal through a novel “gliding” mechanism. Free energy calculations and in vitro mutagenesis yielded an in-depth understanding of the mechanism of lipid exchange by CETP, as guided and accentuated by its interaction with PLs. To understand the complete lipid transfer mechanism, we investigated both the global process and the lipid-specific factors governing movement. Our free energy calculations demonstrated the non-specificity of the CETP termini for lipid entry, indicating that this process is governed primarily by the lipid’s surface availability rather than by any inherent protein bias for CE or TG. Furthermore, both lipid types follow a single, conserved physical path once inside the tunnel. Importantly, the CETP tunnel can accommodate two lipids simultaneously moving in opposite directions, providing compelling evidence in favor of a ternary complex. The dynamics of lipid traversal are significantly influenced by lipid-specific factors such as acyl chain length and conformation. Long-chain TGs (LCTs) in specific conformations such as ‘Fork’ and ‘T’ exhibit the longest residence times because they form a greater number of stable hydrophobic contacts with the tunnel residues. This finding is critical in the context of cardiovascular disease (CVD), where the slower transfer kinetics of LCTs, which are prevalent in the plasma of CVD patients, may lead to adverse, pro-atherogenic lipoprotein remodeling outcomes.
Altogether, this study advances our understanding of CETP by revealing that its function relies on a precisely orchestrated interplay of tunnel plasticity and optimal hydrophobicity, which allows lipid entry and diffusion through the tunnel. The study also identified PL-plugs as essential co-factors that strongly influenced CETP function, with their ability to modulate both the tunnel hydrophobicity and lipid transfer dynamics. Lipid-specific factors, such as acyl chain length, which are directly influenced by diet, also influence lipid transfer dynamics. The study provides compelling evidence in favor of a ternary complex, wherein CETP bridges two lipoproteins, simultaneously facilitating the concurrent exchange of CE for TG. These insights open new avenues for designing and evaluating next-generation CETP-targeted drugs by focusing on the solvent-accessible PL-binding pockets or the integrity of the tunnel, providing a refined approach for modulating CETP function in patients with atherosclerosis.
2026-02-01T00:00:00Z

Development of in silico tools for advancing non-invasive diagnostics and therapeutics
http://repository.iiitd.edu.in/xmlui/handle/123456789/1782
Akanksha; Raghava, Gajendra Pal Singh (Advisor)

Non-invasive diagnostics and therapeutics have the potential to revolutionize clinical practices by providing patient-friendly, accessible, and pain-free alternatives to traditional invasive methods. Among various biofluids, saliva stands out due to its easy collection and abundance of biomarkers that are reflective of systemic health. To accelerate the research on saliva-based non-invasive diagnostics and therapeutics, we built SalivaDB, a comprehensive database compiling 15,821 biomarker entries from various sources like research articles and databases.
It contains information on 201 diseases across biomarker categories such as proteins, metabolites, microbes, miRNAs, and genes. SalivaDB is a user-friendly web-based platform that helps researchers by reducing the effort and time required for the discovery and validation of clinically relevant salivary biomarkers. One significant pathway through which these biomarkers enter saliva is via exosomes, which are secreted by nearly all cell types. Exosomes carry biomolecules such as RNAs, proteins, and lipids, making them highly promising for early disease detection and targeted therapeutic delivery. To understand the role of exosomal biomarkers in diagnostics and therapeutics, we developed computational tools to predict major exosomal molecules. We first developed ExoProPred, a tool for predicting exosomal proteins. We initially applied a similarity-search-based method using BLAST, which was ineffective due to low sequence similarity among exosomal proteins. Subsequently, we applied a motif-based approach, which revealed recurrent motifs that were unique to exosomal proteins. Although this method showed high accuracy, it had limited coverage. We then employed machine learning models using compositional and evolutionary features, achieving an AUROC of 0.73. To further enhance prediction performance, we developed a hybrid approach that integrated machine learning (ML) with motif analysis, resulting in a significantly improved AUROC of 0.85. After predicting exosomal proteins, we focused on miRNAs, another of the most commonly found biomarkers in exosomes. We developed EmiRPred, a computational tool to predict exosomal miRNAs. We built AI-based models (ML, DL, LLM) using a number of features, including composition features, binary features, structural features, and structural images. We integrated these AI-based models with alignment-based approaches (motif search and similarity search) to effectively predict exosomal miRNAs.
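The hybrid strategy described above, in which a motif search supplements an ML score, can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the scoring rule used by ExoProPred or EmiRPred: the function name `hybrid_score`, the fixed `bonus`, the motif list, and the probabilities are all hypothetical.

```python
def hybrid_score(sequence, ml_probability, motifs, bonus=0.5):
    """Add a fixed bonus to the ML probability when any motif occurs in the sequence."""
    has_motif = any(motif in sequence for motif in motifs)
    if has_motif:
        # Cap the combined score at 1.0 so it stays comparable to a probability.
        return min(1.0, ml_probability + bonus)
    return ml_probability

# Hypothetical motif list and model outputs, for illustration only.
motifs = ["GGAG", "CCCU"]
with_motif = hybrid_score("UAGGAGCU", 0.40, motifs)
without_motif = hybrid_score("UAUAUAUA", 0.40, motifs)
```

The motif hit lifts a borderline ML score above a typical 0.5 decision threshold, which is one way a high-precision but low-coverage motif search can complement a model with broader coverage.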
The integrated EmiRPred ensemble model achieved an AUC of 0.73 on an independent validation set. ExoProPred and EmiRPred are accessible as web servers, downloadable standalone software, and Python packages, supporting broad usability within the research community. In addition to predicting exosome-associated biomolecules, we also aimed to predict the molecules that are highly expressed in exosomes. miRNA expression levels vary significantly among subcellular locations and change notably in the presence of disease. Identifying abundant miRNAs in exosomes is crucial for exploring their physiological roles and potential implications in disease diagnostics and therapeutics. Therefore, to provide deeper insights into baseline miRNA profiles in exosomes, we developed AdmirePred, a prediction tool to identify highly abundant miRNAs within blood exosomes. In this study, we used alignment-based methods such as motif search and similarity search, and alignment-free methods such as machine learning algorithms. We combined both approaches into a hybrid method that achieved an AUC of 0.854 on an independent validation set. AdmirePred is available as standalone software, a web server, and a Python package. To demonstrate the practical utility of extracellular saliva-based biomarkers, we investigated extracellular RNA biomarkers in saliva for diagnosing gastric cancer (GC). Extracellular RNA (exRNA) originates from cells and is released into body fluids through active secretion via extracellular vesicles (exosomes) or passive release during cell death. We identified different sets of biomarkers, including primary and secondary biomarkers, in this study. The best-performing feature set was an eight-gene biomarker panel with a high validation AUC of 0.905 and an MCC of 0.770. This biomarker panel performed better than biomarkers previously reported in the literature.
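For readers unfamiliar with the Matthews correlation coefficient (MCC) reported above, the short sketch below computes it from confusion-matrix counts using the standard definition. The counts are hypothetical and chosen only to give a value in a similar range; they are not the thesis's validation data.

```python
from math import sqrt

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical counts for a balanced validation set of 100 samples.
value = mcc(tp=45, tn=44, fp=6, fn=5)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it remains informative when the two classes are imbalanced.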
These findings underscore saliva’s significant potential as a reliable and efficient diagnostic fluid for early and accurate GC detection. Overall, we present a wide-ranging collection of curated resources, computational tools, and methods that collectively aim to advance non-invasive diagnostics and therapeutics. These tools and findings are designed to support the transition from traditional invasive procedures to more accessible, efficient, and patient-friendly diagnostic approaches.
2025-09-01T00:00:00Z