Abstract:
Non-invasive diagnostics and therapeutics have the potential to revolutionize clinical practice by providing patient-friendly, accessible, and pain-free alternatives to traditional invasive methods. Among biofluids, saliva stands out for its ease of collection and its abundance of biomarkers that reflect systemic health. To accelerate research on saliva-based non-invasive diagnostics and therapeutics, we built SalivaDB, a comprehensive database that compiles 15,821 biomarker entries from sources such as research articles and existing databases. It contains information on 201 diseases across biomarker categories such as proteins, metabolites, microbes, miRNAs, and genes. SalivaDB is a user-friendly web-based platform that reduces the effort and time researchers need to discover and validate clinically relevant salivary biomarkers. One significant pathway through which these biomarkers enter saliva is via exosomes, which are secreted by nearly all cell types. Exosomes carry biomolecules such as RNAs, proteins, and lipids, making them highly promising for early disease detection and targeted therapeutic delivery. To understand the role of exosomal biomarkers in diagnostics and therapeutics, we developed computational tools to predict the major exosomal molecules. We first developed ExoProPred, a tool for predicting exosomal proteins. We initially applied a similarity-search-based method using BLAST, which was ineffective because of the low sequence similarity among exosomal proteins. We then applied a motif-based approach, which revealed recurrent motifs unique to exosomal proteins. Although this method showed high accuracy, it had limited coverage. We next employed machine learning models using compositional and evolutionary features, achieving an AUROC of 0.73.
To further enhance prediction performance, we developed a hybrid approach that integrated machine learning (ML) with motif analysis, resulting in a significantly improved AUROC of 0.85. After predicting exosomal proteins, we focused on miRNAs, another of the most commonly found classes of biomarkers in exosomes. We developed EmiRPred, a computational tool for predicting exosomal miRNAs. We built AI-based models (ML, DL, and LLM) using a range of features, including compositional features, binary features, structural features, and structural images. We integrated these AI-based models with alignment-based approaches (motif search and similarity search) to predict exosomal miRNAs effectively. This integrated ensemble model achieved an AUC of 0.73 on an independent validation set. ExoProPred and EmiRPred are accessible as web servers, downloadable standalone software, and Python packages, supporting broad usability within the research community. In addition to predicting exosome-associated biomolecules, we also aimed to predict the molecules that are highly expressed in exosomes. Expression levels of miRNAs vary significantly among subcellular locations and change notably in the presence of disease. Identifying abundant miRNAs in exosomes is crucial for exploring their physiological roles and their potential implications for disease diagnostics and therapeutics. Therefore, to provide deeper insight into baseline miRNA profiles in exosomes, we developed AdmirePred, a tool for predicting highly abundant miRNAs within blood exosomes. In this study, we used alignment-based methods such as motif search and similarity search, and alignment-free methods such as machine learning algorithms. We combined both approaches into a hybrid method that achieved an AUC of 0.854 on an independent validation set. AdmirePred is available as standalone software, a web server, and a Python package.
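The hybrid strategy described above, in which an alignment-free ML score is combined with alignment-based motif evidence, can be illustrated with a minimal sketch. The abstract does not state how the two signals are combined, so the additive rule, the `weight` value, the example motifs, and the placeholder `toy_ml_probability` function below are all assumptions made for illustration and are not the implementation used in ExoProPred, EmiRPred, or AdmirePred.

```python
from typing import Callable, Sequence


def motif_score(seq: str,
                positive_motifs: Sequence[str],
                negative_motifs: Sequence[str],
                weight: float = 0.5) -> float:
    """Return a motif-based adjustment for one sequence.

    Assumed rule: +weight if any positive-class motif occurs,
    -weight if any negative-class motif occurs (the two can cancel).
    """
    score = 0.0
    if any(m in seq for m in positive_motifs):
        score += weight
    if any(m in seq for m in negative_motifs):
        score -= weight
    return score


def hybrid_score(seq: str,
                 ml_probability: Callable[[str], float],
                 positive_motifs: Sequence[str],
                 negative_motifs: Sequence[str],
                 weight: float = 0.5) -> float:
    """Add the motif adjustment to the ML probability and clip to [0, 1]."""
    combined = ml_probability(seq) + motif_score(
        seq, positive_motifs, negative_motifs, weight
    )
    return min(1.0, max(0.0, combined))


def toy_ml_probability(seq: str) -> float:
    """Hypothetical stand-in for a trained model: fraction of G/C bases."""
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


# Hypothetical motif lists, chosen only to exercise the code.
POSITIVE_MOTIFS = ["GGAG"]
NEGATIVE_MOTIFS = ["UUUU"]

for sequence in ("GGAGUCCA", "ACGU", "AUUUUA"):
    print(sequence, hybrid_score(sequence, toy_ml_probability,
                                 POSITIVE_MOTIFS, NEGATIVE_MOTIFS))
```

In this sketch a motif hit can override a weak ML score and a missing motif leaves the ML score unchanged, which is one plausible way for a high-precision, low-coverage motif method to complement a model with broader coverage.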
To demonstrate the practical utility of extracellular saliva-based biomarkers, we investigated extracellular RNA (exRNA) biomarkers in saliva for diagnosing gastric cancer (GC). Such exRNA originates from cells and is released into body fluids either through active secretion via extracellular vesicles (exosomes) or through passive release during cell death. We identified several sets of biomarkers in this study, including primary and secondary biomarkers. The best-performing feature set was an eight-gene biomarker panel with a validation AUC of 0.905 and a Matthews correlation coefficient (MCC) of 0.770. This panel performed better than biomarkers previously reported in the literature. These findings underscore saliva’s significant potential as a reliable and efficient diagnostic fluid for early and accurate GC detection. Overall, we present a wide-ranging collection of curated resources, computational tools, and methods that collectively aim to advance non-invasive diagnostics and therapeutics. These tools and findings are designed to support the transition from traditional invasive procedures to more accessible, efficient, and patient-friendly diagnostic approaches.