Abstract:
G-protein-coupled receptors (GPCRs) are major pharmaceutical targets: more than one-third of FDA-approved drugs act on them. Although GPCRs are central to cellular signaling and drug development, GPCR research is hindered by several challenges, notably data scarcity, modeling complexity, and the limited accessibility of computational tools. This thesis introduces three computational tools and algorithms designed to deepen our understanding of GPCR biology. First, Reverse Cell Tracking (RCT) is a novel computational framework that leverages RNA velocity embeddings to trace gene expression trajectories during cellular differentiation. We applied RCT to odorant receptor (OR) gene expression during neuronal development to investigate the mechanisms of OR gene choice. ORs, a subset of GPCRs traditionally associated with smell, are also expressed in non-olfactory tissues, including cancers, where they have been implicated in processes such as migration, proliferation, and immune modulation. In olfactory sensory neurons, OR expression follows a distinctive "one neuron, one receptor" rule, which reflects mutually exclusive and monoallelic expression. However, recent single-cell studies have revealed co-expression of multiple ORs in immature neurons, suggesting alternative models of receptor choice such as winner-takes-all or stochastic selection. RCT analysis revealed a bias toward the most highly expressed OR during differentiation. This finding may help explain OR expression patterns and could open new avenues for diagnostics and therapeutic targeting outside the nose, especially in diseases such as cancer, where altered GPCR signaling plays a critical role. Second, Machine-OlF-Action (MOA) is a user-friendly, open-source computational framework designed to support GPCR researchers with minimal programming experience. As interest in GPCR signaling grows, so does the demand for accessible tools to explore and model GPCR-ligand interactions efficiently.
Machine learning-based techniques are emerging as state-of-the-art approaches in chemoinformatics, enabling selective, effective, and rapid identification of biologically relevant molecules from vast chemical databases. However, their adoption in GPCR research has been limited because they require advanced computational skills and because existing tools are technically complex. MOA bridges this gap: users input SMILES strings together with the known activation statuses of compounds, and MOA builds classification models from them. By packaging complex machine learning workflows into an accessible platform, MOA enables researchers without a deep computational background to uncover meaningful GPCR-ligand relationships and advance the field of chemosensory biology. Third, Gcoupler is an AI-driven computational toolkit that combines de novo ligand design, advanced statistical approaches, graph neural networks, and bioactivity-based prioritization to enable the unbiased identification of druggable surface cavities and the rational prediction of high-affinity ligands. Conventional GPCR-targeted therapies predominantly act on orthosteric sites, but emerging research highlights the therapeutic potential of allosteric sites. Although synthetic allosteric modulators have been developed, endogenous intracellular modulators remain largely unexplored because comprehensive binding and phenotypic data are lacking. This data scarcity limits the applicability of traditional machine learning approaches. Gcoupler addresses this challenge by enabling cavity-specific predictions and ligand identification even for data-scarce GPCRs, paving the way for more targeted and effective drug discovery. Together, these computational frameworks advance GPCR-targeted drug discovery by addressing key bottlenecks in modeling, data scarcity, and accessibility.
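As a rough illustration of the input/output contract described for MOA (SMILES strings and known activation statuses in, a classification model out), the following sketch trains a k-nearest-neighbour classifier that compares compounds by Tanimoto similarity. It is not MOA's implementation or API: the character-bigram "fingerprint", the names `smiles_fingerprint` and `KnnActivityClassifier`, and the training labels are all hypothetical. A real workflow would use chemically meaningful descriptors (for example, circular fingerprints computed with a cheminformatics toolkit) and a properly validated model.

```python
from collections import Counter


def smiles_fingerprint(smiles: str, n: int = 2) -> set[str]:
    """Toy 'fingerprint': the set of character n-grams in a SMILES string.

    Hypothetical stand-in for a real molecular fingerprint; it ignores chemistry
    and only looks at the text of the SMILES string.
    """
    return {smiles[i:i + n] for i in range(len(smiles) - n + 1)}


def tanimoto(a: set[str], b: set[str]) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


class KnnActivityClassifier:
    """Hypothetical k-NN classifier over (SMILES, active?) training pairs."""

    def __init__(self, k: int = 3):
        self.k = k
        self.train: list[tuple[set[str], bool]] = []

    def fit(self, smiles: list[str], active: list[bool]) -> "KnnActivityClassifier":
        if len(smiles) != len(active):
            raise ValueError("smiles and active must have the same length")
        self.train = [(smiles_fingerprint(s), y) for s, y in zip(smiles, active)]
        return self

    def predict(self, smiles: str) -> bool:
        fp = smiles_fingerprint(smiles)
        # Rank training compounds by similarity to the query, most similar first.
        ranked = sorted(self.train, key=lambda item: tanimoto(fp, item[0]), reverse=True)
        votes = Counter(label for _, label in ranked[: self.k])
        # Majority vote among the k nearest neighbours (ties count as "active").
        return votes[True] >= votes[False]


# Made-up labels for illustration only (acetate esters "active", alcohols
# "inactive"); these are NOT measured receptor responses.
train_smiles = ["CCOC(=O)C", "CCCOC(=O)C", "CCCCOC(=O)C", "CCO", "CCCO", "CCCCO"]
train_active = [True, True, True, False, False, False]

model = KnnActivityClassifier(k=3).fit(train_smiles, train_active)
print(model.predict("CCCCCOC(=O)C"))  # → True
print(model.predict("CCCCCO"))        # → False
```

k-NN with Tanimoto similarity is used here only because it is short and easy to follow; the classifier family MOA actually uses is not specified in this abstract.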
The RCT findings challenge the conventional view of OR expression and provide fresh insights into the functional roles of ORs beyond the olfactory system. By simplifying complex workflows and integrating AI-driven methods, the tools presented in this thesis make computational biology accessible to researchers with limited coding expertise. Collectively, they deepen the understanding of chemosensory GPCRs, enable unbiased ligand prioritization, and offer new strategies for data-scarce targets, with the potential to accelerate the development of selective and effective therapeutics.