Abstract:
Accurately predicting drug efficacy remains a significant challenge in drug discovery. A key pharmacological metric for assessing the potency of a compound is the half-maximal inhibitory concentration (IC50), the concentration at which a compound reduces a given biological activity by 50%. However, measuring IC50 experimentally is time-consuming and resource-intensive, and it is impractical for screening large chemical libraries. Computational models for IC50 prediction therefore offer a promising way to accelerate early-stage drug discovery. Existing computational methods typically fall into two categories: (i) those that predict pIC50, defined as −log10(IC50) with IC50 expressed in mol/L, from sequence-based representations of proteins, and (ii) those that estimate binding affinity from the 3D structures of protein–ligand complexes. Sequence-based approaches benefit from broad data availability, but they lack the spatial and conformational context needed to model molecular interactions accurately. A protein sequence is a linear chain of amino acids and does not encode the tertiary structure or the binding pocket in which drug–target interactions occur. Because binding affinity and inhibitory activity are governed by physicochemical interactions within this three-dimensional pocket, structural data is indispensable for understanding drug–target binding mechanisms. In this study, we propose a deep learning framework based on Graph Neural Networks (GNNs) that predicts pIC50 values for cancer-associated protein targets. The approach uses structural representations of proteins and, for ligands, molecular graphs derived from SMILES strings. The model encodes the drug and the target independently as graphs and predicts interaction strength from the learned representations. Evaluated with standard regression metrics, the model achieves predictive accuracy competitive with existing methods.
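To make the pIC50 transformation concrete, the sketch below converts an IC50 value to pIC50. It assumes IC50 is reported in nanomolar, a common convention in bioactivity databases but an assumption here; the helper name `ic50_nm_to_pic50` is hypothetical and not part of the described framework.

```python
import math


def ic50_nm_to_pic50(ic50_nm: float) -> float:
    """Convert an IC50 in nanomolar to pIC50 = -log10(IC50 in mol/L).

    Since 1 nM = 1e-9 mol/L, -log10(ic50_nm * 1e-9) = 9 - log10(ic50_nm).
    """
    if ic50_nm <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return 9.0 - math.log10(ic50_nm)


# A compound with IC50 = 100 nM (1e-7 M) has a pIC50 of 7;
# a more potent compound (10 nM) has a higher pIC50 of 8.
print(ic50_nm_to_pic50(100.0))
print(ic50_nm_to_pic50(10.0))
```

Working on the log scale compresses IC50 values that span many orders of magnitude into a range of a few units, which is one reason regression models commonly target pIC50 rather than raw IC50.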
Importantly, we restrict the scope of our model to a curated subset of cancer-related targets to maintain biological specificity and avoid the pitfalls of over-generalization across diverse protein classes. Our results highlight the critical role of integrating structural information in predictive modeling and emphasize the potential of GNN-based frameworks in precision oncology drug discovery.
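The dual-encoder idea described above, in which drug and target are encoded independently as graphs and their representations are combined to predict pIC50, can be illustrated with a small dependency-free sketch. This is not the authors' architecture. Every design choice is a hypothetical assumption: mean-aggregation message passing, two layers, sum pooling, a hidden size of 8, one-hot atom features for the ligand, and a residue contact graph for the protein. The weights are random and untrained, so the output only demonstrates the data flow. `GraphEncoder` and `DualGraphRegressor` are illustrative names.

```python
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
# A graph is (node feature vectors, adjacency list of neighbour indices).
Graph = Tuple[List[Vector], List[List[int]]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply an (out_dim x in_dim) weight matrix by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class GraphEncoder:
    """Hypothetical encoder: mean-aggregation message passing plus sum pooling (untrained)."""

    def __init__(self, in_dim: int, hidden_dim: int, num_layers: int, rng: random.Random):
        if num_layers < 1:
            raise ValueError("need at least one message-passing layer")
        dims = [in_dim] + [hidden_dim] * num_layers
        # Per layer: one weight for the node's own state, one for its neighbour mean.
        self.layers = [
            (random_matrix(dims[i + 1], dims[i], rng),
             random_matrix(dims[i + 1], dims[i], rng))
            for i in range(num_layers)
        ]
        self.hidden_dim = hidden_dim

    def __call__(self, graph: Graph) -> Vector:
        h, adjacency = graph
        for w_self, w_neigh in self.layers:
            updated = []
            for i, h_i in enumerate(h):
                neighbours = adjacency[i]
                # Average the neighbours' current states (zeros for isolated nodes).
                mean = [sum(h[j][k] for j in neighbours) / len(neighbours) if neighbours else 0.0
                        for k in range(len(h_i))]
                pre = [a + b for a, b in zip(matvec(w_self, h_i), matvec(w_neigh, mean))]
                updated.append([max(0.0, v) for v in pre])  # ReLU
            h = updated
        # Sum pooling yields one fixed-size vector regardless of graph size.
        return [sum(node[k] for node in h) for k in range(self.hidden_dim)]


class DualGraphRegressor:
    """Separate drug and protein encoders, concatenated, then a linear pIC50 head."""

    def __init__(self, drug_in: int, protein_in: int, hidden_dim: int = 8, seed: int = 0):
        rng = random.Random(seed)
        self.drug_encoder = GraphEncoder(drug_in, hidden_dim, 2, rng)
        self.protein_encoder = GraphEncoder(protein_in, hidden_dim, 2, rng)
        self.head = [rng.uniform(-0.5, 0.5) for _ in range(2 * hidden_dim)]
        self.bias = 0.0

    def predict(self, drug: Graph, protein: Graph) -> float:
        z = self.drug_encoder(drug) + self.protein_encoder(protein)  # list concatenation
        return sum(w * v for w, v in zip(self.head, z)) + self.bias


# Toy inputs: a 3-atom ligand with one-hot atom types and a 4-residue contact graph.
drug = ([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]],
        [[1], [0, 2], [1]])
protein = ([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]],
           [[1, 3], [0, 2], [1, 3], [0, 2]])

model = DualGraphRegressor(drug_in=4, protein_in=3)
print(f"untrained prediction: {model.predict(drug, protein):.3f}")
```

Keeping the two encoders separate and concatenating their pooled vectors mirrors the independent encoding described in the abstract. Because neighbour aggregation and pooling are both order-independent, relabelling the residues of the protein graph leaves the prediction unchanged up to floating-point rounding. A practical implementation would use a deep learning library and train the weights on measured pIC50 labels.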