| dc.description.abstract |
Determining whether a drug molecule inhibits a target protein is a critical step in drug discovery. The pIC50 value is commonly used to quantify a drug's inhibitory effect, but measuring it experimentally is often expensive, slow, and infeasible for large-scale screening. To address this, we propose a deep learning approach that classifies protein-ligand interactions as inhibitory or non-inhibitory using only 1D sequence data. The method takes protein amino acid sequences and ligand SMILES strings as inputs. Interaction labels are derived from experimentally measured pIC50 values, binarized into inhibitory and non-inhibitory classes at a defined threshold. We use pretrained transformer models from the Hugging Face library to encode the protein and ligand sequences into contextual embeddings, which are then combined and passed through a neural classification head. Because this sequence-only approach requires no 3D structural data, it is computationally efficient and scalable to high-throughput applications. The model is trained and evaluated on datasets of cancer-related targets and demonstrates promising performance across standard binary classification metrics. Our results support the use of transformer-based sequence models for predicting drug–target interaction classes, enabling faster virtual screening pipelines in early-stage drug discovery. |
en_US |