| dc.description.abstract |
Accurate prediction of conformational B-cell epitopes is crucial to the rational design of subunit vaccines and antibody-based therapies. Unlike linear epitopes, conformational epitopes are composed of non-contiguous residues brought into close spatial proximity by the three-dimensional folding of the antigen, which makes identifying them computationally challenging. Early models typically relied only on sequence-derived features and performed poorly. To overcome these limitations, researchers have begun to incorporate 3D structural information into prediction workflows, and several notable tools exploit structural cues: DiscoTope (2006) employed spatial proximity and solvent accessibility; ElliPro (2008) used ellipsoid-based modeling and residue protrusion indices; SEPPA (2009, 2011) used neighborhood-preserving geometry; EpiPred (2013) combined docking and energetic modeling; and BEpro (previously PEPITO) used surface clustering and propensity scoring. These tools usually combine structural features such as solvent accessibility, secondary structure, flexibility, and topological descriptors, typically derived with DSSP, with statistical or machine learning methods. In this work, we present a structure-based machine learning model for predicting conformational B-cell epitopes, using a benchmark dataset of high-resolution antibody–antigen complexes developed by Cia et al. (2023). We extracted a broad set of DSSP-derived structural features, including secondary structure, absolute and relative solvent accessibility (ACC, RSA), backbone torsion angles (phi and psi), and hydrogen-bonding metrics. Each residue was encoded with a sliding window to preserve its local structural context. We compared several machine learning models, including Random Forest, Logistic Regression, LightGBM, and Gradient Boosting. 
The gradient boosting classifier yielded the best results, with an AUROC of 0.76 and an MCC of 0.43 on the validation set. Feature-importance analysis identified torsion angles, solvent accessibility, and secondary structure as the main contributors. This work demonstrates the successful application of structure-derived features to epitope prediction and offers a robust, transferable pipeline for subsequent immunoinformatics applications. |
en_US |