Abstract:
Accurate prediction of protein stability temperatures is essential for numerous applications in bioinformatics and biotechnology. In this study, we used a multifaceted computational approach to develop predictive models for protein stability temperatures and to determine which modeling technique yields the best results. We first used the Pfeature package to compute 16 different sets of features for a dataset of 31,470 protein sequences. These features covered amino acid composition, physicochemical properties, and structural characteristics. The dataset was then standardized with StandardScaler. Next, we built predictive models using Artificial Neural Networks (ANN), Linear Regression, Decision Trees, and Random Forests. Each model was trained on the concatenated dataset of protein features and evaluated using standard regression metrics: root mean square error (RMSE), mean absolute error (MAE), and R^2 score. Furthermore, we used MODELLER for homology modeling to generate three-dimensional structures for a subset of 15,000 proteins, selected based on sequence similarity. The Graphein package was used to analyze these structures by computing the various types of bonds within each protein. Additionally, using AmberTools, we computed energy components for each protein structure, including bond energy, angle energy, and solvation energy. These energy values were combined into a dataset alongside the corresponding stability temperatures. Finally, we compared the modeling techniques using the aforementioned regression metrics to identify the most effective approach for predicting protein stability temperatures.
This computational study offers insights into the prediction of protein stability temperatures and provides a solid foundation for future research in this domain.
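
The three regression metrics named above can be illustrated with a minimal sketch. This is not the study's code: the function name `regression_metrics`, the model labels, and all numeric values are hypothetical toy data chosen only to show how RMSE, MAE, and R^2 are computed and how they might be used to rank models.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, and R^2 for paired true and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # Residual sum of squares, shared by RMSE and R^2.
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n

    # Total sum of squares around the mean of the true values.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)

    # R^2 is undefined when every true value is identical.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"rmse": rmse, "mae": mae, "r2": r2}


# Toy stability temperatures and predictions (hypothetical, not from the study).
y_true = [50.0, 60.0, 70.0, 80.0]
predictions = {
    "model_a": [52.0, 58.0, 71.0, 79.0],
    "model_b": [55.0, 65.0, 65.0, 75.0],
}

scores = {name: regression_metrics(y_true, pred) for name, pred in predictions.items()}

# Rank by RMSE here; choosing MAE or R^2 instead is an equally valid assumption.
best = min(scores, key=lambda name: scores[name]["rmse"])

for name, s in scores.items():
    print(f"{name}: RMSE={s['rmse']:.3f} MAE={s['mae']:.3f} R^2={s['r2']:.3f}")
print("lowest RMSE:", best)
```

Lower RMSE and MAE indicate smaller prediction errors, while R^2 closer to 1 indicates that more of the variance in the measured temperatures is explained. The metrics can disagree on which model ranks first, which is one reason to report all three.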