Abstract:
Predicting the folding rates of proteins and peptides is essential for understanding their function and stability. This thesis explores the construction of machine learning (ML) and deep learning (DL) models for this purpose. The study uses a comprehensive computational methodology that integrates a wide range of bioinformatics tools to address the complexity of protein folding dynamics. Feature engineering is built on Pfeature, a tool that extracts a wide variety of features from protein sequences and thereby improves the quality of the input data for the ML models. The study also applies Graph Signal Processing (GSP) techniques to represent protein structures as networks, enabling a more detailed examination of the inter-residue connections that influence folding kinetics. Molecular dynamics (MD) simulations, performed with Amber23, are central to the study: they model atomic motions within proteins under varied conditions and provide insight into dynamic protein behaviour. This approach makes it possible to characterise the energetic and structural changes that occur during folding, and it enriches the dataset with parameters needed for accurate model training. A range of ML models, including advanced regressors, is then applied to the resulting datasets. Because they are trained on features derived from both sequence data and MD simulation results, these models can capture the subtle factors that govern protein folding rates.
Combining multiple data sources and analytical methods helps ensure that the resulting models not only forecast folding rates accurately but also contribute to the theoretical understanding of protein biophysics. By bringing together data science, machine learning, and computational biology, this work improves predictive capability in protein research. It offers fresh insight into one of the most complex biological processes and may find application in genetic research and drug design.