Machine learning and deep learning models for prediction of protein-ligand binding affinity

Kaur, Parneet; Murugan, N Arul (Advisor)

dc.contributor.author	Kaur, Parneet
dc.contributor.author	Murugan, N Arul (Advisor)
dc.date.accessioned	2024-09-17T13:37:50Z
dc.date.available	2024-09-17T13:37:50Z
dc.date.issued	2024-05-01
dc.identifier.uri	http://repository.iiitd.edu.in/xmlui/handle/123456789/1665
dc.description.abstract	In recent years, there has been significant interest in using Machine Learning (ML) and Deep Learning (DL) to predict protein-ligand binding affinity, driven by the rapid growth of computational approaches in drug discovery. Binding affinity prediction is useful in the virtual screening and drug screening optimization steps of drug discovery. ML- and DL-based approaches have shown notable improvements over conventional approaches, which are time-consuming, complex, and challenging, and the introduction of computational approaches has expedited the drug discovery timeline. In this study, we aim to develop Machine Learning models and benchmark some Deep Learning models for predicting protein-ligand binding affinity. We used the refined set of the PDBbind database (version 2020) to obtain the protein-ligand structural data and binding affinity data. For the machine learning models, we featurized the protein-ligand complexes from this dataset using tools such as RDKit/Mordred and Pfeature, followed by feature selection. Models such as SVM, Random Forest, and Multiple Linear Regression were used to predict the binding affinity of the protein-ligand (PL) complexes. Of all the ML models we tested, Random Forest performed best, with an R-squared value of 0.6. Further, we benchmarked CNN-based Deep Learning models, namely Pafnucy and OnionNet-2, using the refined set of PDBbind as the benchmarking test dataset. The OnionNet-2 model showed better predictive performance, with an R-squared value of 0.85, than the Pafnucy model, with an R-squared value of 0.46. We have discussed this relative performance in our study. Hence, of all the approaches we used, the highest R-squared value on the PDBbind refined set was obtained when it was benchmarked using the OnionNet-2 model. We have also discussed the reasons for the variation and the future scope of the study.	en_US
dc.language.iso	en_US	en_US
dc.publisher	IIIT-Delhi	en_US
dc.subject	Binding affinity	en_US
dc.subject	protein-ligand complex	en_US
dc.subject	PDBbind	en_US
dc.subject	machine learning	en_US
dc.subject	deep learning	en_US
dc.title	Machine learning and deep learning models for prediction of protein-ligand binding affinity	en_US
dc.type	Thesis	en_US
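
The abstract compares models by their R-squared values (0.6, 0.85, 0.46). As a reading aid, the following is a minimal sketch of how such a score can be computed, assuming R-squared here means the coefficient of determination, 1 - SS_res / SS_tot. The record does not say which definition the thesis used (some studies report the squared Pearson correlation instead). The function name `r_squared` and the affinity values are hypothetical and are not taken from the thesis.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumption: this is the definition behind the reported scores;
    the record itself does not specify it.
    """
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical binding affinities (for example pKd values), for illustration only.
measured = [4.0, 5.0, 6.0, 7.0, 8.0]
predicted = [4.5, 4.5, 6.0, 7.5, 7.5]
score = r_squared(measured, predicted)
print(f"R-squared: {score:.2f}")
```

Under this definition, a score of 1 means the predictions match the measurements exactly, and a score of 0 means the model does no better than always predicting the mean measured affinity.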