Abstract:
Neurodegenerative diseases are one of the major health problems in today's world, and Alzheimer's disease is one of them. Its symptoms are often not diagnosed early, and very few FDA-approved disease-reversing drugs are available on the market. The use of machine learning in drug discovery and design is growing rapidly. The objective of this study is to build machine learning models that predict the pIC50 values of chemical compounds and to identify multi-targeting drugs against promising targets of Alzheimer's disease, namely AChE, CDK5, GS, and GSK-3. The dataset used to build the models was downloaded from ChEMBL and contains the SMILES of chemical compounds and their corresponding IC50 values for these targets. Features (descriptors) were generated using RDKit. Regression models, namely multiple linear regression, linear regression, SVR, polynomial regression, and RF, were trained on 2D descriptors and on all descriptors (1D, 2D, and 3D) to predict pIC50. With 2D descriptors, the best performance was given by RF, with an R-squared value of 0.61 for targets GS and GSK-3, and by SVR, with R-squared values of 0.65 and 0.72 for targets AChE and CDK5, respectively. With all descriptors, SVR showed the best performance, with R-squared values of 0.68, 0.69, 0.62, and 0.67 for targets AChE, CDK5, GS, and GSK-3, respectively. The models were then used for prediction on drug databases such as DrugBank. The predictions suggest that Arundic acid, Colocoxib, Chlorphenoxamine, Clomocycline, and Raloxifene from DrugBank may be the best candidates for the selected targets. Structure-based validation was performed using molecular docking, and the computed binding affinities indicate that these compounds can potentially inhibit these targets.
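The two quantities this abstract relies on, pIC50 and R-squared, can be illustrated with a minimal sketch. It assumes IC50 values are reported in nanomolar (a common unit for ChEMBL records, but not stated above) and uses the standard definitions pIC50 = -log10(IC50 in mol/L) and R-squared = 1 - SS_res / SS_tot. The function names `ic50_nm_to_pic50` and `r_squared` are hypothetical and are not taken from the study's code.

```python
import math


def ic50_nm_to_pic50(ic50_nm: float) -> float:
    """Convert an IC50 given in nanomolar (assumed unit) to pIC50.

    pIC50 = -log10(IC50 in mol/L) = 9 - log10(IC50 in nM).
    """
    if ic50_nm <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return 9.0 - math.log10(ic50_nm)


def r_squared(y_true, y_pred) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Illustrative values only, not data from the study.
    print(ic50_nm_to_pic50(1000.0))  # a 1 micromolar inhibitor
    print(r_squared([5.0, 6.0, 7.0], [5.1, 5.9, 7.2]))
```

Working on the logarithmic pIC50 scale rather than raw IC50 compresses values that span several orders of magnitude, which generally makes them a better-behaved regression target; higher pIC50 means a more potent compound.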