Abstract:
Proteins play major roles in many biological processes: they act as enzymes and transporters and take part in replication, transcription, regulation of gene expression, repair and building of tissues, energy production, muscle development, wound healing, and muscle and bone restoration. Owing to advances in sequencing technology, databases of protein sequences are growing at an exponential rate. Thus, functional annotation of proteins, that is, prediction of protein function, is one of the major challenges of the post-genomic era. In this study, a systematic attempt has been made to understand and predict the function of proteins. The first objective of this thesis is to predict regulatory proteins, as they play an essential role in replication, transcription, and regulation of gene expression. To enable prediction of protein function using machine learning techniques, a tool, Pfeature, has been developed to compute protein features. Pfeature computes a wide range of features/descriptors from the sequence and structure information of a protein. These features have then been used to develop machine-learning-based models for predicting transcription factors, which are important regulatory proteins. These proteins coordinate biological functions by interacting with other molecules. It is therefore crucial to identify the residues in a protein that interact with other biological molecules. The second objective of this study is to determine protein-molecule interactions. Under this objective, three prediction methods (NAGbinder, DBPred, and Pprint2) have been developed for predicting interacting residues in a protein: NAGbinder predicts N-acetylglucosamine (NAG) binding residues, Pprint2 predicts RNA-interacting residues, and DBPred predicts DNA-binding sites. In addition to functional annotation of proteins, an attempt has been made to identify cancer-associated mutations in the genome in order to understand cancer pathogenesis.
To achieve this objective, we first examined and compared mutation calling techniques. Four major mutation calling techniques (Mutect2, MuSE, Varscan2, and SomaticSniper) were benchmarked to identify the best-performing one. Finally, we developed a method for identifying prognostic and diagnostic biomarkers using the mutation profiles of liver cancer patients. All methods developed during this study are freely available to the scientific community in the form of web services and standalone software packages.