IIIT-Delhi Institutional Repository

Protein classification on the basis of thermal stability using supervised learning

dc.contributor.author Sharma, Ankit
dc.contributor.author Bera, Debajyoti (Advisor)
dc.contributor.author Bagler, Ganesh (Advisor)
dc.date.accessioned 2018-09-20T07:10:32Z
dc.date.available 2018-09-20T07:10:32Z
dc.date.issued 2018-05
dc.identifier.uri http://repository.iiitd.edu.in/xmlui/handle/123456789/647
dc.description.abstract Species evolve by adapting to variable thermal conditions. The differences in the thermal stability of hyperthermophilic, thermophilic and mesophilic proteins arise partly from their structural variations. The goal of this thesis is to identify the structural features responsible for these variations using machine learning techniques that use features derived from the residue interaction graphs (RIG) and the amino acid sequences of proteins. For the RIG model, we studied features linked to thermal stability that capture different notions of centrality, connection strength, weighted clustering coefficient and similar properties. We evaluated them against a few features that had not previously been studied in the context of thermal classification and demonstrated that the new features can significantly improve classification accuracy. We further improved the performance by using a histogram of centrality values as a feature vector instead of a single statistic such as the mean, which has been the trend so far among researchers. We discovered that the histograms corresponding to edge betweenness centrality, current flow closeness centrality and 2-hop degree centrality lead to the best classification accuracy among the network-based features. We also investigated the state-of-the-art features based on amino acid sequences and proposed a new one using the amino acid tri-mers of a protein. For empirical evaluation, we investigated a set of 842 hyperthermophilic, 533 thermophilic and 2248 mesophilic proteins and compared our proposed features with the state-of-the-art features using commonly known classifiers such as SVM, ANN and random forest. We obtained an overall accuracy greater than 90%, which is significantly better than what has been reported so far. en_US
dc.language.iso en_US en_US
dc.publisher IIIT-Delhi en_US
dc.title Protein classification on the basis of thermal stability using supervised learning en_US
dc.type Thesis en_US
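
The two feature ideas named in the abstract, a histogram of centrality values and amino acid tri-mer counts, can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the bin count and the [0, 1] histogram range are assumptions made for illustration, and plain degree centrality stands in for the edge betweenness, current flow closeness and 2-hop degree centralities that the abstract reports as most useful.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def trimer_features(sequence):
    """Relative frequency of each overlapping tri-mer in a protein sequence.

    Hypothetical helper: windows containing a non-standard residue are skipped.
    Returns a sparse dict mapping tri-mer -> frequency.
    """
    seq = sequence.upper()
    counts = Counter(
        seq[i:i + 3]
        for i in range(len(seq) - 2)
        if all(c in AMINO_ACIDS for c in seq[i:i + 3])
    )
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}


def degree_centrality(adjacency):
    """Degree centrality (degree / (n - 1)) for a graph given as
    {node: set_of_neighbours}. Used here only as a simple stand-in
    for the centralities studied in the thesis."""
    n = len(adjacency)
    if n < 2:
        return {node: 0.0 for node in adjacency}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adjacency.items()}


def histogram_features(values, bins=10, lo=0.0, hi=1.0):
    """Fixed-range, normalized histogram of per-node (or per-edge) values.

    The bin count and range are assumptions; values outside [lo, hi]
    are clamped into the first or last bin.
    """
    values = list(values)
    counts = [0] * bins
    for v in values:
        idx = int((v - lo) / (hi - lo) * bins)
        counts[min(max(idx, 0), bins - 1)] += 1
    n = len(values)
    return [c / n for c in counts] if n else [0.0] * bins


# Toy residue interaction graph: a three-residue path 0 - 1 - 2.
rig = {0: {1}, 1: {0, 2}, 2: {1}}
centrality = degree_centrality(rig)
feature_vector = histogram_features(centrality.values(), bins=4)
```

Using a fixed range and bin count gives every protein a feature vector of the same length regardless of its size, which is what allows the histograms to be fed to classifiers such as SVM or random forest.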
