| dc.description.abstract |
This thesis describes the compilation, characterization, and prediction of hemolytic peptides, which lyse red blood cells. We present Hemolytik2, a repository that substantially updates the 2014 Hemolytik database. The new version contains 13,215 entries (7,800 unique peptides), a threefold increase over its predecessor, compiled from the scientific literature and other peptide databases. Each entry records the peptide sequence, terminal modifications, topology, stereochemistry, red blood cell (RBC) source, peptide origin, hemolytic potency, and structural features (SMILES, secondary and tertiary structures). In addition to compiling the data, we characterized the peptides and developed a method for predicting hemolytic peptides. Peptide features were computed using the Pfeature software. A range of machine learning techniques, including LightGBM and Random Forest, were used to develop classification models that discriminate hemolytic from non-hemolytic peptides. SHapley Additive exPlanations (SHAP)-based feature analysis was then applied to identify and rank the most important features and to assess the contribution of physicochemical descriptors and individual amino acids. The insights gained from this prediction and feature analysis are intended to support the rational design of safe peptides with optimized hemolytic profiles. |
en_US |