Abstract:
The inherent instability of peptides in biological fluids is a major obstacle to their development as therapeutic agents. To address this, we developed PEPlife2, an updated version of the PEPlife database of peptide half-lives. The database comprises 4,486 entries, each annotated with experimental conditions, sequence details, modifications, biological activity, and pharmacokinetic parameters. PEPlife2 provides data exploration tools, including interactive 3D structure visualization with NGL Viewer, sequence-based search, and a RESTful API. We also analyzed the database to identify key peptide characteristics and developed machine learning models for half-life prediction. A K-Nearest Neighbors model built on dipeptide composition and modification features achieved strong predictive performance for modified peptides (validation R² = 0.83). For natural peptides, the best model was an XGBoost model based on amino acid composition, but its accuracy was modest (validation R² = 0.29). These results leave room for improvement, particularly for natural peptides, and point to deep learning and large language models (LLMs) as promising avenues for better prediction. PEPlife2 is publicly available at https://webs.iiitd.edu.in/raghava/peplife2/.
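
For readers unfamiliar with the two sequence descriptors named above, the following is a minimal sketch of how amino acid composition and dipeptide composition are commonly computed. It is illustrative only and is not taken from the PEPlife2 code: the function names, the choice to express values as fractions rather than percentages, and the decision to skip non-standard residues are all assumptions.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Return the fraction of each standard residue in seq (20 features).

    Assumption: non-standard characters are ignored rather than counted.
    """
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    total = len(residues)
    return {a: (residues.count(a) / total if total else 0.0) for a in AMINO_ACIDS}


def dipeptide_composition(seq):
    """Return the fraction of each adjacent residue pair in seq (400 features).

    Assumption: a pair is skipped if either residue is non-standard.
    """
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    valid = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    total = len(valid)
    keys = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    return {k: (valid.count(k) / total if total else 0.0) for k in keys}


if __name__ == "__main__":
    example = "AAGK"  # hypothetical toy sequence, not a database entry
    aac = amino_acid_composition(example)
    dpc = dipeptide_composition(example)
    print({k: v for k, v in aac.items() if v > 0})
    print({k: v for k, v in dpc.items() if v > 0})
```

Vectors like these can be passed to a regressor as fixed-length inputs regardless of peptide length. The modification features mentioned above are not covered by this sketch, since their encoding is not described here.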