
Research Article
The Identifications of Post Translational Modification Sites with Capsule Network
@INPROCEEDINGS{10.1007/978-3-030-97124-3_42, author={Baitong Chen and Yujian Gu and Bin Yang and Wenzheng Bao}, title={The Identifications of Post Translational Modification Sites with Capsule Network}, proceedings={Simulation Tools and Techniques. 13th EAI International Conference, SIMUtools 2021, Virtual Event, November 5-6, 2021, Proceedings}, proceedings_a={SIMUTOOLS}, year={2022}, month={3}, keywords={Post translational modification Malonylation One-hot encoding Principal component analysis Support vector machine}, doi={10.1007/978-3-030-97124-3_42} }
- Baitong Chen
- Yujian Gu
- Bin Yang
- Wenzheng Bao
Year: 2022
SIMUTOOLS
Springer
DOI: 10.1007/978-3-030-97124-3_42
Abstract
Post-translational modification (PTM) is a significant biological process with a tremendous impact on protein function in both eukaryotic and prokaryotic cells. Lysine malonylation is a recently discovered PTM that is associated with many diseases, such as type 2 diabetes and various cancers. Compared with experimental identification of malonylation sites, computational methods can save time and reduce cost. In this paper, we combine principal component analysis with a support vector machine (SVM) to propose a new computational model, Mal-pred (malonylation prediction). First, sequence features are extracted using one-hot encoding, physicochemical properties, and the composition of k-spaced amino acid pairs (CKSAAP). Second, after preprocessing the data, we use principal component analysis (PCA) to obtain the best reduced feature set and an SVM to predict malonylation sites. Five-fold cross-validation shows that Mal-pred achieves better prediction performance than other methods. In 10-fold cross-validation on independent datasets, the AUC (area under the receiver operating characteristic curve) reached 96.39%. Mal-pred is a computationally reliable method for identifying malonylation sites in protein sequences. It outperforms existing prediction tools reported in the literature and can serve as a useful tool for identifying and discovering novel malonylation sites in human proteins.
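The abstract names the feature encodings and the evaluation metric but gives no implementation details. The following is a minimal, hypothetical sketch in plain Python of two of those steps: extracting lysine-centred windows and encoding them with one-hot and CKSAAP features, plus computing AUC from prediction scores. The window half-width, the `X` padding symbol, the `k_max` range and all function names (`lysine_windows`, `one_hot`, `cksaap`, `roc_auc`) are assumptions for illustration, not the authors' settings; the physicochemical descriptors are omitted because the abstract does not say which properties were used.

```python
from itertools import product

# Hypothetical sketch: window size, 'X' padding and the k range are assumptions,
# not values taken from the paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs: AA, AC, ...


def lysine_windows(sequence: str, half_width: int = 7) -> list[tuple[int, str]]:
    """Return (position, window) for every lysine, padding sequence ends with 'X' (assumed)."""
    padded = "X" * half_width + sequence + "X" * half_width
    return [(i, padded[i:i + 2 * half_width + 1])
            for i, residue in enumerate(sequence) if residue == "K"]


def one_hot(window: str) -> list[int]:
    """20 binary features per position; padding or non-standard residues become all zeros."""
    vec = []
    for residue in window:
        vec.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vec


def cksaap(window: str, k_max: int = 2) -> list[float]:
    """Composition of k-spaced amino acid pairs for gaps k = 0..k_max (400 features per k)."""
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        n_pairs = len(window) - k - 1  # residue pairs separated by exactly k residues
        for i in range(n_pairs):
            pair = window[i] + window[i + k + 1]
            if pair in counts:  # pairs involving padding are not counted
                counts[pair] += 1
        features.extend(counts[p] / n_pairs if n_pairs > 0 else 0.0 for p in PAIRS)
    return features


def roc_auc(labels: list[int], scores: list[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative (ties = 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    for site, win in lysine_windows("MAKLVK", half_width=2):
        x = one_hot(win) + cksaap(win, k_max=2)  # concatenated feature vector
        print(site, win, len(x))
    print(roc_auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.5]))
```

In a full pipeline along the lines the abstract describes, these vectors (together with physicochemical descriptors) would be standardized, reduced with PCA and classified with an SVM, for example with a library such as scikit-learn. The pairwise `roc_auc` is quadratic in the number of sites but gives the same value as the area under the ROC curve for binary labels, which keeps the sketch dependency-free.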