Using several pseudo amino acid composition types and different machine learning algorithms to classify and predict archaeal phospholipases

Document Type: Original article

Authors

Department of Biotechnology, Faculty of Biological Science and Technology, University of Isfahan, Isfahan, Iran

Abstract

Phospholipases are important lipolytic enzymes with diverse industrial applications. Because the proteins of extremophilic archaea remain stable under harsh conditions, analyzing their unusual features is important for their utilization. This research was conducted to study the properties of archaeal phospholipases in silico and to develop a new method for distinguishing these enzymes from other archaeal enzymes using machine learning algorithms and Chou’s pseudo-amino acid composition (PseAAC) concept. Non-redundant sequences of archaeal phospholipases were collected. The BioSeq-Analysis server was used with four machine learning algorithms: Support Vector Machine (SVM), Random Forest (RF), Covariance Discrimination (CD), and Optimized Evidence-Theoretic K-Nearest Neighbor (OET-KNN). Several of Chou’s PseAAC modes were used to represent the sequences, and the resulting predictors were evaluated by 5-fold cross-validation. Based on our results, the OET-KNN predictor in SC-PseAAC mode gave the best performance under 5-fold cross-validation, with an accuracy of 96%, a sensitivity of 96%, a specificity of 95%, and a Matthews correlation coefficient of 0.92. The present investigation thus yielded a robust model for predicting archaeal phospholipases based on PseAAC and the OET-KNN machine learning algorithm.
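The four performance measures reported above (sensitivity, specificity, accuracy, and the Matthews correlation coefficient) all follow from the counts of a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the BioSeq-Analysis implementation. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the study.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification measures from confusion-matrix counts.

    tp/fn: positives (e.g. phospholipases) predicted correctly/incorrectly.
    tn/fp: negatives (e.g. other enzymes) predicted correctly/incorrectly.
    """
    def ratio(num, den):
        # Return 0.0 instead of raising when a class is absent.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)            # true positive rate
    specificity = ratio(tn, tn + fp)            # true negative rate
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = ratio(tp * tn - fp * fn, mcc_den)     # Matthews correlation coefficient
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only (not data from the study).
metrics = binary_metrics(tp=96, fp=5, tn=95, fn=4)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In a 5-fold cross-validation setting, one common choice is to sum the counts over the five held-out folds and then compute the measures once. Averaging the per-fold values is an alternative that can give slightly different numbers.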
