Musite, a tool for global prediction of general and kinase-specific phosphorylation sites

Jianjiong Gao, Jay J. Thelen, A. Keith Dunker, Dong Xu

Research output: Contribution to journalArticle

166 Scopus citations

Abstract

Reversible protein phosphorylation is one of the most pervasive post-translational modifications, regulating diverse cellular processes in various organisms. High throughput experimental studies using mass spectrometry have identified many phosphorylation sites, primarily from eukaryotes. However, the vast majority of phosphorylation sites remain undiscovered, even in well studied systems. Because mass spectrometry-based experimental approaches for identifying phosphorylation events are costly, time-consuming, and biased toward abundant proteins and proteotypic peptides, in silico prediction of phosphorylation sites is potentially a useful alternative strategy for whole proteome annotation. Because of various limitations, current phosphorylation site prediction tools were not well designed for comprehensive assessment of proteomes. Here, we present a novel software tool, Musite, specifically designed for large scale predictions of both general and kinase-specific phosphorylation sites. We collected phosphoproteomics data in multiple organisms from several reliable sources and used them to train prediction models by a comprehensive machine-learning approach that integrates local sequence similarities to known phosphorylation sites, protein disorder scores, and amino acid frequencies. Application of Musite on several proteomes yielded tens of thousands of phosphorylation site predictions at a high stringency level. Cross-validation tests show that Musite achieves some improvement over existing tools in predicting general phosphorylation sites, and it is at least comparable with those for predicting kinase-specific phosphorylation sites. In Musite V1.0, we have trained general prediction models for six organisms and kinase-specific prediction models for 13 kinases or kinase families. Although the current pretrained models were not correlated with any particular cellular conditions, Musite provides a unique functionality for training customized prediction models (including condition-specific models) from users' own data. In addition, with its easily extensible open source application programming interface, Musite is aimed at being an open platform for community-based development of machine learning-based phosphorylation site prediction applications. Musite is available at http://musite.sourceforge.net/.

Original languageEnglish (US)
Pages (from-to)2586-2600
Number of pages15
JournalMolecular and Cellular Proteomics
Volume9
Issue number12
DOIs
StatePublished - Dec 2010

ASJC Scopus subject areas

  • Analytical Chemistry
  • Biochemistry
  • Molecular Biology

Fingerprint Dive into the research topics of 'Musite, a tool for global prediction of general and kinase-specific phosphorylation sites'. Together they form a unique fingerprint.

  • Cite this