A new machine learning approach for protein phosphorylation site prediction in plants

Jianjiong Gao, Ganesh Kumar Agrawal, Jay J. Thelen, Zoran Obradovic, A. Dunker, Dong Xu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

16 Citations (Scopus)

Abstract

Protein phosphorylation is a crucial regulatory mechanism in various organisms. With recent improvements in mass spectrometry, phosphorylation site data are rapidly accumulating. Despite this wealth of data, computational prediction of phosphorylation sites remains a challenging task. This is particularly true in plants, due to the limited information on the substrate specificities of plant protein kinases and the fact that current phosphorylation prediction tools are trained on kinase-specific phosphorylation data from non-plant organisms. In this paper, we propose a new machine learning approach for phosphorylation site prediction. We incorporate protein sequence information and protein disordered regions, and integrate the machine learning techniques of k-nearest neighbor and support vector machine to predict phosphorylation sites. Test results on the PhosPhAt dataset of phosphoserines in Arabidopsis and the TAIR7 non-redundant protein database show good performance of the proposed phosphorylation site prediction method.
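The abstract names the ingredients (sequence windows around serines, a protein-disorder signal, a k-nearest neighbor score, and a support vector machine) but not how they are combined. The Python sketch below is one hypothetical arrangement, not the authors' implementation. It assumes a window of `half` residues on each side of every serine, uses plain positional identity as the KNN similarity (the paper's actual similarity measure is not given here), takes per-residue disorder scores as an input that would in practice come from a disorder predictor, and substitutes a small Pegasos-style linear SVM for whatever SVM package the authors used. All function names and the toy data are made up for illustration.

```python
import random
from typing import List, Sequence, Tuple

PAD = "X"  # hypothetical padding residue for windows that run past either sequence end


def serine_windows(seq: str, half: int) -> List[Tuple[int, str]]:
    """Return (position, window) for every serine; each window has length 2*half + 1."""
    padded = PAD * half + seq + PAD * half
    return [(i, padded[i:i + 2 * half + 1]) for i, aa in enumerate(seq) if aa == "S"]


def identity(a: str, b: str) -> int:
    """Number of identical positions between two equal-length windows (a crude similarity)."""
    return sum(x == y for x, y in zip(a, b))


def knn_score(query: str, train: Sequence[Tuple[str, int]], k: int, skip: int = -1) -> float:
    """Fraction of positive (+1) labels among the k training windows most similar to query.

    `skip` drops one training index so features for training sites are leave-one-out.
    Similarity ties are broken by training-set order (Python's sort is stable).
    """
    scored = [(identity(query, w), y) for j, (w, y) in enumerate(train) if j != skip]
    scored.sort(key=lambda t: t[0], reverse=True)
    top = scored[:k]
    return sum(1 for _, y in top if y == 1) / len(top)


def mean_disorder(disorder: Sequence[float], pos: int, half: int) -> float:
    """Average a per-residue disorder score over the window, ignoring padded positions."""
    lo, hi = max(0, pos - half), min(len(disorder), pos + half + 1)
    return sum(disorder[lo:hi]) / (hi - lo)


def site_features(pos: int, window: str, disorder: Sequence[float],
                  train: Sequence[Tuple[str, int]], half: int, k: int,
                  skip: int = -1) -> List[float]:
    """Hypothetical two-feature vector for one site: [KNN score, mean disorder]."""
    return [knn_score(window, train, k, skip), mean_disorder(disorder, pos, half)]


def train_linear_svm(X: List[List[float]], y: List[int], lam: float = 0.01,
                     epochs: int = 500, seed: int = 0) -> List[float]:
    """Pegasos-style stochastic subgradient descent on the L2-regularized hinge loss.

    A constant 1.0 is appended to every row, so the last weight acts as a (regularized) bias.
    """
    rng = random.Random(seed)
    rows = [list(r) + [1.0] for r in X]
    w = [0.0] * len(rows[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(rows)), len(rows)):  # one shuffled pass over the data
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, rows[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink step from the regularizer
            if margin < 1.0:  # hinge loss active for this example: subgradient step
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, rows[i])]
    return w


def decision(w: Sequence[float], x: Sequence[float]) -> float:
    """Signed SVM score; positive means 'predicted phosphosite' in this toy setup."""
    return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))


if __name__ == "__main__":
    # Toy, made-up data: one sequence, an invented disorder profile, and labeled serines.
    seq = "MRSPEGSAKRSPDLSGL"
    disorder = [0.2, 0.8, 0.9, 0.9, 0.7, 0.3, 0.2, 0.1, 0.6,
                0.8, 0.9, 0.9, 0.7, 0.2, 0.1, 0.1, 0.2]
    labels = {2: 1, 6: -1, 10: 1, 14: -1}  # hypothetical: position -> +1 site / -1 non-site
    half, k = 2, 1
    sites = serine_windows(seq, half)
    train = [(win, labels[pos]) for pos, win in sites]
    X = [site_features(pos, win, disorder, train, half, k, skip=j)
         for j, (pos, win) in enumerate(sites)]
    w = train_linear_svm(X, [lab for _, lab in train])
    for (pos, win), x in zip(sites, X):
        print(pos, win, [round(v, 2) for v in x], round(decision(w, x), 2))
```

Computing the KNN feature leave-one-out for training sites (the `skip` argument) keeps a site from voting for itself; without it, each training site's KNN score would be biased toward its own label, and the SVM would learn to over-trust that feature.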

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 18-29
Number of pages: 12
Volume: 5462 LNBI
DOI: https://doi.org/10.1007/978-3-642-00727-9_4
State: Published (2009)
Event: 1st International Conference on Bioinformatics and Computational Biology, BICoB 2009, New Orleans, LA, United States
Duration: Apr 8, 2009 to Apr 10, 2009

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5462 LNBI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 1st International Conference on Bioinformatics and Computational Biology, BICoB 2009
Country: United States
City: New Orleans, LA
Period: 4/8/09 to 4/10/09

Fingerprint

Protein Phosphorylation
Phosphorylation
Learning systems
Machine Learning
Proteins
Prediction
Protein
Arabidopsis
Protein Kinase
Mass Spectrometry
Protein Sequence
Specificity
Support Vector Machine
Mass spectrometry
Support vector machines
Integrate
Substrate
Substrates

Keywords

  • Arabidopsis
  • KNN
  • Phosphoproteomics
  • Protein Disorder
  • Protein phosphorylation
  • SVM

ASJC Scopus subject areas

  • Computer Science (all)
  • Theoretical Computer Science

Cite this

Gao, J., Agrawal, G. K., Thelen, J. J., Obradovic, Z., Dunker, A., & Xu, D. (2009). A new machine learning approach for protein phosphorylation site prediction in plants. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5462 LNBI, pp. 18-29). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5462 LNBI). https://doi.org/10.1007/978-3-642-00727-9_4

A new machine learning approach for protein phosphorylation site prediction in plants. / Gao, Jianjiong; Agrawal, Ganesh Kumar; Thelen, Jay J.; Obradovic, Zoran; Dunker, A.; Xu, Dong.

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 5462 LNBI 2009. p. 18-29 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5462 LNBI).


Gao, J, Agrawal, GK, Thelen, JJ, Obradovic, Z, Dunker, A & Xu, D 2009, A new machine learning approach for protein phosphorylation site prediction in plants. in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). vol. 5462 LNBI, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 5462 LNBI, pp. 18-29, 1st International Conference on Bioinformatics and Computational Biology, BICoB 2009, New Orleans, LA, United States, 4/8/09. https://doi.org/10.1007/978-3-642-00727-9_4

Gao J, Agrawal GK, Thelen JJ, Obradovic Z, Dunker A, Xu D. A new machine learning approach for protein phosphorylation site prediction in plants. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 5462 LNBI. 2009. p. 18-29. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-642-00727-9_4

Gao, Jianjiong ; Agrawal, Ganesh Kumar ; Thelen, Jay J. ; Obradovic, Zoran ; Dunker, A. ; Xu, Dong. / A new machine learning approach for protein phosphorylation site prediction in plants. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 5462 LNBI 2009. pp. 18-29 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{b1952e313340407398ca0d37a9716ef8,
title = "A new machine learning approach for protein phosphorylation site prediction in plants",
abstract = "Protein phosphorylation is a crucial regulatory mechanism in various organisms. With recent improvements in mass spectrometry, phosphorylation site data are rapidly accumulating. Despite this wealth of data, computational prediction of phosphorylation sites remains a challenging task. This is particularly true in plants, due to the limited information on the substrate specificities of plant protein kinases and the fact that current phosphorylation prediction tools are trained on kinase-specific phosphorylation data from non-plant organisms. In this paper, we propose a new machine learning approach for phosphorylation site prediction. We incorporate protein sequence information and protein disordered regions, and integrate the machine learning techniques of k-nearest neighbor and support vector machine to predict phosphorylation sites. Test results on the PhosPhAt dataset of phosphoserines in Arabidopsis and the TAIR7 non-redundant protein database show good performance of the proposed phosphorylation site prediction method.",
keywords = "Arabidopsis, KNN, Phosphoproteomics, Protein Disorder, Protein phosphorylation, SVM",
author = "Jianjiong Gao and Agrawal, {Ganesh Kumar} and Thelen, {Jay J.} and Zoran Obradovic and A. Dunker and Dong Xu",
year = "2009",
doi = "10.1007/978-3-642-00727-9_4",
language = "English",
isbn = "3642007260",
volume = "5462 LNBI",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "18--29",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN

T1 - A new machine learning approach for protein phosphorylation site prediction in plants

AU - Gao, Jianjiong

AU - Agrawal, Ganesh Kumar

AU - Thelen, Jay J.

AU - Obradovic, Zoran

AU - Dunker, A.

AU - Xu, Dong

PY - 2009

Y1 - 2009

N2 - Protein phosphorylation is a crucial regulatory mechanism in various organisms. With recent improvements in mass spectrometry, phosphorylation site data are rapidly accumulating. Despite this wealth of data, computational prediction of phosphorylation sites remains a challenging task. This is particularly true in plants, due to the limited information on the substrate specificities of plant protein kinases and the fact that current phosphorylation prediction tools are trained on kinase-specific phosphorylation data from non-plant organisms. In this paper, we propose a new machine learning approach for phosphorylation site prediction. We incorporate protein sequence information and protein disordered regions, and integrate the machine learning techniques of k-nearest neighbor and support vector machine to predict phosphorylation sites. Test results on the PhosPhAt dataset of phosphoserines in Arabidopsis and the TAIR7 non-redundant protein database show good performance of the proposed phosphorylation site prediction method.

AB - Protein phosphorylation is a crucial regulatory mechanism in various organisms. With recent improvements in mass spectrometry, phosphorylation site data are rapidly accumulating. Despite this wealth of data, computational prediction of phosphorylation sites remains a challenging task. This is particularly true in plants, due to the limited information on the substrate specificities of plant protein kinases and the fact that current phosphorylation prediction tools are trained on kinase-specific phosphorylation data from non-plant organisms. In this paper, we propose a new machine learning approach for phosphorylation site prediction. We incorporate protein sequence information and protein disordered regions, and integrate the machine learning techniques of k-nearest neighbor and support vector machine to predict phosphorylation sites. Test results on the PhosPhAt dataset of phosphoserines in Arabidopsis and the TAIR7 non-redundant protein database show good performance of the proposed phosphorylation site prediction method.

KW - Arabidopsis

KW - KNN

KW - Phosphoproteomics

KW - Protein Disorder

KW - Protein phosphorylation

KW - SVM

UR - http://www.scopus.com/inward/record.url?scp=68249084742&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=68249084742&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-00727-9_4

DO - 10.1007/978-3-642-00727-9_4

M3 - Conference contribution

SN - 3642007260

SN - 9783642007262

VL - 5462 LNBI

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 18

EP - 29

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

ER -