SPINE-D: Accurate prediction of short and long disordered regions by a single neural-network based method

Tuo Zhang, Eshel Faraggi, Bin Xue, A. Dunker, Vladimir N. Uversky, Yaoqi Zhou

Research output: Contribution to journal (Article)

97 Citations (Scopus)

Abstract

Short and long disordered regions of proteins have different preferences for different amino acid residues. Different methods often have to be trained to predict them separately. In this study, we developed a single neural-network-based technique called SPINE-D that makes a three-state prediction first (ordered residues and disordered residues in short and long disordered regions) and reduces it into a two-state prediction afterwards. SPINE-D was tested on various sets composed of different combinations of DisProt annotated proteins and proteins directly from the PDB annotated for disorder by missing coordinates in X-ray determined structures. While disorder annotations are different according to DisProt and X-ray approaches, SPINE-D's prediction accuracy and ability to predict disorder are relatively independent of how the method was trained and what type of annotation was employed but strongly depend on the balance in the relative populations of ordered and disordered residues in short and long disordered regions in the test set. With greater than 85% overall specificity for detecting residues in both short and long disordered regions, the residues in long disordered regions are easier to predict at 81% sensitivity in a balanced test dataset with 56.5% ordered residues but more challenging (at 65% sensitivity) in a test dataset with 90% ordered residues. Compared to eleven other methods, SPINE-D yields the highest area under the curve (AUC), the highest Matthews correlation coefficient for residue-based prediction, and the lowest mean square error in predicting disorder contents of proteins for an independent test set with 329 proteins. In particular, SPINE-D is comparable to a meta predictor in predicting disordered residues in long disordered regions and superior in short disordered regions. SPINE-D participated in CASP9 blind prediction and is one of the top servers according to the official ranking. In addition, SPINE-D was examined for prediction of functional molecular recognition motifs in several case studies. The server and databases are available at http://sparks.informatics.iupui.edu/.
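
The evaluation measures named in the abstract (sensitivity, specificity, and the Matthews correlation coefficient) can be illustrated with a minimal sketch. It is not the SPINE-D implementation: the label letters `O`, `S`, `L`, the rule that any disordered label maps to the positive class, and the toy sequences are all assumptions made for illustration.

```python
import math

def to_two_state(labels):
    """Collapse three-state labels into two states.

    Assumed (hypothetical) encoding: 'O' = ordered, 'S' = residue in a short
    disordered region, 'L' = residue in a long disordered region.
    Returns 1 for disordered ('S' or 'L') and 0 for ordered.
    """
    return [0 if label == "O" else 1 for label in labels]

def confusion(y_true, y_pred):
    """Return (tp, tn, fp, fn), with disorder (1) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def sensitivity(tp, fn):
    """Fraction of truly disordered residues predicted as disordered."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def specificity(tn, fp):
    """Fraction of truly ordered residues predicted as ordered."""
    return tn / (tn + fp) if (tn + fp) else 0.0

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy per-residue annotations and predictions (invented, not real data).
true_labels = list("OOOOOOSSLL")
pred_labels = list("OOOOOSSOLL")

tp, tn, fp, fn = confusion(to_two_state(true_labels), to_two_state(pred_labels))
print("sensitivity:", sensitivity(tp, fn))
print("specificity:", specificity(tn, fp))
print("MCC:", mcc(tp, tn, fp, fn))
```

Because specificity is computed only over ordered residues and sensitivity only over disordered ones, neither depends directly on the class ratio, whereas the MCC mixes all four counts. This may be one reason a balanced test set and a 90%-ordered test set are reported separately.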

Original language: English
Pages (from-to): 799-813
Number of pages: 15
Journal: Journal of Biomolecular Structure and Dynamics
Volume: 29
Issue number: 4
State: Published - Feb 2012

Fingerprint

Proteins
X-Rays
Informatics
Area Under Curve
Databases
Amino Acids
Population
Datasets

ASJC Scopus subject areas

  • Molecular Biology
  • Structural Biology

Cite this

SPINE-D: Accurate prediction of short and long disordered regions by a single neural-network based method. / Zhang, Tuo; Faraggi, Eshel; Xue, Bin; Dunker, A.; Uversky, Vladimir N.; Zhou, Yaoqi.

In: Journal of Biomolecular Structure and Dynamics, Vol. 29, No. 4, 02.2012, p. 799-813.

@article{6178f92cca0e47caa8cb211e342661ae,
title = "SPINE-D: Accurate prediction of short and long disordered regions by a single neural-network based method",
abstract = "Short and long disordered regions of proteins have different preferences for different amino acid residues. Different methods often have to be trained to predict them separately. In this study, we developed a single neural-network-based technique called SPINE-D that makes a three-state prediction first (ordered residues and disordered residues in short and long disordered regions) and reduces it into a two-state prediction afterwards. SPINE-D was tested on various sets composed of different combinations of DisProt annotated proteins and proteins directly from the PDB annotated for disorder by missing coordinates in X-ray determined structures. While disorder annotations are different according to DisProt and X-ray approaches, SPINE-D's prediction accuracy and ability to predict disorder are relatively independent of how the method was trained and what type of annotation was employed but strongly depend on the balance in the relative populations of ordered and disordered residues in short and long disordered regions in the test set. With greater than 85{\%} overall specificity for detecting residues in both short and long disordered regions, the residues in long disordered regions are easier to predict at 81{\%} sensitivity in a balanced test dataset with 56.5{\%} ordered residues but more challenging (at 65{\%} sensitivity) in a test dataset with 90{\%} ordered residues. Compared to eleven other methods, SPINE-D yields the highest area under the curve (AUC), the highest Matthews correlation coefficient for residue-based prediction, and the lowest mean square error in predicting disorder contents of proteins for an independent test set with 329 proteins. In particular, SPINE-D is comparable to a meta predictor in predicting disordered residues in long disordered regions and superior in short disordered regions. SPINE-D participated in CASP9 blind prediction and is one of the top servers according to the official ranking. In addition, SPINE-D was examined for prediction of functional molecular recognition motifs in several case studies. The server and databases are available at http://sparks.informatics.iupui.edu/.",
author = "Tuo Zhang and Eshel Faraggi and Bin Xue and A. Dunker and Uversky, {Vladimir N.} and Yaoqi Zhou",
year = "2012",
month = "2",
language = "English",
volume = "29",
pages = "799--813",
journal = "Journal of Biomolecular Structure and Dynamics",
issn = "0739-1102",
publisher = "Adenine Press",
number = "4",

}

TY - JOUR

T1 - SPINE-D

T2 - Accurate prediction of short and long disordered regions by a single neural-network based method

AU - Zhang, Tuo

AU - Faraggi, Eshel

AU - Xue, Bin

AU - Dunker, A.

AU - Uversky, Vladimir N.

AU - Zhou, Yaoqi

PY - 2012/2

Y1 - 2012/2

N2 - Short and long disordered regions of proteins have different preferences for different amino acid residues. Different methods often have to be trained to predict them separately. In this study, we developed a single neural-network-based technique called SPINE-D that makes a three-state prediction first (ordered residues and disordered residues in short and long disordered regions) and reduces it into a two-state prediction afterwards. SPINE-D was tested on various sets composed of different combinations of DisProt annotated proteins and proteins directly from the PDB annotated for disorder by missing coordinates in X-ray determined structures. While disorder annotations are different according to DisProt and X-ray approaches, SPINE-D's prediction accuracy and ability to predict disorder are relatively independent of how the method was trained and what type of annotation was employed but strongly depend on the balance in the relative populations of ordered and disordered residues in short and long disordered regions in the test set. With greater than 85% overall specificity for detecting residues in both short and long disordered regions, the residues in long disordered regions are easier to predict at 81% sensitivity in a balanced test dataset with 56.5% ordered residues but more challenging (at 65% sensitivity) in a test dataset with 90% ordered residues. Compared to eleven other methods, SPINE-D yields the highest area under the curve (AUC), the highest Matthews correlation coefficient for residue-based prediction, and the lowest mean square error in predicting disorder contents of proteins for an independent test set with 329 proteins. In particular, SPINE-D is comparable to a meta predictor in predicting disordered residues in long disordered regions and superior in short disordered regions. SPINE-D participated in CASP9 blind prediction and is one of the top servers according to the official ranking. In addition, SPINE-D was examined for prediction of functional molecular recognition motifs in several case studies. The server and databases are available at http://sparks.informatics.iupui.edu/.

UR - http://www.scopus.com/inward/record.url?scp=84855184716&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84855184716&partnerID=8YFLogxK

M3 - Article

C2 - 22208280

AN - SCOPUS:84855184716

VL - 29

SP - 799

EP - 813

JO - Journal of Biomolecular Structure and Dynamics

JF - Journal of Biomolecular Structure and Dynamics

SN - 0739-1102

IS - 4

ER -