Optimizing long intrinsic disorder predictors with protein evolutionary information

Kang Peng, Slobodan Vucetic, Predrag Radivojac, Celeste J. Brown, A. Dunker, Zoran Obradovic

Research output: Contribution to journal, Article

257 Citations (Scopus)

Abstract

Protein existing as an ensemble of structures, called intrinsically disordered, has been shown to be responsible for a wide variety of biological functions and to be common in nature. Here we focus on improving sequence-based predictions of long (>30 amino acid residues) regions lacking specific 3-D structure by means of four new neural-network-based Predictors Of Natural Disordered Regions (PONDRs): VL3, VL3H, VL3P, and VL3E. PONDR VL3 used several features from a previously introduced PONDR VL2, but benefitted from optimized predictor models and a slightly larger (152 vs. 145) set of disordered proteins that were cleaned of mislabeling errors found in the smaller set. PONDR VL3H utilized homologues of the disordered proteins in the training stage, while PONDR VL3P used attributes derived from sequence profiles obtained by PSI-BLAST searches. The measure of accuracy was the average between accuracies on disordered and ordered protein regions. By this measure, the 30-fold cross-validation accuracies of VL3, VL3H, and VL3P were, respectively, 83.6 ± 1.4%, 85.3 ± 1.4%, and 85.2 ± 1.5%. By combining VL3H and VL3P, the resulting PONDR VL3E achieved an accuracy of 86.7 ± 1.4%. This is a significant improvement over our previous PONDRs VLXT (71.6 ± 1.3%) and VL2 (80.9 ± 1.4%). The new disorder predictors with the corresponding datasets are freely accessible through the web server at www.ist.temple.edu/disprot.
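
The accuracy measure described in the abstract (the average of the accuracies on disordered and on ordered regions) can be illustrated with a minimal sketch. The function name, the 0/1 label encoding (1 = disordered residue, 0 = ordered residue), and the toy data are assumptions made for illustration; they are not taken from the paper or from the PONDR software.

```python
from typing import Sequence


def balanced_accuracy(true_labels: Sequence[int],
                      predicted_labels: Sequence[int]) -> float:
    """Average of the per-class accuracies.

    Assumed encoding (hypothetical): 1 = disordered residue, 0 = ordered residue.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")

    pairs = list(zip(true_labels, predicted_labels))
    disordered_total = sum(1 for t, _ in pairs if t == 1)
    ordered_total = sum(1 for t, _ in pairs if t == 0)
    if disordered_total == 0 or ordered_total == 0:
        raise ValueError("both classes must be present")

    # Fraction of disordered residues predicted as disordered.
    disordered_acc = sum(1 for t, p in pairs if t == 1 and p == 1) / disordered_total
    # Fraction of ordered residues predicted as ordered.
    ordered_acc = sum(1 for t, p in pairs if t == 0 and p == 0) / ordered_total

    return 0.5 * (disordered_acc + ordered_acc)


# Toy example: 2 disordered and 8 ordered residues, one disordered residue missed.
truth = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
preds = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(balanced_accuracy(truth, preds))  # 0.75, whereas plain accuracy would be 0.9
```

Averaging the two per-class accuracies keeps the majority class (here, ordered residues) from dominating the score, which is presumably why a measure of this kind suits datasets where ordered and disordered residues are unevenly represented.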

Original language: English
Pages (from-to): 35-60
Number of pages: 26
Journal: Journal of Bioinformatics and Computational Biology
Volume: 3
Issue number: 1
DOIs: 10.1142/S0219720005000886
State: Published - Feb 2005

Fingerprint

Proteins
Amino acids
Servers
Neural networks
Datasets

Keywords

  • Evolutionary information
  • Intrinsic protein disorder
  • Neural networks
  • PONDR
  • Prediction
  • PSI-BLAST

ASJC Scopus subject areas

  • Medicine(all)
  • Cell Biology

Cite this

Optimizing long intrinsic disorder predictors with protein evolutionary information. / Peng, Kang; Vucetic, Slobodan; Radivojac, Predrag; Brown, Celeste J.; Dunker, A.; Obradovic, Zoran.

In: Journal of Bioinformatics and Computational Biology, Vol. 3, No. 1, 02.2005, p. 35-60.

Peng, Kang ; Vucetic, Slobodan ; Radivojac, Predrag ; Brown, Celeste J. ; Dunker, A. ; Obradovic, Zoran. / Optimizing long intrinsic disorder predictors with protein evolutionary information. In: Journal of Bioinformatics and Computational Biology. 2005 ; Vol. 3, No. 1. pp. 35-60.
@article{1c2e3e47671647e8aaf3a487ef20ad73,
title = "Optimizing long intrinsic disorder predictors with protein evolutionary information",
abstract = "Protein existing as an ensemble of structures, called intrinsically disordered, has been shown to be responsible for a wide variety of biological functions and to be common in nature. Here we focus on improving sequence-based predictions of long (>30 amino acid residues) regions lacking specific 3-D structure by means of four new neural-network-based Predictors Of Natural Disordered Regions (PONDRs): VL3, VL3H, VL3P, and VL3E. PONDR VL3 used several features from a previously introduced PONDR VL2, but benefitted from optimized predictor models and a slightly larger (152 vs. 145) set of disordered proteins that were cleaned of mislabeling errors found in the smaller set. PONDR VL3H utilized homologues of the disordered proteins in the training stage, while PONDR VL3P used attributes derived from sequence profiles obtained by PSI-BLAST searches. The measure of accuracy was the average between accuracies on disordered and ordered protein regions. By this measure, the 30-fold cross-validation accuracies of VL3, VL3H, and VL3P were, respectively, 83.6 ± 1.4{\%}, 85.3 ± 1.4{\%}, and 85.2 ± 1.5{\%}. By combining VL3H and VL3P, the resulting PONDR VL3E achieved an accuracy of 86.7 ± 1.4{\%}. This is a significant improvement over our previous PONDRs VLXT (71.6 ± 1.3{\%}) and VL2 (80.9 ± 1.4{\%}). The new disorder predictors with the corresponding datasets are freely accessible through the web server at www.ist.temple.edu/disprot.",
keywords = "Evolutionary information, Intrinsic protein disorder, Neural networks, PONDR, Prediction, PSI-BLAST",
author = "Kang Peng and Slobodan Vucetic and Predrag Radivojac and Brown, {Celeste J.} and A. Dunker and Zoran Obradovic",
year = "2005",
month = "2",
doi = "10.1142/S0219720005000886",
language = "English",
volume = "3",
pages = "35--60",
journal = "Journal of Bioinformatics and Computational Biology",
issn = "0219-7200",
publisher = "World Scientific Publishing Co. Pte Ltd",
number = "1",
}

TY - JOUR

T1 - Optimizing long intrinsic disorder predictors with protein evolutionary information

AU - Peng, Kang

AU - Vucetic, Slobodan

AU - Radivojac, Predrag

AU - Brown, Celeste J.

AU - Dunker, A.

AU - Obradovic, Zoran

PY - 2005/2

Y1 - 2005/2

N2 - Protein existing as an ensemble of structures, called intrinsically disordered, has been shown to be responsible for a wide variety of biological functions and to be common in nature. Here we focus on improving sequence-based predictions of long (>30 amino acid residues) regions lacking specific 3-D structure by means of four new neural-network-based Predictors Of Natural Disordered Regions (PONDRs): VL3, VL3H, VL3P, and VL3E. PONDR VL3 used several features from a previously introduced PONDR VL2, but benefitted from optimized predictor models and a slightly larger (152 vs. 145) set of disordered proteins that were cleaned of mislabeling errors found in the smaller set. PONDR VL3H utilized homologues of the disordered proteins in the training stage, while PONDR VL3P used attributes derived from sequence profiles obtained by PSI-BLAST searches. The measure of accuracy was the average between accuracies on disordered and ordered protein regions. By this measure, the 30-fold cross-validation accuracies of VL3, VL3H, and VL3P were, respectively, 83.6 ± 1.4%, 85.3 ± 1.4%, and 85.2 ± 1.5%. By combining VL3H and VL3P, the resulting PONDR VL3E achieved an accuracy of 86.7 ± 1.4%. This is a significant improvement over our previous PONDRs VLXT (71.6 ± 1.3%) and VL2 (80.9 ± 1.4%). The new disorder predictors with the corresponding datasets are freely accessible through the web server at www.ist.temple.edu/disprot.

AB - Protein existing as an ensemble of structures, called intrinsically disordered, has been shown to be responsible for a wide variety of biological functions and to be common in nature. Here we focus on improving sequence-based predictions of long (>30 amino acid residues) regions lacking specific 3-D structure by means of four new neural-network-based Predictors Of Natural Disordered Regions (PONDRs): VL3, VL3H, VL3P, and VL3E. PONDR VL3 used several features from a previously introduced PONDR VL2, but benefitted from optimized predictor models and a slightly larger (152 vs. 145) set of disordered proteins that were cleaned of mislabeling errors found in the smaller set. PONDR VL3H utilized homologues of the disordered proteins in the training stage, while PONDR VL3P used attributes derived from sequence profiles obtained by PSI-BLAST searches. The measure of accuracy was the average between accuracies on disordered and ordered protein regions. By this measure, the 30-fold cross-validation accuracies of VL3, VL3H, and VL3P were, respectively, 83.6 ± 1.4%, 85.3 ± 1.4%, and 85.2 ± 1.5%. By combining VL3H and VL3P, the resulting PONDR VL3E achieved an accuracy of 86.7 ± 1.4%. This is a significant improvement over our previous PONDRs VLXT (71.6 ± 1.3%) and VL2 (80.9 ± 1.4%). The new disorder predictors with the corresponding datasets are freely accessible through the web server at www.ist.temple.edu/disprot.

KW - Evolutionary information

KW - Intrinsic protein disorder

KW - Neural networks

KW - PONDR

KW - Prediction

KW - PSI-BLAST

UR - http://www.scopus.com/inward/record.url?scp=14544299774&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=14544299774&partnerID=8YFLogxK

U2 - 10.1142/S0219720005000886

DO - 10.1142/S0219720005000886

M3 - Article

VL - 3

SP - 35

EP - 60

JO - Journal of Bioinformatics and Computational Biology

JF - Journal of Bioinformatics and Computational Biology

SN - 0219-7200

IS - 1

ER -