Mining α-helix-forming molecular recognition features with cross species sequence alignments

Yugong Cheng, Christopher J. Oldfield, Jingwei Meng, Pedro Romero, Vladimir N. Uversky, A. Dunker

Research output: Contribution to journal › Article

199 Citations (Scopus)

Abstract

Algorithms for mining α-helix-forming molecular recognition elements (MoREs), previously described by Oldfield et al. (Oldfield, C. J., Cheng, Y., Cortese, M. S., Brown, C. J., Uversky, V. N., and Dunker, A. K. (2005) Comparing and combining predictors of mostly disordered proteins, Biochemistry 44, 1989-2000) and also known as molecular recognition features (MoRFs) (Mohan, A., Oldfield, C. J., Radivojac, P., Vacic, V., Cortese, M. S., Dunker, A. K., and Uversky, V. N. (2006) Analysis of Molecular Recognition Features (MoRFs), J. Mol. Biol. 362, 1043-1059), revealed that regions undergoing a disorder-to-order transition are involved in many molecular recognition events and are crucial for protein-protein interactions. However, these algorithms were developed using a training set of limited size. Here we propose to improve the prediction algorithms by (1) adding further α-MoRF examples and their cross-species homologues to the positive training set, (2) carefully extracting monomeric structure chains from the Protein Data Bank (PDB) as the negative training set, (3) including attributes from recently developed disorder predictors, secondary structure predictions, and amino acid indices, and (4) constructing and validating neural-network-based predictors. The new positive training set comprises over 50 regions, identified in the PDB, that undergo a disorder-to-order transition, together with a set of cross-species homologues for each structure-based example. Over 1500 attributes, including disorder predictions, secondary structure predictions, and amino acid indices, were evaluated using the conditional probability method. The top-ranked attributes, including VSL2 and VL3 disorder predictions and several physicochemical propensities of amino acid residues, were used to develop feed-forward neural networks.
The sensitivity, specificity, and accuracy of the resulting predictor, α-MoRF-PredII, were 0.87 ± 0.10, 0.87 ± 0.11, and 0.87 ± 0.08, respectively, over 10 cross-validations. We present the results of these analyses, together with validation examples, and discuss potential improvements to α-MoRF-PredII prediction accuracy.
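The headline figures are sensitivity, specificity, and accuracy, each reported as mean ± standard deviation over 10 cross-validations. The Python sketch below shows that bookkeeping under stated assumptions: each fold yields a binary confusion matrix (assumed here to be per residue), "accuracy" is taken as overall accuracy (TP + TN) / N rather than balanced accuracy, and the spread is the sample standard deviation. The helper names fold_metrics and summarize and all counts are hypothetical illustrations, not the authors' implementation or data.

```python
from statistics import mean, stdev


def fold_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) for one fold.

    Arguments are hypothetical confusion counts in the order (TP, FN, TN, FP).
    Raises ZeroDivisionError if a fold has no positives or no negatives.
    """
    sensitivity = tp / (tp + fn)  # fraction of positive (MoRF) examples recovered
    specificity = tn / (tn + fp)  # fraction of negative examples correctly rejected
    # Assumption: overall accuracy; the paper's exact definition is not given here.
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


def summarize(folds):
    """Return [(mean, sample stdev)] for sensitivity, specificity, accuracy.

    Needs at least two folds, because the sample standard deviation is undefined for one.
    """
    per_fold = [fold_metrics(*counts) for counts in folds]
    # zip(*per_fold) regroups the per-fold tuples into one sequence per metric.
    return [(mean(values), stdev(values)) for values in zip(*per_fold)]


if __name__ == "__main__":
    # Hypothetical (TP, FN, TN, FP) counts for three folds; not data from the paper.
    folds = [(8, 2, 9, 1), (9, 1, 8, 2), (7, 3, 9, 1)]
    names = ("sensitivity", "specificity", "accuracy")
    for name, (m, s) in zip(names, summarize(folds)):
        print(f"{name}: {m:.2f} ± {s:.2f}")
```

Reporting mean ± SD across folds, rather than pooling all folds into a single confusion matrix, exposes fold-to-fold variability. That variability is what the ± terms in the reported results express.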

Original language: English
Pages (from-to): 13468-13477
Number of pages: 10
Journal: Biochemistry
Volume: 46
Issue number: 47
DOI: 10.1021/bi7012273
State: Published - Nov 27 2007

Fingerprint

Molecular recognition
Sequence Alignment
Proteins
Amino Acids
Databases
Biochemistry
Feedforward neural networks
Sensitivity and Specificity
Monomers
Neural networks

ASJC Scopus subject areas

  • Biochemistry

Cite this

Mining α-helix-forming molecular recognition features with cross species sequence alignments. / Cheng, Yugong; Oldfield, Christopher J.; Meng, Jingwei; Romero, Pedro; Uversky, Vladimir N.; Dunker, A.

In: Biochemistry, Vol. 46, No. 47, 27.11.2007, p. 13468-13477.


Cheng, Y, Oldfield, CJ, Meng, J, Romero, P, Uversky, VN & Dunker, A 2007, 'Mining α-helix-forming molecular recognition features with cross species sequence alignments', Biochemistry, vol. 46, no. 47, pp. 13468-13477. https://doi.org/10.1021/bi7012273
Cheng, Yugong ; Oldfield, Christopher J. ; Meng, Jingwei ; Romero, Pedro ; Uversky, Vladimir N. ; Dunker, A. / Mining α-helix-forming molecular recognition features with cross species sequence alignments. In: Biochemistry. 2007 ; Vol. 46, No. 47. pp. 13468-13477.
@article{fc42e7c5df8d45cb821519414481c4d6,
title = "Mining α-helix-forming molecular recognition features with cross species sequence alignments",
author = "Yugong Cheng and Oldfield, {Christopher J.} and Jingwei Meng and Pedro Romero and Uversky, {Vladimir N.} and A. Dunker",
year = "2007",
month = "11",
day = "27",
doi = "10.1021/bi7012273",
language = "English",
volume = "46",
pages = "13468--13477",
journal = "Biochemistry",
issn = "0006-2960",
publisher = "American Chemical Society",
number = "47",

}

TY  - JOUR
T1  - Mining α-helix-forming molecular recognition features with cross species sequence alignments
AU  - Cheng, Yugong
AU  - Oldfield, Christopher J.
AU  - Meng, Jingwei
AU  - Romero, Pedro
AU  - Uversky, Vladimir N.
AU  - Dunker, A.
PY  - 2007/11/27
Y1  - 2007/11/27
UR  - http://www.scopus.com/inward/record.url?scp=36749037699&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=36749037699&partnerID=8YFLogxK
U2  - 10.1021/bi7012273
DO  - 10.1021/bi7012273
M3  - Article
C2  - 17973494
AN  - SCOPUS:36749037699
VL  - 46
SP  - 13468
EP  - 13477
JO  - Biochemistry
JF  - Biochemistry
SN  - 0006-2960
IS  - 47
ER  -