Sequence complexity of disordered protein

Pedro Romero, Zoran Obradovic, Xiaohong Li, Ethan C. Garner, Celeste J. Brown, A. Dunker

Research output: Contribution to journal (article)

1067 Citations (Scopus)

Abstract

Intrinsic disorder refers to segments or to whole proteins that fail to self-fold into fixed 3D structure, with such disorder sometimes existing in the native state. Here we report data on the relationships among intrinsic disorder, sequence complexity as measured by Shannon's entropy, and amino acid composition. Intrinsic disorder identified in protein crystal structures, and by nuclear magnetic resonance, circular dichroism, and prediction from amino acid sequence, all exhibit similar complexity distributions that are shifted to lower values compared to, but significantly overlapping with, the distribution for ordered proteins. Compared to sequences from ordered proteins, these variously characterized intrinsically disordered segments and proteins, and also a collection of low-complexity sequences, typically have obviously higher levels of protein-specific subsets of the following amino acids: R, K, E, P, and S, and lower levels of subsets of the following: C, W, Y, I, and V. The Swiss Protein database of sequences exhibits significantly higher amounts of both low-complexity and predicted-to-be-disordered segments as compared to a non-redundant set of sequences from the Protein Data Bank, providing additional data that nature is richer in disordered and low-complexity segments compared to the commonness of these features in the set of structurally characterized proteins.
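As a rough illustration of the two quantities the abstract relates to disorder, the sketch below computes Shannon entropy over the residue composition of a sequence segment, plus fractional amino acid composition. It is a minimal sketch and not the authors' implementation: the function names, the sliding-window scheme, and the default window length of 45 residues are assumptions made here for illustration only.

```python
import math
from collections import Counter


def shannon_entropy(segment: str) -> float:
    """Shannon entropy (in bits) of the residue composition of `segment`.

    H = sum_i p_i * log2(1 / p_i), where p_i is the fraction of residue i.
    A homopolymer scores 0; a segment using all 20 amino acids equally
    scores log2(20), about 4.32 bits.
    """
    if not segment:
        raise ValueError("segment must be non-empty")
    n = len(segment)
    counts = Counter(segment.upper())
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def windowed_entropy(sequence: str, window: int = 45) -> list[float]:
    """Entropy of every full-length sliding window along `sequence`.

    The window length is a hypothetical default chosen for this sketch.
    Sequences shorter than the window yield an empty list.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        shannon_entropy(sequence[i:i + window])
        for i in range(len(sequence) - window + 1)
    ]


def composition(sequence: str) -> dict[str, float]:
    """Fractional amino acid composition of `sequence`."""
    if not sequence:
        raise ValueError("sequence must be non-empty")
    n = len(sequence)
    return {aa: c / n for aa, c in Counter(sequence.upper()).items()}
```

Under these assumptions, a lower windowed entropy marks a lower-complexity region, and comparing `composition` output between two sequence sets is one simple way to look for the kind of enrichment (for example in R, K, E, P, and S) that the abstract describes.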

Original language: English (US)
Pages (from-to): 38-48
Number of pages: 11
Journal: Proteins: Structure, Function and Genetics
Volume: 42
Issue number: 1
DOI: 10.1002/1097-0134(20010101)42:1<38::AID-PROT50>3.0.CO;2-3
State: Published - 2001
Externally published: Yes

Fingerprint

Proteins
Amino Acids
Intrinsically Disordered Proteins
Protein Databases
Entropy
Circular Dichroism
Amino Acid Sequence
Magnetic Resonance Spectroscopy
Databases
Crystal structure
Nuclear magnetic resonance
Chemical analysis

Keywords

  • Neural network predictors
  • Protein disorder
  • Sequence complexity

ASJC Scopus subject areas

  • Genetics
  • Structural Biology
  • Biochemistry

Cite this

Sequence complexity of disordered protein. / Romero, Pedro; Obradovic, Zoran; Li, Xiaohong; Garner, Ethan C.; Brown, Celeste J.; Dunker, A.

In: Proteins: Structure, Function and Genetics, Vol. 42, No. 1, 2001, p. 38-48.

@article{7441ec7ffb684902aecb5b6059cddc0e,
title = "Sequence complexity of disordered protein",
abstract = "Intrinsic disorder refers to segments or to whole proteins that fail to self-fold into fixed 3D structure, with such disorder sometimes existing in the native state. Here we report data on the relationships among intrinsic disorder, sequence complexity as measured by Shannon's entropy, and amino acid composition. Intrinsic disorder identified in protein crystal structures, and by nuclear magnetic resonance, circular dichroism, and prediction from amino acid sequence, all exhibit similar complexity distributions that are shifted to lower values compared to, but significantly overlapping with, the distribution for ordered proteins. Compared to sequences from ordered proteins, these variously characterized intrinsically disordered segments and proteins, and also a collection of low-complexity sequences, typically have obviously higher levels of protein-specific subsets of the following amino acids: R, K, E, P, and S, and lower levels of subsets of the following: C, W, Y, I, and V. The Swiss Protein database of sequences exhibits significantly higher amounts of both low-complexity and predicted-to-be-disordered segments as compared to a non-redundant set of sequences from the Protein Data Bank, providing additional data that nature is richer in disordered and low-complexity segments compared to the commonness of these features in the set of structurally characterized proteins.",
keywords = "Neural network predictors, Protein disorder, Sequence complexity",
author = "Pedro Romero and Zoran Obradovic and Xiaohong Li and Garner, {Ethan C.} and Brown, {Celeste J.} and A. Dunker",
year = "2001",
doi = "10.1002/1097-0134(20010101)42:1<38::AID-PROT50>3.0.CO;2-3",
language = "English (US)",
volume = "42",
pages = "38--48",
journal = "Proteins: Structure, Function and Genetics",
issn = "0887-3585",
publisher = "Wiley-Liss Inc.",
number = "1",

}

TY - JOUR

T1 - Sequence complexity of disordered protein

AU - Romero, Pedro

AU - Obradovic, Zoran

AU - Li, Xiaohong

AU - Garner, Ethan C.

AU - Brown, Celeste J.

AU - Dunker, A.

PY - 2001

Y1 - 2001

N2 - Intrinsic disorder refers to segments or to whole proteins that fail to self-fold into fixed 3D structure, with such disorder sometimes existing in the native state. Here we report data on the relationships among intrinsic disorder, sequence complexity as measured by Shannon's entropy, and amino acid composition. Intrinsic disorder identified in protein crystal structures, and by nuclear magnetic resonance, circular dichroism, and prediction from amino acid sequence, all exhibit similar complexity distributions that are shifted to lower values compared to, but significantly overlapping with, the distribution for ordered proteins. Compared to sequences from ordered proteins, these variously characterized intrinsically disordered segments and proteins, and also a collection of low-complexity sequences, typically have obviously higher levels of protein-specific subsets of the following amino acids: R, K, E, P, and S, and lower levels of subsets of the following: C, W, Y, I, and V. The Swiss Protein database of sequences exhibits significantly higher amounts of both low-complexity and predicted-to-be-disordered segments as compared to a non-redundant set of sequences from the Protein Data Bank, providing additional data that nature is richer in disordered and low-complexity segments compared to the commonness of these features in the set of structurally characterized proteins.

AB - Intrinsic disorder refers to segments or to whole proteins that fail to self-fold into fixed 3D structure, with such disorder sometimes existing in the native state. Here we report data on the relationships among intrinsic disorder, sequence complexity as measured by Shannon's entropy, and amino acid composition. Intrinsic disorder identified in protein crystal structures, and by nuclear magnetic resonance, circular dichroism, and prediction from amino acid sequence, all exhibit similar complexity distributions that are shifted to lower values compared to, but significantly overlapping with, the distribution for ordered proteins. Compared to sequences from ordered proteins, these variously characterized intrinsically disordered segments and proteins, and also a collection of low-complexity sequences, typically have obviously higher levels of protein-specific subsets of the following amino acids: R, K, E, P, and S, and lower levels of subsets of the following: C, W, Y, I, and V. The Swiss Protein database of sequences exhibits significantly higher amounts of both low-complexity and predicted-to-be-disordered segments as compared to a non-redundant set of sequences from the Protein Data Bank, providing additional data that nature is richer in disordered and low-complexity segments compared to the commonness of these features in the set of structurally characterized proteins.

KW - Neural network predictors

KW - Protein disorder

KW - Sequence complexity

UR - http://www.scopus.com/inward/record.url?scp=0035188314&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0035188314&partnerID=8YFLogxK

U2 - 10.1002/1097-0134(20010101)42:1<38::AID-PROT50>3.0.CO;2-3

DO - 10.1002/1097-0134(20010101)42:1<38::AID-PROT50>3.0.CO;2-3

M3 - Article

C2 - 11093259

AN - SCOPUS:0035188314

VL - 42

SP - 38

EP - 48

JO - Proteins: Structure, Function and Genetics

JF - Proteins: Structure, Function and Genetics

SN - 0887-3585

IS - 1

ER -