A major objective of bioinformatics is to predict protein structure and function from sequence; the most successful current methods use sequence homology. These homology-based methods explicitly or implicitly assume the paradigm that sequence yields structure, which in turn yields function. Because protein function in some cases depends on flexibility, movement, or even lack of structure (disorder), combining motion predictions with structure predictions should improve prediction of function. Algorithms for predicting flexibility and local disorder from sequence have been developed, as have algorithms to predict sequences that switch between two structured states. Sequence complexity might also be an indirect measure of mobility, and still other measures could be discovered by studying known examples. We propose to determine the interplay of homology-based predictions that use sequence information only with explicit representations of both structure information and motion information, each derived from amino acid sequence. Structure information will be represented by secondary-structure predictions (helix, sheet, etc.) and by hydrophobic moment calculations. Motion information will be represented by flexibility prediction, switch sequence prediction, order/disorder prediction, sequence complexity, and any new measures discovered in the course of this work. Since different functions would involve motional parameters to different extents, the plan is to apply this combined approach on a sequence-family basis. Novel comparisons, called Attribute Profiles, are proposed for the representation of structure and motion information. Sets of Gribskov/Eisenberg 1D Profiles and associated Attribute Profiles will be combined into single predictions using neural network data models. Prediction outcomes will be compared with experimental observations and evaluated using the jackknife method.
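As an illustration of one of the motion-related measures named above, sequence complexity is often approximated by the Shannon entropy of residue composition within a sliding window. The sketch below is an assumption for illustration only and is not the method proposed in this project: the function names `window_entropy` and `complexity_profile`, the default window length of 12, and the choice of Shannon entropy as the complexity measure are all hypothetical.

```python
import math
from collections import Counter


def window_entropy(window: str) -> float:
    """Shannon entropy (bits) of the residue composition of one window.

    Illustrative stand-in for a sequence-complexity measure; 0.0 means a
    single repeated residue, higher values mean a more mixed composition.
    """
    n = len(window)
    counts = Counter(window)
    # Sum of p * log2(1/p) over the residues present in the window.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def complexity_profile(sequence: str, window: int = 12) -> list[float]:
    """Per-window entropy along a sequence (hypothetical helper).

    Returns one value per window start position, i.e.
    len(sequence) - window + 1 values in total.
    """
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and len(sequence)")
    return [
        window_entropy(sequence[i:i + window])
        for i in range(len(sequence) - window + 1)
    ]


if __name__ == "__main__":
    # Made-up sequence: a low-complexity run followed by a mixed region.
    seq = "SSSSSSSSSSSSACDEFGHIKLMN"
    for start, value in enumerate(complexity_profile(seq, window=12)):
        print(start, round(value, 3))
```

A per-position vector of this kind is one plausible shape for a sequence-derived attribute; how such values would actually be encoded in an Attribute Profile is not specified in the text above.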
If successful, this work will improve prediction of protein function from amino acid sequence, which is important given the large number of protein sequences with undetermined functions emerging from the various genome projects. Motion and disorder are important pieces that need to be added to the characterization of protein structure and function. For example, taxol has recently been shown to bind to a disordered loop in Bcl-2. Alzheimer disease, transmissible spongiform encephalopathies, Parkinson disease, and infectious agents such as Staphylococcus aureus, foot-and-mouth disease virus, and (perhaps) HIV depend critically on disordered regions of protein. Thus, understanding the interplay of sequence, flexibility, order/disorder, complexity, structure, and function clearly relates to human health.
Effective start/end date: 5/1/00 → 4/30/03
- National Institutes of Health: $332,118.00
- National Institutes of Health: $309,826.00
- National Institutes of Health: $342,080.00
- Health Professions(all)