Overlapping genes produce proteins with unusual sequence properties and offer insight into de novo protein creation

Corinne Rancurel, Mahvash Khosravi, A. Dunker, Pedro R. Romero, David Karlin

Research output: Contribution to journal (article)

106 Citations (Scopus)

Abstract

It is widely assumed that new proteins are created by duplication, fusion, or fission of existing coding sequences. Another mechanism of protein birth is provided by overlapping genes. They are created de novo by mutations within a coding sequence that lead to the expression of a novel protein in another reading frame, a process called "overprinting." To investigate this mechanism, we have analyzed the sequences of the protein products of manually curated overlapping genes from 43 genera of unspliced RNA viruses infecting eukaryotes. Overlapping proteins have a sequence composition globally biased toward disorder-promoting amino acids and are predicted to contain significantly more structural disorder than nonoverlapping proteins. By analyzing the phylogenetic distribution of overlapping proteins, we were able to confirm that 17 of these had been created de novo and to study them individually. Most proteins created de novo are orphans (i.e., restricted to one species or genus). Almost all are accessory proteins that play a role in viral pathogenicity or spread, rather than proteins central to viral replication or structure. Most proteins created de novo are predicted to be fully disordered and have a highly unusual sequence composition. This suggests that some viral overlapping reading frames encoding hypothetical proteins with highly biased composition, often discarded as noncoding, might in fact encode proteins. Some proteins created de novo are predicted to be ordered, however, and whenever a three-dimensional structure of such a protein has been solved, it corresponds to a fold previously unobserved, suggesting that the study of these proteins could enhance our knowledge of protein space.
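To make concrete what "another reading frame" means, here is a minimal Python sketch that translates one nucleotide sequence in each of the three forward frames using the standard genetic code. The example sequence and the function names (`translate`, `forward_frames`) are hypothetical illustrations and are not taken from the study; the sketch also ignores the reverse strand and any virus-specific translation mechanisms.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), with codons enumerated
# in TCAG order and the third base varying fastest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq, frame=0):
    """Translate a nucleotide sequence starting at offset `frame` (0, 1 or 2).

    Stop codons are rendered as '*'; codons containing characters other
    than A, C, G, T/U are rendered as 'X'. Trailing incomplete codons
    are dropped.
    """
    seq = seq.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def forward_frames(seq):
    """Return the translations of the three forward reading frames."""
    return [translate(seq, frame) for frame in range(3)]


# Hypothetical toy sequence: the same nucleotides yield a different
# peptide in each frame, which is the raw material for overprinting.
toy = "ATGGCCTAA"
for frame, peptide in enumerate(forward_frames(toy)):
    print(f"frame +{frame + 1}: {peptide}")
```

Because each frame groups the same nucleotides into different codons, a mutation that opens a long stop-free stretch in frame +2 or +3 can in principle create a new protein while the original frame stays coding.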

Original language: English
Pages (from-to): 10719-10736
Number of pages: 18
Journal: Journal of Virology
Volume: 83
Issue number: 20
DOI: 10.1128/JVI.00595-09
State: Published - Oct 2009

Fingerprint

Overlapping Genes
Proteins
genes
proteins
Reading Frames
viral morphology
protein products
protein structure
virus replication

ASJC Scopus subject areas

  • Immunology
  • Virology

Cite this

Rancurel, Corinne; Khosravi, Mahvash; Dunker, A.; Romero, Pedro R.; Karlin, David. Overlapping genes produce proteins with unusual sequence properties and offer insight into de novo protein creation. In: Journal of Virology, Vol. 83, No. 20, Oct 2009, p. 10719-10736.

@article{747b7ddb8f964221a0d0216b87335202,
title = "Overlapping genes produce proteins with unusual sequence properties and offer insight into de novo protein creation",
author = "Corinne Rancurel and Mahvash Khosravi and A. Dunker and Romero, {Pedro R.} and David Karlin",
year = "2009",
month = "10",
doi = "10.1128/JVI.00595-09",
language = "English",
volume = "83",
pages = "10719--10736",
journal = "Journal of Virology",
issn = "0022-538X",
publisher = "American Society for Microbiology",
number = "20",

}

TY - JOUR
T1 - Overlapping genes produce proteins with unusual sequence properties and offer insight into de novo protein creation
AU - Rancurel, Corinne
AU - Khosravi, Mahvash
AU - Dunker, A.
AU - Romero, Pedro R.
AU - Karlin, David
PY - 2009/10
Y1 - 2009/10
UR - http://www.scopus.com/inward/record.url?scp=70349737853&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=70349737853&partnerID=8YFLogxK
U2 - 10.1128/JVI.00595-09
DO - 10.1128/JVI.00595-09
M3 - Article
VL - 83
SP - 10719
EP - 10736
JO - Journal of Virology
JF - Journal of Virology
SN - 0022-538X
IS - 20
ER -