On the estimation of false positives in peptide identifications using decoy search strategy

Changyu Shen, Quanhu Sheng, Jie Dai, Yixue Li, Rong Zeng, Haixu Tang

Research output: Contribution to journal › Article

5 Citations (Scopus)

Abstract

Controlling and estimating false positives in peptide identifications by MS is critically important for reliable inference at the protein level and for downstream bioinformatics analysis. Approaches based on searching against decoy databases have become popular for their conceptual simplicity and ease of implementation. Although various decoy search strategies have been proposed, few studies have investigated how they differ in performance. With datasets collected on a mixture of model proteins, we demonstrate that a single search against the target database coupled with its reversed version offers a good balance between performance and simplicity. In particular, both the accuracy of the estimated number of false positives and the sensitivity are at least comparable to those of the other procedures examined in this study. We also show that scrambling while preserving the frequency of amino acid words can potentially improve the accuracy of the false positive estimate, though more studies are needed to investigate the optimal scrambling procedure for specific conditions and the variation of the estimate across repeated scrambling.
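
The single-search strategy mentioned in the abstract (target database coupled with its reversed version) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the commonly used convention that the number of decoy hits above a score threshold approximates the number of false positives among the accepted target hits. The `REV_` prefix, the `PSM` record, and the function names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical naming convention for decoy entries (an assumption, not from the paper).
DECOY_PREFIX = "REV_"


def reversed_decoys(targets: dict[str, str]) -> dict[str, str]:
    """Return the target entries plus one reversed decoy sequence per target."""
    combined = dict(targets)
    for name, seq in targets.items():
        combined[DECOY_PREFIX + name] = seq[::-1]
    return combined


@dataclass
class PSM:
    """One peptide-spectrum match: the matched database entry and its search score."""
    protein: str
    score: float


def estimate_false_positives(psms: list[PSM], threshold: float) -> tuple[int, int]:
    """Count accepted target hits and decoy hits at a score threshold.

    Under the assumed convention, the decoy count serves as the estimate of
    the number of false positives among the accepted target hits.
    """
    accepted = [p for p in psms if p.score >= threshold]
    decoy_hits = sum(p.protein.startswith(DECOY_PREFIX) for p in accepted)
    target_hits = len(accepted) - decoy_hits
    return target_hits, decoy_hits


# Toy data, invented purely to exercise the functions.
database = reversed_decoys({"P1": "MKTAYIAK"})
matches = [PSM("P1", 3.2), PSM("REV_P1", 2.9), PSM("P1", 1.0), PSM("P1", 4.0)]
target_hits, estimated_fp = estimate_false_positives(matches, threshold=2.5)
```

In this toy run, `estimated_fp` is the decoy count at the chosen threshold. A scrambled decoy database, which the abstract also discusses, would replace `seq[::-1]` with a shuffling step; the word-frequency-preserving scrambling procedure is not specified in the abstract, so it is not sketched here.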

Original language: English
Pages (from-to): 194-204
Number of pages: 11
Journal: Proteomics
Volume: 9
Issue number: 1
DOI: 10.1002/pmic.200800330
State: Published - Jan 2009

Keywords

  • Decoy databases
  • False positive
  • Mass spectrometry
  • Peptides
  • Sensitivity

ASJC Scopus subject areas

  • Molecular Biology
  • Biochemistry

Cite this

On the estimation of false positives in peptide identifications using decoy search strategy. / Shen, Changyu; Sheng, Quanhu; Dai, Jie; Li, Yixue; Zeng, Rong; Tang, Haixu.

In: Proteomics, Vol. 9, No. 1, 01.2009, p. 194-204.

Shen, Changyu ; Sheng, Quanhu ; Dai, Jie ; Li, Yixue ; Zeng, Rong ; Tang, Haixu. / On the estimation of false positives in peptide identifications using decoy search strategy. In: Proteomics. 2009 ; Vol. 9, No. 1. pp. 194-204.
@article{f0d01f51da2f4045a268c59d8903d49d,
title = "On the estimation of false positives in peptide identifications using decoy search strategy",
abstract = "Controlling and estimating false positives in peptide identifications by MS is critically important for reliable inference at the protein level and for downstream bioinformatics analysis. Approaches based on searching against decoy databases have become popular for their conceptual simplicity and ease of implementation. Although various decoy search strategies have been proposed, few studies have investigated how they differ in performance. With datasets collected on a mixture of model proteins, we demonstrate that a single search against the target database coupled with its reversed version offers a good balance between performance and simplicity. In particular, both the accuracy of the estimated number of false positives and the sensitivity are at least comparable to those of the other procedures examined in this study. We also show that scrambling while preserving the frequency of amino acid words can potentially improve the accuracy of the false positive estimate, though more studies are needed to investigate the optimal scrambling procedure for specific conditions and the variation of the estimate across repeated scrambling.",
keywords = "Decoy databases, False positive, Mass spectrometry, Peptides, Sensitivity",
author = "Changyu Shen and Quanhu Sheng and Jie Dai and Yixue Li and Rong Zeng and Haixu Tang",
year = "2009",
month = "1",
doi = "10.1002/pmic.200800330",
language = "English",
volume = "9",
pages = "194--204",
journal = "Proteomics",
issn = "1615-9853",
publisher = "Wiley-VCH Verlag",
number = "1",

}

TY - JOUR

T1 - On the estimation of false positives in peptide identifications using decoy search strategy

AU - Shen, Changyu

AU - Sheng, Quanhu

AU - Dai, Jie

AU - Li, Yixue

AU - Zeng, Rong

AU - Tang, Haixu

PY - 2009/1

Y1 - 2009/1

N2 - Controlling and estimating false positives in peptide identifications by MS is critically important for reliable inference at the protein level and for downstream bioinformatics analysis. Approaches based on searching against decoy databases have become popular for their conceptual simplicity and ease of implementation. Although various decoy search strategies have been proposed, few studies have investigated how they differ in performance. With datasets collected on a mixture of model proteins, we demonstrate that a single search against the target database coupled with its reversed version offers a good balance between performance and simplicity. In particular, both the accuracy of the estimated number of false positives and the sensitivity are at least comparable to those of the other procedures examined in this study. We also show that scrambling while preserving the frequency of amino acid words can potentially improve the accuracy of the false positive estimate, though more studies are needed to investigate the optimal scrambling procedure for specific conditions and the variation of the estimate across repeated scrambling.

KW - Decoy databases

KW - False positive

KW - Mass spectrometry

KW - Peptides

KW - Sensitivity

UR - http://www.scopus.com/inward/record.url?scp=59449097005&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=59449097005&partnerID=8YFLogxK

U2 - 10.1002/pmic.200800330

DO - 10.1002/pmic.200800330

M3 - Article

C2 - 19053142

AN - SCOPUS:59449097005

VL - 9

SP - 194

EP - 204

JO - Proteomics

JF - Proteomics

SN - 1615-9853

IS - 1

ER -