A statistical framework to discover true associations from multiprotein complex pull-down proteomics data sets

Changyu Shen, Lang Li, Jake Yue Chen

Research output: Contribution to journal › Article

3 Citations (Scopus)

Abstract

Experimental processes to collect and process proteomics data are increasingly complex, and the computational methods to assess the quality and significance of these data remain unsophisticated. These challenges have led to many biological oversights and computational misconceptions. We developed an empirical Bayes model to analyze multiprotein complex (MPC) proteomics data derived from peptide mass spectrometry detections of purified protein complex pull-down experiments. Using our model and two yeast proteomics data sets, we estimated that there should be an average of about 20 true associations per MPC, almost 10 times as high as was previously estimated. For data sets generated to mimic a real proteome, our model achieved on average 80% sensitivity in detecting true associations, as compared with the 3% sensitivity in previous work, while maintaining a comparable false discovery rate of 0.3%. Cross-examination of our results with protein complexes confirmed by various experimental techniques demonstrates that many true associations that cannot be identified by the previous approach are identified by our method.
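The abstract reports performance as sensitivity and false discovery rate. As a minimal sketch of how these two quantities are conventionally computed from a set of known true associations and a set of called associations, the following Python may help; it is not the authors' empirical Bayes model, and the function name `evaluate_calls` and the toy bait/prey pairs are hypothetical, introduced only for illustration.

```python
from typing import Set, Tuple

# A (bait, prey) association; the labels used below are made-up examples.
Pair = Tuple[str, str]


def evaluate_calls(true_pairs: Set[Pair], called_pairs: Set[Pair]) -> Tuple[float, float]:
    """Return (sensitivity, false discovery rate) for a set of called associations.

    sensitivity = TP / (TP + FN): fraction of true associations that were called.
    FDR         = FP / (TP + FP): fraction of called associations that are not true.
    """
    tp = len(true_pairs & called_pairs)   # true associations that were called
    fn = len(true_pairs - called_pairs)   # true associations that were missed
    fp = len(called_pairs - true_pairs)   # called associations that are not true
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    fdr = fp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, fdr


if __name__ == "__main__":
    # Hypothetical toy data: four true associations, four calls, one of them wrong.
    truth = {("bait1", "A"), ("bait1", "B"), ("bait1", "C"), ("bait1", "D")}
    calls = {("bait1", "A"), ("bait1", "B"), ("bait1", "C"), ("bait1", "X")}
    sens, fdr = evaluate_calls(truth, calls)
    print(f"sensitivity={sens:.2f}, FDR={fdr:.2f}")
```

In a simulation study such as the one described, the true associations are known by construction, so both quantities can be computed directly in this way.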

Original language: English
Pages (from-to): 436-443
Number of pages: 8
Journal: Proteins: Structure, Function and Genetics
Volume: 64
Issue number: 2
DOI: 10.1002/prot.20994
State: Published - Aug 1 2006

Fingerprint

Multiprotein Complexes
Proteomics
Proteome
Mass Spectrometry
Proteins
Yeasts
Computational methods
Yeast
Peptides
Mass spectrometry
Datasets
Experiments
Data Accuracy

Keywords

  • Binomial Bernoulli (BB)
  • Complete binomial Bernoulli (CBB)
  • High-throughput
  • Multiprotein complex (MPC)
  • Protein-protein interaction
  • Y2H
  • Yeast two-hybrid

ASJC Scopus subject areas

  • Genetics
  • Structural Biology
  • Biochemistry

Cite this

A statistical framework to discover true associations from multiprotein complex pull-down proteomics data sets. / Shen, Changyu; Li, Lang; Chen, Jake Yue.

In: Proteins: Structure, Function and Genetics, Vol. 64, No. 2, 01.08.2006, p. 436-443.

@article{b15d114bf03b46dc98e3423e7d1e663c,
title = "A statistical framework to discover true associations from multiprotein complex pull-down proteomics data sets",
abstract = "Experimental processes to collect and process proteomics data are increasingly complex, and the computational methods to assess the quality and significance of these data remain unsophisticated. These challenges have led to many biological oversights and computational misconceptions. We developed an empirical Bayes model to analyze multiprotein complex (MPC) proteomics data derived from peptide mass spectrometry detections of purified protein complex pull-down experiments. Using our model and two yeast proteomics data sets, we estimated that there should be an average of about 20 true associations per MPC, almost 10 times as high as was previously estimated. For data sets generated to mimic a real proteome, our model achieved on average 80{\%} sensitivity in detecting true associations, as compared with the 3{\%} sensitivity in previous work, while maintaining a comparable false discovery rate of 0.3{\%}. Cross-examination of our results with protein complexes confirmed by various experimental techniques demonstrates that many true associations that cannot be identified by the previous approach are identified by our method.",
keywords = "Binomial Bernoulli (BB), Complete binomial Bernoulli (CBB), High-throughput, Multiprotein complex (MPC), Protein-protein interaction, Y2H, Yeast two-hybrid",
author = "Changyu Shen and Lang Li and Chen, {Jake Yue}",
year = "2006",
month = "8",
day = "1",
doi = "10.1002/prot.20994",
language = "English",
volume = "64",
pages = "436--443",
journal = "Proteins: Structure, Function and Genetics",
issn = "0887-3585",
publisher = "Wiley-Liss Inc.",
number = "2",

}

TY - JOUR

T1 - A statistical framework to discover true associations from multiprotein complex pull-down proteomics data sets

AU - Shen, Changyu

AU - Li, Lang

AU - Chen, Jake Yue

PY - 2006/8/1

Y1 - 2006/8/1

N2 - Experimental processes to collect and process proteomics data are increasingly complex, and the computational methods to assess the quality and significance of these data remain unsophisticated. These challenges have led to many biological oversights and computational misconceptions. We developed an empirical Bayes model to analyze multiprotein complex (MPC) proteomics data derived from peptide mass spectrometry detections of purified protein complex pull-down experiments. Using our model and two yeast proteomics data sets, we estimated that there should be an average of about 20 true associations per MPC, almost 10 times as high as was previously estimated. For data sets generated to mimic a real proteome, our model achieved on average 80% sensitivity in detecting true associations, as compared with the 3% sensitivity in previous work, while maintaining a comparable false discovery rate of 0.3%. Cross-examination of our results with protein complexes confirmed by various experimental techniques demonstrates that many true associations that cannot be identified by the previous approach are identified by our method.

AB - Experimental processes to collect and process proteomics data are increasingly complex, and the computational methods to assess the quality and significance of these data remain unsophisticated. These challenges have led to many biological oversights and computational misconceptions. We developed an empirical Bayes model to analyze multiprotein complex (MPC) proteomics data derived from peptide mass spectrometry detections of purified protein complex pull-down experiments. Using our model and two yeast proteomics data sets, we estimated that there should be an average of about 20 true associations per MPC, almost 10 times as high as was previously estimated. For data sets generated to mimic a real proteome, our model achieved on average 80% sensitivity in detecting true associations, as compared with the 3% sensitivity in previous work, while maintaining a comparable false discovery rate of 0.3%. Cross-examination of our results with protein complexes confirmed by various experimental techniques demonstrates that many true associations that cannot be identified by the previous approach are identified by our method.

KW - Binomial Bernoulli (BB)

KW - Complete binomial Bernoulli (CBB)

KW - High-throughput

KW - Multiprotein complex (MPC)

KW - Protein-protein interaction

KW - Y2H

KW - Yeast two-hybrid

UR - http://www.scopus.com/inward/record.url?scp=33745620168&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33745620168&partnerID=8YFLogxK

U2 - 10.1002/prot.20994

DO - 10.1002/prot.20994

M3 - Article

C2 - 16705649

AN - SCOPUS:33745620168

VL - 64

SP - 436

EP - 443

JO - Proteins: Structure, Function and Genetics

JF - Proteins: Structure, Function and Genetics

SN - 0887-3585

IS - 2

ER -