A statistical framework to discover true associations from multiprotein complex pull-down proteomics data sets

Changyu Shen, Lang Li, Jake Yue Chen

Research output: Contribution to journal › Article

3 Scopus citations


Experimental processes to collect and process proteomics data are increasingly complex, and the computational methods to assess the quality and significance of these data remain unsophisticated. These challenges have led to many biological oversights and computational misconceptions. We developed an empirical Bayes model to analyze multiprotein complex (MPC) proteomics data derived from peptide mass spectrometry detections of purified protein complex pull-down experiments. Using our model and two yeast proteomics data sets, we estimated that there should be an average of about 20 true associations per MPC, almost 10 times as high as was previously estimated. For data sets generated to mimic a real proteome, our model achieved on average 80% sensitivity in detecting true associations, as compared with the 3% sensitivity in previous work, while maintaining a comparable false discovery rate of 0.3%. Cross-examination of our results with protein complexes confirmed by various experimental techniques demonstrates that many true associations that cannot be identified by the previous approach are identified by our method.
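
The two performance measures quoted in the abstract can be illustrated with a minimal sketch. It assumes the standard definitions (sensitivity is the fraction of true associations that are detected; false discovery rate is the fraction of reported associations that are not true). The function name and the toy bait-prey pairs are hypothetical and are not taken from the paper or its data sets.

```python
def sensitivity_and_fdr(predicted, truth):
    """Return (sensitivity, false discovery rate) for two sets of associations."""
    predicted, truth = set(predicted), set(truth)
    true_pos = len(predicted & truth)  # reported associations that are real
    # Sensitivity: share of the true associations that were recovered.
    sensitivity = true_pos / len(truth) if truth else float("nan")
    # FDR: share of the reported associations that are false (0 if none reported).
    fdr = (len(predicted) - true_pos) / len(predicted) if predicted else 0.0
    return sensitivity, fdr


# Hypothetical toy data: (bait, prey) pairs invented for illustration only.
truth = {("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D")}
predicted = {("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("C", "E")}

sens, fdr = sensitivity_and_fdr(predicted, truth)
print(f"sensitivity={sens:.2f}, FDR={fdr:.2f}")  # sensitivity=0.80, FDR=0.20
```

In this toy case four of the five true pairs are recovered and one of the five reported pairs is false. The sketch shows only how the two rates are computed, not the empirical Bayes model itself.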

Original language: English (US)
Pages (from-to): 436-443
Number of pages: 8
Journal: Proteins: Structure, Function and Genetics
Issue number: 2
State: Published - Aug 1 2006



  • Binomial Bernoulli (BB)
  • Complete binomial Bernoulli (CBB)
  • High-throughput
  • Multiprotein complex (MPC)
  • Protein-protein interaction
  • Y2H
  • Yeast two-hybrid

ASJC Scopus subject areas

  • Genetics
  • Structural Biology
  • Biochemistry
