Statistical protein quantification and significance analysis in label-free LC-MS experiments with complex designs.

Timothy Clough, Safia Thaminy, Susanne Ragg, Ruedi Aebersold, Olga Vitek

Research output: Contribution to journal › Article

70 Citations (Scopus)

Abstract

Liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) is widely used for quantitative proteomic investigations. The typical output of such studies is a list of identified and quantified peptides. The biological and clinical interest is, however, usually focused on quantitative conclusions at the protein level. Furthermore, many investigations ask complex biological questions by studying multiple interrelated experimental conditions. Therefore, there is a need in the field for generic statistical models to quantify protein levels even in complex study designs. We propose a general statistical modeling approach for protein quantification in arbitrarily complex experimental designs, such as time course studies or those involving multiple experimental factors. The approach summarizes the quantitative experimental information from all the features and all the conditions that pertain to a protein. It enables both protein significance analysis between conditions and protein quantification in individual samples or conditions. We implement the approach in an open-source R-based software package, MSstats, suitable for researchers with a limited statistics and programming background. We demonstrate, using as examples two experimental investigations with complex designs, that simultaneous statistical modeling of all the relevant features and conditions yields higher sensitivity of protein significance analysis and higher accuracy of protein quantification than commonly employed alternatives. The software is available at http://www.stat.purdue.edu/~ovitek/Software.html.
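
The idea of modeling all features and all conditions of a protein at once can be illustrated with a small sketch. The abstract does not specify the model, so the code below assumes a simple additive fixed-effects model on log2 intensities (intercept + feature effect + condition effect) fitted by ordinary least squares. The function name `fit_additive_model`, the data layout, and the toy values are all hypothetical; this is not the MSstats implementation or its API.

```python
def fit_additive_model(rows, baseline):
    """Fit log2_intensity ~ feature + condition by ordinary least squares.

    rows: list of (feature, condition, log2_intensity) for ONE protein.
    Returns the estimated log2 fold change of each condition vs. `baseline`.
    Assumed, simplified model for illustration only.
    """
    features = sorted({f for f, _, _ in rows})
    conditions = sorted({c for _, c, _ in rows})
    # Reference coding: the first feature and the baseline condition
    # are absorbed into the intercept.
    feat_cols = features[1:]
    cond_cols = [c for c in conditions if c != baseline]

    X, y = [], []
    for f, c, value in rows:
        X.append([1.0]
                 + [1.0 if f == g else 0.0 for g in feat_cols]
                 + [1.0 if c == d else 0.0 for d in cond_cols])
        y.append(value)

    p = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(p)]

    # Gauss-Jordan elimination with partial pivoting.
    # Assumes X'X is non-singular (every feature and condition observed).
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        pv = A[col][col]
        A[col] = [a / pv for a in A[col]]
        for r in range(p):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]

    beta = [A[i][p] for i in range(p)]
    offset = 1 + len(feat_cols)
    return {d: beta[offset + k] for k, d in enumerate(cond_cols)}


# Toy data: three features of one protein, two conditions, two runs each.
# The "disease" condition is shifted by 1.0 on the log2 scale by construction.
rows = []
for feature, base in [("pep1", 20.0), ("pep2", 22.0), ("pep3", 18.0)]:
    for condition, shift in [("control", 0.0), ("disease", 1.0)]:
        for noise in (-0.1, 0.1):
            rows.append((feature, condition, base + shift + noise))

effects = fit_additive_model(rows, baseline="control")
print(effects)
```

Because every feature contributes to a single condition estimate, features with different baseline intensities do not need to be averaged or tested one at a time. A realistic analysis would also need variance estimates, handling of missing values, and possibly random effects for subjects or runs, none of which this sketch attempts.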

Original language: English (US)
Journal: BMC Bioinformatics
Volume: 13 Suppl 16
State: Published - 2012
Externally published: Yes

Fingerprint

Quantification
Labels
Proteins
Protein
Experiment
Experiments
Software
Statistical Modeling
Time and motion study
Design
Proteomics
Liquid chromatography
Mass Spectrometry
Statistical Models
Tandem Mass Spectrometry
Experimental design
Experimental Investigation
Chromatography
Software Package
Software packages

ASJC Scopus subject areas

  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics
  • Structural Biology

Cite this

Statistical protein quantification and significance analysis in label-free LC-MS experiments with complex designs. / Clough, Timothy; Thaminy, Safia; Ragg, Susanne; Aebersold, Ruedi; Vitek, Olga.

In: BMC Bioinformatics, Vol. 13 Suppl 16, 2012.

@article{06e93de31c474d4990fd493d165f1e39,
title = "Statistical protein quantification and significance analysis in label-free LC-MS experiments with complex designs.",
author = "Timothy Clough and Safia Thaminy and Susanne Ragg and Ruedi Aebersold and Olga Vitek",
year = "2012",
language = "English (US)",
volume = "13 Suppl 16",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",

}

TY - JOUR

T1 - Statistical protein quantification and significance analysis in label-free LC-MS experiments with complex designs.

AU - Clough, Timothy

AU - Thaminy, Safia

AU - Ragg, Susanne

AU - Aebersold, Ruedi

AU - Vitek, Olga

PY - 2012

Y1 - 2012

UR - http://www.scopus.com/inward/record.url?scp=84878051310&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84878051310&partnerID=8YFLogxK

M3 - Article

VL - 13 Suppl 16

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

ER -