A new method of peak detection for analysis of comprehensive two-dimensional gas chromatography mass spectrometry data

Seongho Kim, Ming Ouyang, Jaesik Jeong, Changyu Shen, Xiang Zhang

Research output: Contribution to journal › Article

5 Citations (Scopus)

Abstract

We develop a novel peak detection algorithm for the analysis of comprehensive two-dimensional gas chromatography time-of-flight mass spectrometry (GC×GC-TOF MS) data using normal-exponential-Bernoulli (NEB) and mixture probability models. The algorithm first performs baseline correction and denoising simultaneously using the NEB model, which also defines peak regions. Peaks are then picked using a mixture of probability distributions to deal with co-eluting peaks. Peak merging is further carried out based on the mass spectral similarities among the peaks within the same peak group. The algorithm is evaluated using experimental data to study the effect of different cutoffs of the conditional Bayes factors and the effect of different mixture models including Poisson, truncated Gaussian, Gaussian, Gamma and exponentially modified Gaussian (EMG) distributions, and the optimal version is introduced using a trial-and-error approach. We then compare the new algorithm with two existing algorithms in terms of compound identification. Data analysis shows that the developed algorithm can detect the peaks with lower false discovery rates than the existing algorithms, and a less complicated peak-picking model is a promising alternative to the more complicated and widely used EMG mixture models.
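The abstract describes the pipeline only at a high level, so the Python sketch below illustrates two of its ideas under explicitly simplified assumptions; it is not the authors' implementation. It assumes the NEB model writes an observed intensity as y = x + δ·s, with background noise x ~ N(μ, σ²), signal s ~ Exponential(rate λ), and a Bernoulli(p) indicator δ for whether signal is present. Under that assumption it computes the posterior probability of signal, a plain (not conditional) Bayes factor of signal versus noise, and the posterior-mean denoised signal. The cutoff `bf_cutoff`, the cosine-similarity measure, the greedy merging rule, and every function name here are hypothetical choices for illustration, and the mixture-model peak-picking step for co-eluting peaks is not shown.

```python
import math

LOG_SQRT_2PI = 0.5 * math.log(2.0 * math.pi)


def log_normal_pdf(y, mu, sigma):
    """log N(y; mu, sigma^2): the assumed noise-only (background) component."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - LOG_SQRT_2PI


def log_normal_cdf(z):
    """log Phi(z), switching to the Mills-ratio asymptote in the far lower tail."""
    if z > -30.0:
        return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))
    return -0.5 * z * z - math.log(-z) - LOG_SQRT_2PI


def log_normexp_pdf(y, mu, sigma, lam):
    """log density of Y = X + S, X ~ N(mu, sigma^2), S ~ Exponential(rate=lam)."""
    m = y - mu - lam * sigma ** 2
    return (math.log(lam) + lam * (mu - y) + 0.5 * (lam * sigma) ** 2
            + log_normal_cdf(m / sigma))


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def neb_classify(y, mu, sigma, lam, p, bf_cutoff=10.0):
    """Hypothetical helper: posterior P(signal | y), log Bayes factor
    (signal vs. noise), and a peak flag from comparing the factor to bf_cutoff."""
    log_bf = log_normexp_pdf(y, mu, sigma, lam) - log_normal_pdf(y, mu, sigma)
    posterior = sigmoid(log_bf + math.log(p / (1.0 - p)))
    return posterior, log_bf, log_bf > math.log(bf_cutoff)


def denoised_signal(y, mu, sigma, lam):
    """E[S | Y = y]: S | y is N(y - mu - lam*sigma^2, sigma^2) truncated to S >= 0."""
    m = y - mu - lam * sigma ** 2
    z = m / sigma
    # phi(z) / Phi(z), computed in log space to avoid 0/0 in the tails
    ratio = math.exp(-0.5 * z * z - LOG_SQRT_2PI - log_normal_cdf(z))
    return m + sigma * ratio


def cosine_similarity(a, b):
    """Cosine similarity of two mass spectra binned on a shared m/z grid."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def merge_peak_group(spectra, cutoff=0.9):
    """Hypothetical greedy merge within one peak group: a peak joins the first
    earlier cluster whose representative spectrum is at least `cutoff` similar.
    Returns one cluster label per peak."""
    labels, reps = [], []
    for s in spectra:
        for k, rep in enumerate(reps):
            if cosine_similarity(s, rep) >= cutoff:
                labels.append(k)
                break
        else:
            reps.append(s)
            labels.append(len(reps) - 1)
    return labels
```

With μ = 100, σ = 5, λ = 0.02 and p = 0.1, an intensity at the baseline (y = 100) gets a posterior near 0.01 and is not flagged, while y = 300 is flagged with a denoised signal of about 199.5. `merge_peak_group([[10, 0, 5], [20, 0, 10], [0, 8, 1]])` returns `[0, 0, 1]` because the first two spectra are proportional. Working on the log scale keeps the Bayes factor finite for intensities far above the baseline, where the normal density alone would underflow to zero.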

Original language: English
Pages (from-to): 1209-1231
Number of pages: 23
Journal: Annals of Applied Statistics
Volume: 8
Issue number: 2
DOI: 10.1214/14-AOAS731
State: Published - 2014

Keywords

  • Bayes factor
  • GCxGC-TOF MS
  • Metabolomics
  • Mixture model
  • Normal-exponential-Bernoulli (NEB) model
  • Peak detection

ASJC Scopus subject areas

  • Statistics, Probability and Uncertainty
  • Modeling and Simulation
  • Statistics and Probability

Cite this

A new method of peak detection for analysis of comprehensive two-dimensional gas chromatography mass spectrometry data. / Kim, Seongho; Ouyang, Ming; Jeong, Jaesik; Shen, Changyu; Zhang, Xiang.

In: Annals of Applied Statistics, Vol. 8, No. 2, 2014, p. 1209-1231.

Kim, Seongho ; Ouyang, Ming ; Jeong, Jaesik ; Shen, Changyu ; Zhang, Xiang. / A new method of peak detection for analysis of comprehensive two-dimensional gas chromatography mass spectrometry data. In: Annals of Applied Statistics. 2014 ; Vol. 8, No. 2. pp. 1209-1231.
@article{9f47f0061656427faedd55c7608f6ebe,
title = "A new method of peak detection for analysis of comprehensive two-dimensional gas chromatography mass spectrometry data",
abstract = "We develop a novel peak detection algorithm for the analysis of comprehensive two-dimensional gas chromatography time-of-flight mass spectrometry (GC×GC-TOF MS) data using normal-exponential-Bernoulli (NEB) and mixture probability models. The algorithm first performs baseline correction and denoising simultaneously using the NEB model, which also defines peak regions. Peaks are then picked using a mixture of probability distributions to deal with co-eluting peaks. Peak merging is further carried out based on the mass spectral similarities among the peaks within the same peak group. The algorithm is evaluated using experimental data to study the effect of different cutoffs of the conditional Bayes factors and the effect of different mixture models including Poisson, truncated Gaussian, Gaussian, Gamma and exponentially modified Gaussian (EMG) distributions, and the optimal version is introduced using a trial-and-error approach. We then compare the new algorithm with two existing algorithms in terms of compound identification. Data analysis shows that the developed algorithm can detect the peaks with lower false discovery rates than the existing algorithms, and a less complicated peak-picking model is a promising alternative to the more complicated and widely used EMG mixture models.",
keywords = "Bayes factor, GCxGC-TOF MS, Metabolomics, Mixture model, Normal-exponential-Bernoulli (NEB) model, Peak detection",
author = "Seongho Kim and Ming Ouyang and Jaesik Jeong and Changyu Shen and Xiang Zhang",
year = "2014",
doi = "10.1214/14-AOAS731",
language = "English",
volume = "8",
pages = "1209--1231",
journal = "Annals of Applied Statistics",
issn = "1932-6157",
publisher = "Institute of Mathematical Statistics",
number = "2",

}

TY - JOUR

T1 - A new method of peak detection for analysis of comprehensive two-dimensional gas chromatography mass spectrometry data

AU - Kim, Seongho

AU - Ouyang, Ming

AU - Jeong, Jaesik

AU - Shen, Changyu

AU - Zhang, Xiang

PY - 2014

Y1 - 2014

N2 - We develop a novel peak detection algorithm for the analysis of comprehensive two-dimensional gas chromatography time-of-flight mass spectrometry (GC×GC-TOF MS) data using normal-exponential-Bernoulli (NEB) and mixture probability models. The algorithm first performs baseline correction and denoising simultaneously using the NEB model, which also defines peak regions. Peaks are then picked using a mixture of probability distributions to deal with co-eluting peaks. Peak merging is further carried out based on the mass spectral similarities among the peaks within the same peak group. The algorithm is evaluated using experimental data to study the effect of different cutoffs of the conditional Bayes factors and the effect of different mixture models including Poisson, truncated Gaussian, Gaussian, Gamma and exponentially modified Gaussian (EMG) distributions, and the optimal version is introduced using a trial-and-error approach. We then compare the new algorithm with two existing algorithms in terms of compound identification. Data analysis shows that the developed algorithm can detect the peaks with lower false discovery rates than the existing algorithms, and a less complicated peak-picking model is a promising alternative to the more complicated and widely used EMG mixture models.

AB - We develop a novel peak detection algorithm for the analysis of comprehensive two-dimensional gas chromatography time-of-flight mass spectrometry (GC×GC-TOF MS) data using normal-exponential-Bernoulli (NEB) and mixture probability models. The algorithm first performs baseline correction and denoising simultaneously using the NEB model, which also defines peak regions. Peaks are then picked using a mixture of probability distributions to deal with co-eluting peaks. Peak merging is further carried out based on the mass spectral similarities among the peaks within the same peak group. The algorithm is evaluated using experimental data to study the effect of different cutoffs of the conditional Bayes factors and the effect of different mixture models including Poisson, truncated Gaussian, Gaussian, Gamma and exponentially modified Gaussian (EMG) distributions, and the optimal version is introduced using a trial-and-error approach. We then compare the new algorithm with two existing algorithms in terms of compound identification. Data analysis shows that the developed algorithm can detect the peaks with lower false discovery rates than the existing algorithms, and a less complicated peak-picking model is a promising alternative to the more complicated and widely used EMG mixture models.

KW - Bayes factor

KW - GCxGC-TOF MS

KW - Metabolomics

KW - Mixture model

KW - Normal-exponential-Bernoulli (NEB) model

KW - Peak detection

UR - http://www.scopus.com/inward/record.url?scp=84903766627&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84903766627&partnerID=8YFLogxK

U2 - 10.1214/14-AOAS731

DO - 10.1214/14-AOAS731

M3 - Article

VL - 8

SP - 1209

EP - 1231

JO - Annals of Applied Statistics

JF - Annals of Applied Statistics

SN - 1932-6157

IS - 2

ER -