Comparison of computational algorithms for the classification of liver cancer using SELDI mass spectrometry

A case study

Changyu Shen, Timothy E. Breen, Lacey E. Dobrolecki, C. Schmidt, George W. Sledge, Kathy Miller, Robert J. Hickey

Research output: Contribution to journal › Article

5 Citations (Scopus)

Abstract

Introduction: As an alternative to DNA microarrays, mass spectrometry-based analysis of proteomic patterns has shown great potential in cancer diagnosis. The ultimate application of this technique in clinical settings relies on the advancement of the technology itself and the maturity of the computational tools used to analyze the data. A number of computational algorithms constructed on different principles are available for the classification of disease status based on proteomic patterns. Nevertheless, few studies have addressed the difference in the performance of these approaches. In this report, we describe a comparative case study on the classification accuracy of hepatocellular carcinoma based on the serum proteomic pattern generated from a Surface Enhanced Laser Desorption/Ionization (SELDI) mass spectrometer.

Methods: Nine supervised classification algorithms are implemented in R software and compared for classification accuracy.

Results: We found that the support vector machine with a radial function is preferable as a tool for classification of hepatocellular carcinoma using features in SELDI mass spectra. Among the rest of the methods, random forest and prediction analysis of microarrays have better performance. A permutation-based technique reveals that the support vector machine with a radial function seems intrinsically superior in learning from the training data, since it has a lower prediction error than the others when there is essentially no differential signal. On the other hand, the performance of random forest and prediction analysis of microarrays relies on their capability of capturing signals with substantial differentiation between groups.

Conclusions: Our finding is similar to that of a previous study, in which classification methods based on Matrix Assisted Laser Desorption/Ionization (MALDI) mass spectrometry were compared for the prediction accuracy of ovarian cancer. The support vector machine, random forest and prediction analysis of microarrays provide better prediction accuracy for hepatocellular carcinoma using SELDI proteomic data than six other approaches.
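The abstract does not spell out the permutation-based technique, so the following is only a minimal sketch of one common reading: shuffle the class labels so that no differential signal remains, then compare the cross-validated error on shuffled labels with the error on the real labels. Everything here is an assumption for illustration: the data are synthetic, the classifier is a toy nearest-centroid rule (not one of the nine algorithms in the study), the function names are hypothetical, and Python is used instead of the R software the authors mention.

```python
import random


def make_data(n_per_class=20, n_features=10, shift=3.0, seed=42):
    """Synthetic two-class data (hypothetical): class 1 is shifted by `shift`."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(shift * label, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y


def fit_centroids(X, y):
    """Return the mean feature vector of each class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(centroids, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[c]))
    return min(sorted(centroids), key=dist2)


def cv_error(X, y, k=5, seed=0):
    """k-fold cross-validated misclassification rate."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    wrong = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        centroids = fit_centroids([X[i] for i in train], [y[i] for i in train])
        wrong += sum(predict(centroids, X[i]) != y[i] for i in fold)
    return wrong / len(X)


def permuted_error(X, y, n_perm=20, seed=1):
    """Mean cross-validated error after shuffling labels (no real signal left)."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n_perm):
        y_perm = y[:]
        rng.shuffle(y_perm)
        errors.append(cv_error(X, y_perm))
    return sum(errors) / n_perm


X, y = make_data()
print("error on real labels:    ", cv_error(X, y))
print("error on permuted labels:", permuted_error(X, y))
```

In a comparison of several classifiers, the same two numbers would be computed for each method; a method whose error on permuted labels is lower than the others would correspond to what the abstract describes as being superior when there is essentially no differential signal.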

Original language: English
Pages (from-to): 339-349
Number of pages: 11
Journal: Cancer Informatics
Volume: 3
State: Published - 2007

Fingerprint

Matrix-Assisted Laser Desorption-Ionization Mass Spectrometry
Liver Neoplasms
Proteomics
Microarray Analysis
Hepatocellular Carcinoma
Lasers
Oligonucleotide Array Sequence Analysis
Ovarian Neoplasms
Mass Spectrometry
Software
Learning
Technology
Serum
Support Vector Machine
Forests
Neoplasms

Keywords

  • Classification
  • Hepatic carcinoma
  • Random forest
  • SELDI
  • Support vector machine

ASJC Scopus subject areas

  • Cancer Research
  • Oncology

Cite this

Comparison of computational algorithms for the classification of liver cancer using SELDI mass spectrometry : A case study. / Shen, Changyu; Breen, Timothy E.; Dobrolecki, Lacey E.; Schmidt, C.; Sledge, George W.; Miller, Kathy; Hickey, Robert J.

In: Cancer Informatics, Vol. 3, 2007, p. 339-349.

Shen, Changyu ; Breen, Timothy E. ; Dobrolecki, Lacey E. ; Schmidt, C. ; Sledge, George W. ; Miller, Kathy ; Hickey, Robert J. / Comparison of computational algorithms for the classification of liver cancer using SELDI mass spectrometry : A case study. In: Cancer Informatics. 2007 ; Vol. 3. pp. 339-349.
@article{d480e49c189b4bba92a53225f2dd3a43,
title = "Comparison of computational algorithms for the classification of liver cancer using SELDI mass spectrometry: A case study",
abstract = "Introduction: As an alternative to DNA microarrays, mass spectrometry based analysis of proteomic patterns has shown great potential in cancer diagnosis. The ultimate application of this technique in clinical settings relies on the advancement of the technology itself and the maturity of the computational tools used to analyze the data. A number of computational algorithms constructed on different principles are available for the classification of disease status based on proteomic patterns. Nevertheless, few studies have addressed the difference in the performance of these approaches. In this report, we describe a comparative case study on the classification accuracy of hepatocellular carcinoma based on the serum proteomic pattern generated from a Surface Enhanced Laser Desorption/Ionization (SELDI) mass spectrometer. Methods: Nine supervised classification algorithms are implemented in R software and compared for the classification accuracy. Results: We found that the support vector machine with radial function is preferable as a tool for classification of hepatocellular carcinoma using features in SELDI mass spectra. Among the rest of the methods, random forest and prediction analysis of microarrays have better performance. A permutation-based technique reveals that the support vector machine with a radial function seems intrinsically superior in learning from the training data since it has a lower prediction error than others when there is essentially no differential signal. On the other hand, the performance of the random forest and prediction analysis of microarrays rely on their capability of capturing the signals with substantial differentiation between groups. Conclusions: Our finding is similar to a previous study, where classification methods based on the Matrix Assisted Laser Desorption/Ionization (MALDI) mass spectrometry are compared for the prediction accuracy of ovarian cancer. 
The support vector machine, random forest and prediction analysis of microarrays provide better prediction accuracy for hepatocellular carcinoma using SELDI proteomic data than six other approaches.",
keywords = "Classification, Hepatic carcinoma, Random forest, SELDI, Support vector machine",
author = "Changyu Shen and Breen, {Timothy E.} and Dobrolecki, {Lacey E.} and C. Schmidt and Sledge, {George W.} and Kathy Miller and Hickey, {Robert J.}",
year = "2007",
language = "English",
volume = "3",
pages = "339--349",
journal = "Cancer Informatics",
issn = "1176-9351",
publisher = "Libertas Academica Ltd.",

}

TY - JOUR

T1 - Comparison of computational algorithms for the classification of liver cancer using SELDI mass spectrometry

T2 - A case study

AU - Shen, Changyu

AU - Breen, Timothy E.

AU - Dobrolecki, Lacey E.

AU - Schmidt, C.

AU - Sledge, George W.

AU - Miller, Kathy

AU - Hickey, Robert J.

PY - 2007

Y1 - 2007

AB - Introduction: As an alternative to DNA microarrays, mass spectrometry based analysis of proteomic patterns has shown great potential in cancer diagnosis. The ultimate application of this technique in clinical settings relies on the advancement of the technology itself and the maturity of the computational tools used to analyze the data. A number of computational algorithms constructed on different principles are available for the classification of disease status based on proteomic patterns. Nevertheless, few studies have addressed the difference in the performance of these approaches. In this report, we describe a comparative case study on the classification accuracy of hepatocellular carcinoma based on the serum proteomic pattern generated from a Surface Enhanced Laser Desorption/Ionization (SELDI) mass spectrometer. Methods: Nine supervised classification algorithms are implemented in R software and compared for the classification accuracy. Results: We found that the support vector machine with radial function is preferable as a tool for classification of hepatocellular carcinoma using features in SELDI mass spectra. Among the rest of the methods, random forest and prediction analysis of microarrays have better performance. A permutation-based technique reveals that the support vector machine with a radial function seems intrinsically superior in learning from the training data since it has a lower prediction error than others when there is essentially no differential signal. On the other hand, the performance of the random forest and prediction analysis of microarrays rely on their capability of capturing the signals with substantial differentiation between groups. Conclusions: Our finding is similar to a previous study, where classification methods based on the Matrix Assisted Laser Desorption/Ionization (MALDI) mass spectrometry are compared for the prediction accuracy of ovarian cancer. 
The support vector machine, random forest and prediction analysis of microarrays provide better prediction accuracy for hepatocellular carcinoma using SELDI proteomic data than six other approaches.

KW - Classification

KW - Hepatic carcinoma

KW - Random forest

KW - SELDI

KW - Support vector machine

UR - http://www.scopus.com/inward/record.url?scp=49649119385&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=49649119385&partnerID=8YFLogxK

M3 - Article

VL - 3

SP - 339

EP - 349

JO - Cancer Informatics

JF - Cancer Informatics

SN - 1176-9351

ER -