AUCTSP

An improved biomarker gene pair class predictor

Dimitri Kagaris, Alireza Khamesipour, Constantin Yiannoutsos

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

Background: The Top Scoring Pair (TSP) classifier, based on the concept of relative ranking reversals in the expressions of pairs of genes, has been proposed as a simple, accurate, and easily interpretable decision rule for classification and class prediction of gene expression profiles. The idea that differences in gene expression ranking are associated with the presence or absence of disease is compelling and has strong biological plausibility. Nevertheless, the TSP formulation ignores significant available information that can improve classification accuracy, and it is vulnerable to selecting genes that are not differentially expressed between the two conditions ("pivot" genes).

Results: We introduce the AUCTSP classifier as an alternative rank-based estimator of the magnitude of the ranking reversals involved in the original TSP. The proposed estimator is based on the Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) and, as such, takes into account the separation of the entire distribution of gene expression levels in gene pairs under the conditions considered, as opposed to comparing gene rankings within individual subjects as in the original TSP formulation. Through extensive simulations and case studies involving classification in ovarian, leukemia, colon, breast and prostate cancers and diffuse large B-cell lymphoma, we show the superiority of the proposed approach in terms of improving classification accuracy, avoiding overfitting, and being less prone to selecting non-informative (pivot) genes.

Conclusions: The proposed AUCTSP is a simple yet reliable and robust rank-based classifier for gene expression classification. While the AUCTSP works by the same principle as TSP, its ability to determine the top scoring gene pair based on the relative rankings of two marker genes across all subjects, as opposed to within each individual subject, results in significant performance gains in classification accuracy. In addition, the proposed method tends to avoid selection of non-informative (pivot) genes as members of the top-scoring pair.
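
The contrast drawn in the abstract, between comparing two genes within each subject and comparing their whole distributions across subjects, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the AUCTSP formula, so `auc_pair_score` below is only one plausible reading (a Mann-Whitney estimate of P(gene i < gene j) over all subject pairs within a class, differenced between classes). The function names and the toy data are assumptions made for illustration.

```python
import numpy as np

def tsp_score(xi, xj, y):
    """Classic TSP score: |P(xi < xj | class 0) - P(xi < xj | class 1)|,
    where the comparison is made within each individual subject."""
    xi, xj, y = np.asarray(xi), np.asarray(xj), np.asarray(y)
    p0 = np.mean(xi[y == 0] < xj[y == 0])
    p1 = np.mean(xi[y == 1] < xj[y == 1])
    return float(abs(p0 - p1))

def auc(a, b):
    """Mann-Whitney estimate of P(a < b) over ALL cross pairs of values,
    with ties counted as 1/2. Equals the area under the ROC curve."""
    a = np.asarray(a, dtype=float)[:, None]
    b = np.asarray(b, dtype=float)[None, :]
    return float(np.mean((a < b) + 0.5 * (a == b)))

def auc_pair_score(xi, xj, y):
    """ASSUMED AUC-style pair score (illustrative, not taken from the paper):
    the two genes' expression distributions are compared across all subjects
    of a class, and the per-class AUCs are then differenced."""
    xi, xj, y = np.asarray(xi), np.asarray(xj), np.asarray(y)
    a0 = auc(xi[y == 0], xj[y == 0])
    a1 = auc(xi[y == 1], xj[y == 1])
    return abs(a0 - a1)

# Hypothetical toy data: 4 subjects per class, two genes.
y  = np.array([0, 0, 0, 0, 1, 1, 1, 1])
xi = np.array([1, 2, 3, 4, 5, 6, 7, 8])
xj = np.array([2, 3, 4, 5, 4, 5, 6, 7])

tsp_value = tsp_score(xi, xj, y)       # within-subject reversal is perfect
auc_value = auc_pair_score(xi, xj, y)  # distributions overlap heavily
print(tsp_value, auc_value)
```

On this toy data the within-subject comparison reports a perfect reversal (score 1.0), while the distribution-level score is only 0.4375 because the two genes' value ranges overlap heavily within each class. That is the kind of information the abstract says the original TSP ignores.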

Original language: English (US)
Article number: 244
Journal: BMC Bioinformatics
Volume: 19
Issue number: 1
DOI: 10.1186/s12859-018-2231-1
State: Published - Jun 26 2018

Keywords

  • AUC
  • Breast cancer
  • Colon cancer
  • Diffuse large B-Cell lymphoma
  • Gene expression
  • Gene selection
  • Leukemia
  • Microarray data analysis
  • Ovarian cancer
  • Prostate cancer
  • Receiver operating characteristic (ROC) curve

ASJC Scopus subject areas

  • Structural Biology
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics

Cite this

AUCTSP: An improved biomarker gene pair class predictor. / Kagaris, Dimitri; Khamesipour, Alireza; Yiannoutsos, Constantin.

In: BMC Bioinformatics, Vol. 19, No. 1, 244, 26.06.2018.

@article{d59b89f51ba24e82942ea090e062efbf,
title = "AUCTSP: An improved biomarker gene pair class predictor",
keywords = "AUC, Breast cancer, Colon cancer, Diffuse large B-Cell lymphoma, Gene expression, Gene selection, Leukemia, Microarray data analysis, Ovarian cancer, Prostate cancer, Receiver operating characteristic (ROC) curve",
author = "Dimitri Kagaris and Alireza Khamesipour and Constantin Yiannoutsos",
year = "2018",
month = "6",
day = "26",
doi = "10.1186/s12859-018-2231-1",
language = "English (US)",
volume = "19",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",
number = "1",

}

TY - JOUR

T1 - AUCTSP

T2 - An improved biomarker gene pair class predictor

AU - Kagaris, Dimitri

AU - Khamesipour, Alireza

AU - Yiannoutsos, Constantin

PY - 2018/6/26

Y1 - 2018/6/26

KW - AUC

KW - Breast cancer

KW - Colon cancer

KW - Diffuse large B-Cell lymphoma

KW - Gene expression

KW - Gene selection

KW - Leukemia

KW - Microarray data analysis

KW - Ovarian cancer

KW - Prostate cancer

KW - Receiver operating characteristic (ROC) curve

UR - http://www.scopus.com/inward/record.url?scp=85049108760&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85049108760&partnerID=8YFLogxK

U2 - 10.1186/s12859-018-2231-1

DO - 10.1186/s12859-018-2231-1

M3 - Article

VL - 19

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

IS - 1

M1 - 244

ER -