The cross-validated AUC for MCP-logistic regression with high-dimensional data

Dingfeng Jiang, Jian Huang, Ying Zhang

Research output: Contribution to journal › Article

9 Citations (Scopus)

Abstract

We propose a cross-validated area under the receiver operating characteristic (ROC) curve (CV-AUC) criterion for tuning parameter selection for penalized methods in sparse, high-dimensional logistic regression models. We use this criterion in combination with the minimax concave penalty (MCP) method for variable selection. The CV-AUC criterion is specifically designed to optimize classification performance for binary outcome data. To implement the proposed approach, we derive an efficient coordinate descent algorithm to compute the MCP-logistic regression solution surface. Simulation studies are conducted to evaluate the finite sample performance of the proposed method and to compare it with existing methods, including the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and the extended BIC (EBIC). The model selected with the CV-AUC criterion tends to have a larger predictive AUC and a smaller classification error than models whose tuning parameters are selected using the AIC, BIC, or EBIC. We illustrate the application of MCP-logistic regression with the CV-AUC criterion on three microarray datasets from studies that attempt to identify genes related to cancer. Our simulation studies and data examples demonstrate that the CV-AUC is an attractive method for tuning parameter selection for penalized methods in high-dimensional logistic regression models.
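The abstract names three computational ingredients: the MCP penalty, a coordinate descent algorithm for the MCP-logistic solution path, and a cross-validated AUC used to choose the tuning parameter. The pure-Python sketch below illustrates one way these pieces can fit together. It is not the authors' implementation, and several choices are assumptions made for illustration only: each coordinate update majorizes the logistic loss using the curvature bound 1/4 on standardized predictors (which requires gamma > 4), the lambda grid and the default gamma = 8 are arbitrary, folds are stratified by class, and CV-AUC is taken to be the average of the per-fold held-out AUCs. All function names (`mcp_threshold`, `fit_path`, `cv_auc`, and so on) are hypothetical.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def mcp_penalty(t, lam, gamma):
    # MCP: lam*|t| - t^2/(2*gamma) for |t| <= gamma*lam, constant gamma*lam^2/2 beyond.
    a = abs(t)
    return lam * a - a * a / (2 * gamma) if a <= gamma * lam else 0.5 * gamma * lam * lam


def mcp_threshold(z, v, lam, gamma):
    # Exact minimizer of (v/2)*(b - z)^2 + MCP(b; lam, gamma); requires gamma > 1/v.
    if abs(z) > gamma * lam:
        return z  # MCP is flat past gamma*lam, so large effects are left unshrunk
    return math.copysign(max(abs(v * z) - lam, 0.0), z) / (v - 1.0 / gamma)


def standardize(X, ref):
    # Center/scale X with the means and population SDs of ref, so mean(z_j^2) == 1 on ref.
    n, p = len(ref), len(ref[0])
    mu = [sum(r[j] for r in ref) / n for j in range(p)]
    sd = [math.sqrt(sum((r[j] - mu[j]) ** 2 for r in ref) / n) for j in range(p)]
    return [[(r[j] - mu[j]) / sd[j] for j in range(p)] for r in X]


def lambda_grid(Z, y, n_lambda=10, ratio=0.05):
    # Log-spaced grid from lam_max (smallest lam at which beta = 0 is stationary) down to ratio*lam_max.
    n, ybar = len(Z), sum(y) / len(y)
    lam_max = max(abs(sum(Z[i][j] * (y[i] - ybar) for i in range(n))) / n for j in range(len(Z[0])))
    return [lam_max * ratio ** (k / (n_lambda - 1)) for k in range(n_lambda)]


def fit_path(Z, y, lambdas, gamma=8.0, tol=1e-6, max_iter=500):
    # Majorized coordinate descent with warm starts. v = 1/4 bounds the logistic
    # curvature for standardized columns, so gamma must exceed 1/v = 4.
    n, p, v = len(Z), len(Z[0]), 0.25
    ybar = sum(y) / n
    b0, beta = math.log(ybar / (1 - ybar)), [0.0] * p
    eta, path = [b0] * n, []
    for lam in lambdas:
        for _ in range(max_iter):
            step = sum(y[i] - sigmoid(eta[i]) for i in range(n)) / (n * v)  # unpenalized intercept
            b0 += step
            eta = [e + step for e in eta]
            max_change = abs(step)
            for j in range(p):
                g = -sum(Z[i][j] * (y[i] - sigmoid(eta[i])) for i in range(n)) / n
                new = mcp_threshold(beta[j] - g / v, v, lam, gamma)
                delta = new - beta[j]
                if delta != 0.0:
                    beta[j] = new
                    eta = [e + delta * Z[i][j] for i, e in enumerate(eta)]
                    max_change = max(max_change, abs(delta))
            if max_change < tol:
                break
        path.append((b0, list(beta)))
    return path


def auc(scores, labels):
    # Mann-Whitney form of the AUC: P(score_pos > score_neg), ties counted as 1/2.
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def cv_auc(X, y, lambdas, k=5, gamma=8.0, seed=0):
    # Stratified K-fold CV; returns the mean held-out AUC for each lambda.
    rng, folds = random.Random(seed), [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, yi in enumerate(y) if yi == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    totals = [0.0] * len(lambdas)
    for test in folds:
        test_set = set(test)
        train = [i for i in range(len(y)) if i not in test_set]
        Xtr = [X[i] for i in train]
        path = fit_path(standardize(Xtr, Xtr), [y[i] for i in train], lambdas, gamma)
        Zte = standardize([X[i] for i in test], Xtr)
        for m, (b0, beta) in enumerate(path):
            scores = [b0 + sum(b * z for b, z in zip(beta, row)) for row in Zte]
            totals[m] += auc(scores, [y[i] for i in test])
    return [t / k for t in totals]
```

As a usage example, with a predictor matrix `X` (a list of rows) and 0/1 labels `y`, `cv = cv_auc(X, y, lambda_grid(standardize(X, X), y))` returns one averaged AUC per grid value, and the selected tuning parameter is the grid value with the largest entry. Maximizing held-out AUC, rather than a penalized likelihood as AIC, BIC, and EBIC do, targets ranking performance on new data directly, at the cost of refitting the whole path once per fold.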

Original language: English (US)
Pages (from-to): 505-518
Number of pages: 14
Journal: Statistical Methods in Medical Research
Volume: 22
Issue number: 5
DOIs: 10.1177/0962280211428385
State: Published - Oct 1 2013
Externally published: Yes

Fingerprint

High-dimensional Data
Logistic Regression
Minimax
Area Under Curve
Penalty
Logistic Models
Bayesian Information Criterion
Akaike Information Criterion
Parameter Selection
Logistic Regression Model
Tuning
High-dimensional
Simulation Study
Coordinate Descent
Binary Outcomes
Characteristic Curve
Descent Algorithm
Penalty Method
Parameter Tuning

Keywords

  • binary outcome
  • cross-validation
  • high-dimensional data
  • Lasso
  • minimax concave penalty
  • tuning parameter selection

ASJC Scopus subject areas

  • Epidemiology
  • Statistics and Probability
  • Health Information Management

Cite this

The cross-validated AUC for MCP-logistic regression with high-dimensional data. / Jiang, Dingfeng; Huang, Jian; Zhang, Ying.

In: Statistical Methods in Medical Research, Vol. 22, No. 5, 01.10.2013, p. 505-518.


@article{5b42b9b77e4a4885b7a9ab1ffb27d98a,
title = "The cross-validated AUC for MCP-logistic regression with high-dimensional data",
abstract = "We propose a cross-validated area under the receiver operating characteristic (ROC) curve (CV-AUC) criterion for tuning parameter selection for penalized methods in sparse, high-dimensional logistic regression models. We use this criterion in combination with the minimax concave penalty (MCP) method for variable selection. The CV-AUC criterion is specifically designed to optimize classification performance for binary outcome data. To implement the proposed approach, we derive an efficient coordinate descent algorithm to compute the MCP-logistic regression solution surface. Simulation studies are conducted to evaluate the finite sample performance of the proposed method and to compare it with existing methods, including the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and the extended BIC (EBIC). The model selected with the CV-AUC criterion tends to have a larger predictive AUC and a smaller classification error than models whose tuning parameters are selected using the AIC, BIC, or EBIC. We illustrate the application of MCP-logistic regression with the CV-AUC criterion on three microarray datasets from studies that attempt to identify genes related to cancer. Our simulation studies and data examples demonstrate that the CV-AUC is an attractive method for tuning parameter selection for penalized methods in high-dimensional logistic regression models.",
keywords = "binary outcome, cross-validation, high-dimensional data, Lasso, minimax concave penalty, tuning parameter selection",
author = "Dingfeng Jiang and Jian Huang and Ying Zhang",
year = "2013",
month = "10",
day = "1",
doi = "10.1177/0962280211428385",
language = "English (US)",
volume = "22",
pages = "505--518",
journal = "Statistical Methods in Medical Research",
issn = "0962-2802",
publisher = "SAGE Publications Ltd",
number = "5",

}

TY  - JOUR

T1  - The cross-validated AUC for MCP-logistic regression with high-dimensional data

AU  - Jiang, Dingfeng

AU  - Huang, Jian

AU  - Zhang, Ying

PY  - 2013/10/1

Y1  - 2013/10/1

AB  - We propose a cross-validated area under the receiver operating characteristic (ROC) curve (CV-AUC) criterion for tuning parameter selection for penalized methods in sparse, high-dimensional logistic regression models. We use this criterion in combination with the minimax concave penalty (MCP) method for variable selection. The CV-AUC criterion is specifically designed to optimize classification performance for binary outcome data. To implement the proposed approach, we derive an efficient coordinate descent algorithm to compute the MCP-logistic regression solution surface. Simulation studies are conducted to evaluate the finite sample performance of the proposed method and to compare it with existing methods, including the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and the extended BIC (EBIC). The model selected with the CV-AUC criterion tends to have a larger predictive AUC and a smaller classification error than models whose tuning parameters are selected using the AIC, BIC, or EBIC. We illustrate the application of MCP-logistic regression with the CV-AUC criterion on three microarray datasets from studies that attempt to identify genes related to cancer. Our simulation studies and data examples demonstrate that the CV-AUC is an attractive method for tuning parameter selection for penalized methods in high-dimensional logistic regression models.

KW  - binary outcome

KW  - cross-validation

KW  - high-dimensional data

KW  - Lasso

KW  - minimax concave penalty

KW  - tuning parameter selection

UR  - http://www.scopus.com/inward/record.url?scp=84886533098&partnerID=8YFLogxK

UR  - http://www.scopus.com/inward/citedby.url?scp=84886533098&partnerID=8YFLogxK

U2  - 10.1177/0962280211428385

DO  - 10.1177/0962280211428385

M3  - Article

VL  - 22

SP  - 505

EP  - 518

JO  - Statistical Methods in Medical Research

JF  - Statistical Methods in Medical Research

SN  - 0962-2802

IS  - 5

ER  - 