Group variable selection via convex log-exp-sum penalty with application to a breast cancer survivor study

Zhigeng Geng, Sijian Wang, Menggang Yu, Patrick Monahan, Victoria Champion, Grace Wahba

Research output: Contribution to journal › Article

4 Citations (Scopus)

Abstract

In many scientific and engineering applications, covariates are naturally grouped. When the group structures are available among covariates, people are usually interested in identifying both important groups and important variables within the selected groups. Among existing successful group variable selection methods, some methods fail to conduct within-group selection. Some methods are able to conduct both group and within-group selection, but the corresponding objective functions are non-convex. Such a non-convexity may require extra numerical effort. In this article, we propose a novel Log-Exp-Sum (LES) penalty for group variable selection. The LES penalty is strictly convex. It can identify important groups as well as select important variables within the group. We develop an efficient group-level coordinate descent algorithm to fit the model. We also derive non-asymptotic error bounds and asymptotic group selection consistency for our method in the high-dimensional setting where the number of covariates can be much larger than the sample size. Numerical results demonstrate the good performance of our method in both variable selection and prediction. We applied the proposed method to an American Cancer Society breast cancer survivor dataset. The findings are clinically meaningful and may help design intervention programs to improve the quality of life for breast cancer survivors.

Original language: English (US)
Pages (from-to): 53-62
Number of pages: 10
Journal: Biometrics
Volume: 71
Issue number: 1
DOI: 10.1111/biom.12230
State: Published - Mar 1 2015

Fingerprint

Variable Selection
Breast Cancer
breast neoplasms
Penalty
Breast Neoplasms
methodology
Covariates
selection methods
engineering
Sample Size
neoplasms
prediction
Coordinate Descent
Non-convexity
Descent Algorithm
Strictly Convex
Engineering Application
Error Bounds
Cancer
High-dimensional

Keywords

  • Breast cancer survivor
  • Finite sample bound
  • Group variable selection
  • High-dimensional data
  • Penalized estimation
  • Sparsity recovery

ASJC Scopus subject areas

  • Applied Mathematics
  • Statistics and Probability
  • Agricultural and Biological Sciences(all)
  • Biochemistry, Genetics and Molecular Biology(all)
  • Immunology and Microbiology(all)
  • Medicine(all)

Cite this

Group variable selection via convex log-exp-sum penalty with application to a breast cancer survivor study. / Geng, Zhigeng; Wang, Sijian; Yu, Menggang; Monahan, Patrick; Champion, Victoria; Wahba, Grace.

In: Biometrics, Vol. 71, No. 1, 01.03.2015, p. 53-62.

@article{e93bc035ba5e4b7499da3ae1fe3dd1c4,
title = "Group variable selection via convex log-exp-sum penalty with application to a breast cancer survivor study",
abstract = "In many scientific and engineering applications, covariates are naturally grouped. When the group structures are available among covariates, people are usually interested in identifying both important groups and important variables within the selected groups. Among existing successful group variable selection methods, some methods fail to conduct within-group selection. Some methods are able to conduct both group and within-group selection, but the corresponding objective functions are non-convex. Such a non-convexity may require extra numerical effort. In this article, we propose a novel Log-Exp-Sum (LES) penalty for group variable selection. The LES penalty is strictly convex. It can identify important groups as well as select important variables within the group. We develop an efficient group-level coordinate descent algorithm to fit the model. We also derive non-asymptotic error bounds and asymptotic group selection consistency for our method in the high-dimensional setting where the number of covariates can be much larger than the sample size. Numerical results demonstrate the good performance of our method in both variable selection and prediction. We applied the proposed method to an American Cancer Society breast cancer survivor dataset. The findings are clinically meaningful and may help design intervention programs to improve the quality of life for breast cancer survivors.",
keywords = "Breast cancer survivor, Finite sample bound, Group variable selection, High-dimensional data, Penalized estimation, Sparsity recovery",
author = "Zhigeng Geng and Sijian Wang and Menggang Yu and Patrick Monahan and Victoria Champion and Grace Wahba",
year = "2015",
month = "3",
day = "1",
doi = "10.1111/biom.12230",
language = "English (US)",
volume = "71",
pages = "53--62",
journal = "Biometrics",
issn = "0006-341X",
publisher = "Wiley-Blackwell",
number = "1",
}

TY - JOUR

T1 - Group variable selection via convex log-exp-sum penalty with application to a breast cancer survivor study

AU - Geng, Zhigeng

AU - Wang, Sijian

AU - Yu, Menggang

AU - Monahan, Patrick

AU - Champion, Victoria

AU - Wahba, Grace

PY - 2015/3/1

Y1 - 2015/3/1

N2 - In many scientific and engineering applications, covariates are naturally grouped. When the group structures are available among covariates, people are usually interested in identifying both important groups and important variables within the selected groups. Among existing successful group variable selection methods, some methods fail to conduct within-group selection. Some methods are able to conduct both group and within-group selection, but the corresponding objective functions are non-convex. Such a non-convexity may require extra numerical effort. In this article, we propose a novel Log-Exp-Sum (LES) penalty for group variable selection. The LES penalty is strictly convex. It can identify important groups as well as select important variables within the group. We develop an efficient group-level coordinate descent algorithm to fit the model. We also derive non-asymptotic error bounds and asymptotic group selection consistency for our method in the high-dimensional setting where the number of covariates can be much larger than the sample size. Numerical results demonstrate the good performance of our method in both variable selection and prediction. We applied the proposed method to an American Cancer Society breast cancer survivor dataset. The findings are clinically meaningful and may help design intervention programs to improve the quality of life for breast cancer survivors.

AB - In many scientific and engineering applications, covariates are naturally grouped. When the group structures are available among covariates, people are usually interested in identifying both important groups and important variables within the selected groups. Among existing successful group variable selection methods, some methods fail to conduct within-group selection. Some methods are able to conduct both group and within-group selection, but the corresponding objective functions are non-convex. Such a non-convexity may require extra numerical effort. In this article, we propose a novel Log-Exp-Sum (LES) penalty for group variable selection. The LES penalty is strictly convex. It can identify important groups as well as select important variables within the group. We develop an efficient group-level coordinate descent algorithm to fit the model. We also derive non-asymptotic error bounds and asymptotic group selection consistency for our method in the high-dimensional setting where the number of covariates can be much larger than the sample size. Numerical results demonstrate the good performance of our method in both variable selection and prediction. We applied the proposed method to an American Cancer Society breast cancer survivor dataset. The findings are clinically meaningful and may help design intervention programs to improve the quality of life for breast cancer survivors.

KW - Breast cancer survivor

KW - Finite sample bound

KW - Group variable selection

KW - High-dimensional data

KW - Penalized estimation

KW - Sparsity recovery

UR - http://www.scopus.com/inward/record.url?scp=84961384787&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84961384787&partnerID=8YFLogxK

U2 - 10.1111/biom.12230

DO - 10.1111/biom.12230

M3 - Article

C2 - 25257196

AN - SCOPUS:84961384787

VL - 71

SP - 53

EP - 62

JO - Biometrics

JF - Biometrics

SN - 0006-341X

IS - 1

ER -