Semiparametric zero-inflated modeling in multi-ethnic study of atherosclerosis (MESA)

Hai Liu, Shuangge Ma, Richard Kronmal, Kung Sik Chan

Research output: Contribution to journal › Article

4 Citations (Scopus)

Abstract

We analyze the Agatston score of coronary artery calcium (CAC) from the Multi-Ethnic Study of Atherosclerosis (MESA) using a semiparametric zero-inflated modeling approach. The observed CAC scores in this cohort consist of a high frequency of zeroes together with continuously distributed positive values. Both partially constrained and unconstrained models are considered, in order to investigate the biological processes underlying CAC development from zero to positive and from small amounts to large amounts. Unlike existing studies, we adopt a model selection procedure based on likelihood cross-validation to identify the optimal model; this choice is justified by comparative Monte Carlo studies. A shrinkage version of the cubic regression spline is used to perform model estimation and variable selection simultaneously. Applying the proposed methods to the MESA data, we show that the two biological mechanisms, one influencing the initiation of CAC and the other its magnitude when positive, are better characterized by an unconstrained zero-inflated normal model. Our results differ significantly from those in published studies and may provide further insights into the biological mechanisms underlying CAC development in humans. This highly flexible statistical framework can be applied to zero-inflated data analyses in other areas.
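To make the idea of likelihood cross-validation for a zero-inflated normal model concrete, here is a minimal sketch. It is not the authors' procedure: it assumes simple linear predictors in place of the penalized splines, runs on simulated data rather than MESA data, and all function and variable names (`fit_zin`, `loglik_zin`, `likelihood_cv`) are hypothetical. It fits an unconstrained two-part model (a logistic part for whether the response is nonzero, a Gaussian part for the nonzero values) and compares two candidate models by their held-out log-likelihood.

```python
import numpy as np

def fit_logistic(X, z, n_iter=50, ridge=1e-6):
    """Logistic regression for P(z = 1 | X) via Newton-Raphson, with a tiny ridge for stability."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        grad = X.T @ (z - p) - ridge * beta
        hess = X.T @ (X * w[:, None]) + ridge * np.eye(X.shape[1])
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta

def fit_zin(X, y):
    """Unconstrained zero-inflated normal fit (assumed form, linear predictors).

    y is 0 when no calcium is detected; otherwise it holds a transformed
    positive score, assumed continuous and therefore never exactly 0.
    """
    pos = y != 0
    gamma = fit_logistic(X, pos.astype(float))           # zero vs. nonzero part
    beta, *_ = np.linalg.lstsq(X[pos], y[pos], rcond=None)  # magnitude part
    sigma2 = np.mean((y[pos] - X[pos] @ beta) ** 2)
    return gamma, beta, sigma2

def loglik_zin(params, X, y):
    """Log-likelihood of the zero-inflated normal model on data (X, y)."""
    gamma, beta, sigma2 = params
    p = np.clip(1.0 / (1.0 + np.exp(-X @ gamma)), 1e-12, 1 - 1e-12)
    pos = y != 0
    resid = y[pos] - X[pos] @ beta
    ll_zero = np.log(1.0 - p[~pos]).sum()
    ll_pos = (np.log(p[pos]) - 0.5 * np.log(2 * np.pi * sigma2)
              - resid ** 2 / (2 * sigma2)).sum()
    return ll_zero + ll_pos

def likelihood_cv(X, y, k=5, seed=0):
    """Sum of held-out log-likelihoods over k folds (larger is better)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    total = 0.0
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        params = fit_zin(X[train], y[train])
        total += loglik_zin(params, X[test_idx], y[test_idx])
    return total

# Simulated data (illustrative only): one covariate drives both parts.
rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=n)
p_true = 1.0 / (1.0 + np.exp(-(-0.5 + 1.0 * x)))
is_pos = rng.random(n) < p_true
y = np.where(is_pos, rng.normal(3.0 + 0.8 * x, 1.0), 0.0)

X_full = np.column_stack([np.ones(n), x])  # candidate 1: intercept + covariate
X_null = np.ones((n, 1))                   # candidate 2: intercept only

lcv_full = likelihood_cv(X_full, y)
lcv_null = likelihood_cv(X_null, y)
print("selected:", "full" if lcv_full > lcv_null else "null")
```

The candidate with the larger cross-validated log-likelihood is preferred. In the paper's setting the candidates would instead be the partially constrained and unconstrained semiparametric models, which this sketch does not implement.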

Original language: English
Pages (from-to): 1236-1255
Number of pages: 20
Journal: Annals of Applied Statistics
Volume: 6
Issue number: 3
DOI: 10.1214/11-AOAS534
State: Published - Sep 2012

Fingerprint

Atherosclerosis
Coronary Artery
Calcium
Zero
Modeling
Regression Splines
Cubic Spline
Selection Procedures
Monte Carlo Study
Variable Selection
Cross-validation
Model Selection
Model
Splines
Likelihood
Data analysis

Keywords

  • Cardiovascular disease
  • Coronary artery calcium
  • Likelihood cross-validation
  • Model selection
  • Penalized spline
  • Proportional constraint
  • Shrinkage

ASJC Scopus subject areas

  • Statistics, Probability and Uncertainty
  • Modeling and Simulation
  • Statistics and Probability

Cite this

Semiparametric zero-inflated modeling in multi-ethnic study of atherosclerosis (MESA). / Liu, Hai; Ma, Shuangge; Kronmal, Richard; Chan, Kung Sik.

In: Annals of Applied Statistics, Vol. 6, No. 3, 09.2012, p. 1236-1255.

@article{79c412931391402ca0eeca18408f9cce,
title = "Semiparametric zero-inflated modeling in multi-ethnic study of atherosclerosis (MESA)",
keywords = "Cardiovascular disease, Coronary artery calcium, Likelihood cross-validation, Model selection, Penalized spline, Proportional constraint, Shrinkage",
author = "Liu, Hai and Ma, Shuangge and Kronmal, Richard and Chan, {Kung Sik}",
year = "2012",
month = "9",
doi = "10.1214/11-AOAS534",
language = "English",
volume = "6",
pages = "1236--1255",
journal = "Annals of Applied Statistics",
issn = "1932-6157",
publisher = "Institute of Mathematical Statistics",
number = "3",

}

TY - JOUR

T1 - Semiparametric zero-inflated modeling in multi-ethnic study of atherosclerosis (MESA)

AU - Liu, Hai

AU - Ma, Shuangge

AU - Kronmal, Richard

AU - Chan, Kung Sik

PY - 2012/9

Y1 - 2012/9

KW - Cardiovascular disease

KW - Coronary artery calcium

KW - Likelihood cross-validation

KW - Model selection

KW - Penalized spline

KW - Proportional constraint

KW - Shrinkage

UR - http://www.scopus.com/inward/record.url?scp=84870039597&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84870039597&partnerID=8YFLogxK

U2 - 10.1214/11-AOAS534

DO - 10.1214/11-AOAS534

M3 - Article

AN - SCOPUS:84870039597

VL - 6

SP - 1236

EP - 1255

JO - Annals of Applied Statistics

JF - Annals of Applied Statistics

SN - 1932-6157

IS - 3

ER -