Penalized solutions to functional regression problems

Jaroslaw Harezlak, Brent A. Coull, Nan M. Laird, Shannon R. Magari, David C. Christiani

Research output: Contribution to journal › Article

23 Citations (Scopus)

Abstract

Recent technological advances in continuous biological monitoring and personal exposure assessment have led to the collection of subject-specific functional data. A primary goal in such studies is to assess the relationship between the functional predictors and the functional responses. The historical functional linear model (HFLM) can be used to model such dependencies of the response on the history of the predictor values. An estimation procedure for the regression coefficients that uses a variety of regularization techniques is proposed. An approximation of the regression surface relating the predictor to the outcome by a finite-dimensional basis expansion is used, followed by penalization of the coefficients of the neighboring basis functions by restricting the size of the coefficient differences to be small. Penalties based on the absolute values of the basis function coefficient differences (corresponding to the LASSO) and the squares of these differences (corresponding to the penalized spline methodology) are studied. The fits are compared using an extension of the Akaike Information Criterion that combines the error variance estimate, degrees of freedom of the fit and the norm of the basis function coefficients. The performance of the proposed methods is evaluated via simulations. The LASSO penalty applied to the linearly transformed coefficients yields sparser representations of the estimated regression surface, while the quadratic penalty provides solutions with the smallest L2-norm of the basis function coefficients. Finally, the new estimation procedure is applied to the analysis of the effects of occupational particulate matter (PM) exposure on heart rate variability (HRV) in a cohort of boilermaker workers. Results suggest that the strongest association between PM exposure and HRV in these workers occurs as a result of point exposures to the increased levels of PM corresponding to smoking breaks.
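The model and the two penalized criteria described above can be written compactly as follows. The notation is ours, not the paper's: we assume an intercept function α(t), a maximal lag δ, basis functions φ_k(s, t) on the region s_0(t) ≤ s ≤ t, and a matrix D whose rows form the differences b_k − b_l of neighbouring basis coefficients. The paper's exact basis, neighbourhood structure, and parametrization may differ.

```latex
\begin{align*}
y_i(t) &= \alpha(t) + \int_{s_0(t)}^{t} x_i(s)\,\beta(s,t)\,ds + \varepsilon_i(t),
  \qquad s_0(t) = \max(0,\, t - \delta), \\
\beta(s,t) &\approx \sum_{k=1}^{K} b_k\,\phi_k(s,t)
  \;\Longrightarrow\;
  y_i(t) \approx \alpha(t) + \sum_{k=1}^{K} b_k\, z_{ik}(t),
  \qquad z_{ik}(t) = \int_{s_0(t)}^{t} x_i(s)\,\phi_k(s,t)\,ds, \\
\hat{b}_{\mathrm{LASSO}} &= \arg\min_{b}\; \lVert y - Zb \rVert_2^2 + \lambda \lVert Db \rVert_1, \\
\hat{b}_{\mathrm{quad}} &= \arg\min_{b}\; \lVert y - Zb \rVert_2^2 + \lambda \lVert Db \rVert_2^2
  = (Z^\top Z + \lambda D^\top D)^{-1} Z^\top y .
\end{align*}
```

Here y and Z stack y_i(t_j) and z_ik(t_j) over subjects and time points, with the intercept assumed known or absorbed into Z. The closed form for the quadratic penalty requires Z'Z + λD'D to be invertible. When D is a first-difference operator along a chain of basis functions, the transformation θ = (b_1, Db) is invertible, so the L1 criterion becomes an ordinary LASSO in θ with θ_1 left unpenalized.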
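The contrast between the two penalties can be illustrated numerically. This is a minimal sketch under simplifying assumptions that are ours and not the paper's. The regression surface is assumed to depend only on the lag t − s, so β(s, t) reduces to a lag curve. Each lag gets its own coefficient instead of a spline or finite-element basis, and the exposures are simulated white noise. The helper names (`lagged_design`, `fit_quadratic`, `fit_l1_differences`, `count_jumps`) and the λ values are hypothetical. The L1 fit uses the linear transformation θ_1 = b_1, θ_j = b_j − b_{j−1}, which turns the difference penalty into an ordinary LASSO solved by cyclic coordinate descent. Neither the paper's solver nor its extended AIC is reproduced.

```python
import numpy as np

# Hypothetical simplification: beta(s, t) depends only on the lag l = t - s,
# and each lag has its own coefficient (identity "basis").


def lagged_design(x, n_lags):
    """Rows are time points t >= n_lags - 1; column l holds the lagged exposure x(t - l)."""
    T = len(x)
    return np.column_stack([x[n_lags - 1 - l : T - l] for l in range(n_lags)])


def difference_matrix(k):
    """(k-1) x k first-difference operator: (D @ b)[j] = b[j+1] - b[j]."""
    return np.diff(np.eye(k), axis=0)


def fit_quadratic(Z, y, lam):
    """Quadratic difference penalty: minimise ||y - Z b||^2 + lam * ||D b||^2."""
    D = difference_matrix(Z.shape[1])
    A = Z.T @ Z + lam * (D.T @ D)
    coef = np.linalg.solve(A, Z.T @ y)
    edf = np.trace(np.linalg.solve(A, Z.T @ Z))  # trace of the hat matrix
    return coef, edf


def soft_threshold(z, gamma):
    return np.sign(z) * max(abs(z) - gamma, 0.0)


def fit_l1_differences(Z, y, lam, n_sweeps=2000, tol=1e-10):
    """L1 difference penalty: minimise 0.5 * ||y - Z b||^2 + lam * sum_j |b[j+1] - b[j]|.

    Reparametrise b = C @ theta with C lower-triangular ones, so theta[0] = b[0]
    (unpenalised) and theta[j] = b[j] - b[j-1]; this is an ordinary LASSO in theta.
    """
    k = Z.shape[1]
    C = np.tril(np.ones((k, k)))
    W = Z @ C
    G = W.T @ W   # Gram matrix of the transformed design
    r = W.T @ y
    theta = np.zeros(k)
    for _ in range(n_sweeps):
        max_step = 0.0
        for j in range(k):
            # Partial-residual correlation for coordinate j.
            rho = r[j] - G[j] @ theta + G[j, j] * theta[j]
            if j == 0:
                new = rho / G[j, j]
            else:
                new = soft_threshold(rho, lam) / G[j, j]
            max_step = max(max_step, abs(new - theta[j]))
            theta[j] = new
        if max_step < tol:
            break
    return C @ theta


def count_jumps(b, tol=1e-6):
    """Number of non-zero differences between neighbouring coefficients."""
    return int(np.sum(np.abs(np.diff(b)) > tol))


# Simulated (hypothetical) data: 10 subjects, 200 time points, effect at lags 3-6.
rng = np.random.default_rng(0)
n_lags = 20
beta_true = np.zeros(n_lags)
beta_true[3:7] = 1.0
Z_blocks, y_blocks = [], []
for _ in range(10):
    x = rng.normal(size=200)
    Zi = lagged_design(x, n_lags)
    Z_blocks.append(Zi)
    y_blocks.append(Zi @ beta_true + rng.normal(scale=0.2, size=Zi.shape[0]))
Z = np.vstack(Z_blocks)
y = np.concatenate(y_blocks)

b_quad, edf = fit_quadratic(Z, y, lam=10.0)
b_l1 = fit_l1_differences(Z, y, lam=300.0)
print("jumps (quadratic):", count_jumps(b_quad), " jumps (L1):", count_jumps(b_l1))
```

With these settings one would expect the L1 fit to change at only a few lags, whereas the quadratic fit changes slightly at every lag. This mirrors the sparsity contrast stated in the abstract. In practice λ would be chosen by a criterion such as the extended AIC rather than fixed by hand.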

Original language: English (US)
Pages (from-to): 4911-4925
Number of pages: 15
Journal: Computational Statistics and Data Analysis
Volume: 51
Issue number: 10
DOI: 10.1016/j.csda.2006.09.034
State: Published - Jun 15 2007
Externally published: Yes

Keywords

  • Environmental assessment
  • Functional data
  • Heart rate variability
  • LASSO
  • Penalized regression splines

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Statistics, Probability and Uncertainty
  • Electrical and Electronic Engineering
  • Computational Mathematics
  • Numerical Analysis
  • Statistics and Probability

Cite this

Penalized solutions to functional regression problems. / Harezlak, Jaroslaw; Coull, Brent A.; Laird, Nan M.; Magari, Shannon R.; Christiani, David C.

In: Computational Statistics and Data Analysis, Vol. 51, No. 10, 15.06.2007, p. 4911-4925.

Harezlak, Jaroslaw ; Coull, Brent A. ; Laird, Nan M. ; Magari, Shannon R. ; Christiani, David C. / Penalized solutions to functional regression problems. In: Computational Statistics and Data Analysis. 2007 ; Vol. 51, No. 10. pp. 4911-4925.
@article{64d14f8045c64fd89e61950b91c899a4,
title = "Penalized solutions to functional regression problems",
keywords = "Environmental assessment, Functional data, Heart rate variability, LASSO, Penalized regression splines",
author = "Harezlak, Jaroslaw and Coull, {Brent A.} and Laird, {Nan M.} and Magari, {Shannon R.} and Christiani, {David C.}",
year = "2007",
month = "6",
day = "15",
doi = "10.1016/j.csda.2006.09.034",
language = "English (US)",
volume = "51",
pages = "4911--4925",
journal = "Computational Statistics and Data Analysis",
issn = "0167-9473",
publisher = "Elsevier",
number = "10",

}

TY  - JOUR
T1  - Penalized solutions to functional regression problems
AU  - Harezlak, Jaroslaw
AU  - Coull, Brent A.
AU  - Laird, Nan M.
AU  - Magari, Shannon R.
AU  - Christiani, David C.
PY  - 2007/6/15
Y1  - 2007/6/15
AB  - Recent technological advances in continuous biological monitoring and personal exposure assessment have led to the collection of subject-specific functional data. A primary goal in such studies is to assess the relationship between the functional predictors and the functional responses. The historical functional linear model (HFLM) can be used to model such dependencies of the response on the history of the predictor values. An estimation procedure for the regression coefficients that uses a variety of regularization techniques is proposed. An approximation of the regression surface relating the predictor to the outcome by a finite-dimensional basis expansion is used, followed by penalization of the coefficients of the neighboring basis functions by restricting the size of the coefficient differences to be small. Penalties based on the absolute values of the basis function coefficient differences (corresponding to the LASSO) and the squares of these differences (corresponding to the penalized spline methodology) are studied. The fits are compared using an extension of the Akaike Information Criterion that combines the error variance estimate, degrees of freedom of the fit and the norm of the basis function coefficients. The performance of the proposed methods is evaluated via simulations. The LASSO penalty applied to the linearly transformed coefficients yields sparser representations of the estimated regression surface, while the quadratic penalty provides solutions with the smallest L2-norm of the basis function coefficients. Finally, the new estimation procedure is applied to the analysis of the effects of occupational particulate matter (PM) exposure on heart rate variability (HRV) in a cohort of boilermaker workers. Results suggest that the strongest association between PM exposure and HRV in these workers occurs as a result of point exposures to the increased levels of PM corresponding to smoking breaks.
KW  - Environmental assessment
KW  - Functional data
KW  - Heart rate variability
KW  - LASSO
KW  - Penalized regression splines
UR  - http://www.scopus.com/inward/record.url?scp=34247375568&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=34247375568&partnerID=8YFLogxK
U2  - 10.1016/j.csda.2006.09.034
DO  - 10.1016/j.csda.2006.09.034
M3  - Article
AN  - SCOPUS:34247375568
VL  - 51
SP  - 4911
EP  - 4925
JO  - Computational Statistics and Data Analysis
JF  - Computational Statistics and Data Analysis
SN  - 0167-9473
IS  - 10
ER  -