Regression trees for longitudinal data with baseline covariates

Madan Gopal Kundu, Jaroslaw Harezlak

Research output: Contribution to journal › Article

Abstract

Longitudinal changes in a population of interest are often heterogeneous and influenced by a combination of baseline factors. In such cases, classical linear mixed effects models [Laird NM, Ware JH. Random-effects models for longitudinal data. Biometrics. 1982;38:963–974.] for the mean structure provide a poor fit to the data. We propose a regression tree methodology for longitudinal data that identifies and characterizes homogeneous subgroups. Currently available regression tree construction methods are either limited to a repeated measures scenario or conflate the heterogeneity among subgroups with the random inter-subject variability. We propose a longitudinal classification and regression tree (LongCART) algorithm under a conditional inference framework that overcomes these limitations using a two-step approach. LongCART first selects the partitioning variable via a parameter instability test and then finds the optimal split for the selected partitioning variable. Thus, at each node, the decision to split further is type I error controlled, guarding against variable selection bias, over-fitting and spurious splitting. We obtained asymptotic results for the proposed instability test and examined its finite-sample behavior through simulation studies. The comparative performance of the LongCART algorithm was also evaluated empirically in simulation studies. Finally, we applied LongCART to study the longitudinal changes in choline levels among HIV-positive patients.
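
The two-step splitting rule described in the abstract (select a partitioning variable with an instability test, then search for a split on that variable only) can be illustrated with a short Python sketch. This is a simplified illustration under stated assumptions, not the authors' LongCART implementation: it uses one scalar summary per subject (for example, a per-subject slope) instead of fitting a linear mixed effects model, forms a CUSUM-type score process of residuals ordered by each numeric baseline covariate and compares its supremum with the sup of a Brownian bridge (Kolmogorov) limit, applies a Bonferroni adjustment across covariates, and picks the cut point by reduction in residual sum of squares. The function names (`instability_pvalue`, `best_split`, `choose_split`) and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple


def kolmogorov_sf(x: float, terms: int = 100) -> float:
    """Tail probability P(sup_t |B(t)| > x) for a standard Brownian bridge B."""
    if x < 0.2:
        return 1.0  # the true tail is within ~1e-12 of 1 here
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x) for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))


def instability_pvalue(y: List[float], x: List[float]) -> float:
    """CUSUM-type test of a constant mean, with subjects ordered by covariate x.

    Simplification (assumption): y holds one summary value per subject and the
    null model is an intercept-only fit, not a linear mixed effects model.
    """
    n = len(y)
    mean = sum(y) / n
    resid = [yi - mean for yi in y]
    sigma = math.sqrt(sum(r * r for r in resid) / n)
    if sigma == 0.0:
        return 1.0  # no variation: nothing to split
    cum, sup = 0.0, 0.0
    for i in sorted(range(n), key=lambda j: x[j]):
        cum += resid[i]  # partial sums of residuals form the score process
        sup = max(sup, abs(cum))
    return kolmogorov_sf(sup / (sigma * math.sqrt(n)))


def sse(v: List[float]) -> float:
    """Residual sum of squares around the mean of v."""
    if not v:
        return 0.0
    m = sum(v) / len(v)
    return sum((a - m) ** 2 for a in v)


def best_split(y: List[float], x: List[float], min_size: int = 5) -> Optional[Tuple[float, float]]:
    """Cut point on x maximizing the SSE reduction (stand-in split criterion)."""
    vals = sorted(set(x))
    best: Optional[Tuple[float, float]] = None
    for lo, hi in zip(vals, vals[1:]):
        cut = (lo + hi) / 2.0
        left = [yi for yi, xi in zip(y, x) if xi <= cut]
        right = [yi for yi, xi in zip(y, x) if xi > cut]
        if len(left) < min_size or len(right) < min_size:
            continue
        gain = sse(y) - sse(left) - sse(right)
        if best is None or gain > best[1]:
            best = (cut, gain)
    return best


def choose_split(y: List[float], covariates: Dict[str, List[float]],
                 alpha: float = 0.05, min_size: int = 5) -> Optional[Tuple[str, float, float]]:
    """Step 1: pick the covariate with the smallest p-value; step 2: find its cut."""
    pvals = {name: instability_pvalue(y, x) for name, x in covariates.items()}
    name = min(pvals, key=pvals.get)
    adjusted = min(1.0, pvals[name] * len(covariates))  # Bonferroni (assumption)
    if adjusted > alpha:
        return None  # no significant instability: stop splitting this node
    split = best_split(y, covariates[name], min_size)
    if split is None:
        return None
    return name, split[0], adjusted


# Toy data (hypothetical): the mean of y jumps between x1 = 19 and x1 = 20.
x1 = [float(i) for i in range(40)]
x2 = [float((7 * i) % 40) for i in range(40)]  # a permutation unrelated to the jump
y = [(0.0 if i < 20 else 2.0) + ((13 * i) % 5 - 2) * 0.1 for i in range(40)]
result = choose_split(y, {"x1": x1, "x2": x2})
print(result)  # expect x1 selected with cut 19.5
```

With this toy data, `x1` is selected with a very small adjusted p-value and the cut falls at 19.5, while the unrelated covariate `x2` shows no instability. When no covariate's adjusted p-value falls below `alpha`, `choose_split` returns `None` and the node is left unsplit; this is the kind of test-based stopping rule the abstract credits with type I error control at each node.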

Original language: English (US)
Pages (from-to): 1-22
Number of pages: 22
Journal: Biostatistics and Epidemiology
Volume: 3
Issue number: 1
DOI: 10.1080/24709360.2018.1557797
State: Published - Jan 1 2019

Keywords

  • Brownian bridge
  • instability test
  • LongCART
  • longitudinal data
  • mixed models
  • regression tree
  • score process

ASJC Scopus subject areas

  • Epidemiology
  • Health Informatics

Cite this

Regression trees for longitudinal data with baseline covariates. / Kundu, Madan Gopal; Harezlak, Jaroslaw.

In: Biostatistics and Epidemiology, Vol. 3, No. 1, 01.01.2019, p. 1-22.

@article{03d81de1053a4f2f995bb1c633a591af,
title = "Regression trees for longitudinal data with baseline covariates",
abstract = "Longitudinal changes in a population of interest are often heterogeneous and influenced by a combination of baseline factors. In such cases, classical linear mixed effects models [Laird NM, Ware JH. Random-effects models for longitudinal data. Biometrics. 1982;38:963--974.] for the mean structure provide a poor fit to the data. We propose a regression tree methodology for longitudinal data that identifies and characterizes homogeneous subgroups. Currently available regression tree construction methods are either limited to a repeated measures scenario or conflate the heterogeneity among subgroups with the random inter-subject variability. We propose a longitudinal classification and regression tree (LongCART) algorithm under a conditional inference framework that overcomes these limitations using a two-step approach. LongCART first selects the partitioning variable via a parameter instability test and then finds the optimal split for the selected partitioning variable. Thus, at each node, the decision to split further is type I error controlled, guarding against variable selection bias, over-fitting and spurious splitting. We obtained asymptotic results for the proposed instability test and examined its finite-sample behavior through simulation studies. The comparative performance of the LongCART algorithm was also evaluated empirically in simulation studies. Finally, we applied LongCART to study the longitudinal changes in choline levels among HIV-positive patients.",
keywords = "Brownian bridge, instability test, LongCART, longitudinal data, mixed models, regression tree, score process",
author = "Kundu, Madan Gopal and Harezlak, Jaroslaw",
year = "2019",
month = jan,
day = "1",
doi = "10.1080/24709360.2018.1557797",
language = "English (US)",
volume = "3",
pages = "1--22",
journal = "Biostatistics and Epidemiology",
issn = "2470-9360",
publisher = "Taylor and Francis Ltd.",
number = "1",

}

TY  - JOUR
T1  - Regression trees for longitudinal data with baseline covariates
AU  - Kundu, Madan Gopal
AU  - Harezlak, Jaroslaw
PY  - 2019/1/1
Y1  - 2019/1/1
AB  - Longitudinal changes in a population of interest are often heterogeneous and influenced by a combination of baseline factors. In such cases, classical linear mixed effects models [Laird NM, Ware JH. Random-effects models for longitudinal data. Biometrics. 1982;38:963–974.] for the mean structure provide a poor fit to the data. We propose a regression tree methodology for longitudinal data that identifies and characterizes homogeneous subgroups. Currently available regression tree construction methods are either limited to a repeated measures scenario or conflate the heterogeneity among subgroups with the random inter-subject variability. We propose a longitudinal classification and regression tree (LongCART) algorithm under a conditional inference framework that overcomes these limitations using a two-step approach. LongCART first selects the partitioning variable via a parameter instability test and then finds the optimal split for the selected partitioning variable. Thus, at each node, the decision to split further is type I error controlled, guarding against variable selection bias, over-fitting and spurious splitting. We obtained asymptotic results for the proposed instability test and examined its finite-sample behavior through simulation studies. The comparative performance of the LongCART algorithm was also evaluated empirically in simulation studies. Finally, we applied LongCART to study the longitudinal changes in choline levels among HIV-positive patients.
KW  - Brownian bridge
KW  - instability test
KW  - LongCART
KW  - longitudinal data
KW  - mixed models
KW  - regression tree
KW  - score process
UR  - http://www.scopus.com/inward/record.url?scp=85059352391&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85059352391&partnerID=8YFLogxK
U2  - 10.1080/24709360.2018.1557797
DO  - 10.1080/24709360.2018.1557797
M3  - Article
AN  - SCOPUS:85059352391
VL  - 3
SP  - 1
EP  - 22
JO  - Biostatistics and Epidemiology
JF  - Biostatistics and Epidemiology
SN  - 2470-9360
IS  - 1
ER  - 