Regression trees for longitudinal data with baseline covariates

Madan Gopal Kundu, Jaroslaw Harezlak

Research output: Contribution to journal (Article)


Longitudinal changes in a population of interest are often heterogeneous and influenced by a combination of baseline factors. In such cases, classical linear mixed effects models [Laird NM, Ware JH. Random-effects models for longitudinal data. Biometrics. 1982;38:963–974.] for the mean structure provide a poor fit to the data. We propose a regression tree methodology for longitudinal data that identifies and characterizes homogeneous subgroups. Currently available regression tree construction methods are either limited to a repeated measures scenario or confound the heterogeneity among subgroups with the random inter-subject variability. We propose a longitudinal classification and regression tree (LongCART) algorithm under a conditional inference framework that overcomes these limitations by using a two-step approach. LongCART first selects the partitioning variable via a parameter instability test and then finds the optimal split for the selected partitioning variable. Thus, at each node, the decision to split further is type I error controlled, which guards against variable selection bias, over-fitting and spurious splitting. We obtained asymptotic results for the proposed instability test and examined its finite sample behavior through simulation studies. The comparative performance of the LongCART algorithm was evaluated empirically via simulation studies. Finally, we applied LongCART to study the longitudinal changes in choline levels among HIV-positive patients.
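The two-step idea in the abstract (test first, then split) can be illustrated with a small sketch. This is not the authors' LongCART implementation or their instability test. It is a simplified stand-in under several assumptions: each subject's trajectory is summarized by an ordinary least-squares slope rather than a mixed model, instability is assessed with a CUSUM-type statistic whose supremum is compared with the supremum of a Brownian bridge (Kolmogorov distribution), covariates are continuous without ties, p-values are Bonferroni-adjusted, and the cut point minimizes the within-group sum of squares of the slopes. All function names and the simulated data are hypothetical.

```python
import math
import random


def ols_slope(times, values):
    """Least-squares slope of one subject's trajectory against time."""
    t_bar = sum(times) / len(times)
    y_bar = sum(values) / len(values)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    return sxy / sxx


def sup_bridge_sf(x, terms=100):
    """P(sup |Brownian bridge| > x), from the Kolmogorov series."""
    if x < 0.2:  # the tail probability is numerically 1 in this range
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


def instability_pvalue(slopes, covariate):
    """Stand-in instability test: CUSUM of centered slopes ordered by covariate."""
    n = len(slopes)
    mean = sum(slopes) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in slopes) / (n - 1))
    if sd == 0.0:
        return 1.0  # identical slopes: no evidence of instability
    order = sorted(range(n), key=lambda i: covariate[i])
    cum, sup = 0.0, 0.0
    for i in order:
        cum += (slopes[i] - mean) / (sd * math.sqrt(n))
        sup = max(sup, abs(cum))
    return sup_bridge_sf(sup)


def sse(values):
    """Sum of squared deviations from the group mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_cut(slopes, covariate):
    """Step 2 stand-in: cut point minimizing the within-group sum of squares."""
    pairs = sorted(zip(covariate, slopes))
    best_sse, best_point = math.inf, None
    for j in range(1, len(pairs)):
        if pairs[j][0] == pairs[j - 1][0]:
            continue  # cannot cut between equal covariate values
        total = sse([s for _, s in pairs[:j]]) + sse([s for _, s in pairs[j:]])
        if total < best_sse:
            best_sse, best_point = total, (pairs[j - 1][0] + pairs[j][0]) / 2
    return best_point


def choose_split(slopes, covariates, alpha=0.05):
    """Step 1: pick the most unstable covariate; step 2: find its cut point."""
    pvals = {name: instability_pvalue(slopes, x) for name, x in covariates.items()}
    name = min(pvals, key=pvals.get)
    if pvals[name] * len(pvals) > alpha:  # Bonferroni adjustment (an assumption)
        return None  # no significant instability: do not split this node
    return name, best_cut(slopes, covariates[name])


# Simulated data (hypothetical): the slope changes sign at age 40.
rng = random.Random(0)
times = [0, 1, 2, 3, 4]
age, noise, slopes = [], [], []
for _ in range(200):
    a = rng.uniform(20, 60)
    true_slope = -1.0 if a < 40 else 1.0
    intercept = rng.gauss(5.0, 1.0)
    y = [intercept + true_slope * t + rng.gauss(0.0, 0.5) for t in times]
    age.append(a)
    noise.append(rng.uniform(0, 1))
    slopes.append(ols_slope(times, y))

result = choose_split(slopes, {"age": age, "noise": noise})
print(result)
```

Separating the test from the cut-point search is what keeps the splitting decision error controlled in this sketch: a node is split only when the adjusted p-value falls below `alpha`, and the exhaustive search over cut points runs only for the covariate that passed the test.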

Original language: English (US)
Pages (from-to): 1-22
Number of pages: 22
Journal: Biostatistics and Epidemiology
Issue number: 1
State: Published - Jan 1 2019


  • LongCART
  • Brownian bridge
  • instability test
  • longitudinal data
  • mixed models
  • regression tree
  • score process

ASJC Scopus subject areas

  • Epidemiology
  • Health Informatics
