Evaluating latent class models with conditional dependence in record linkage

Research output: Contribution to journal › Article

3 Citations (Scopus)

Abstract

Record linkage methods commonly use a traditional latent class model to classify record pairs from different sources as true matches or non-matches. This approach was first formally described by Fellegi and Sunter and assumes that the agreement in fields is independent conditional on the latent class. Consequences of violating the conditional independence assumption include bias in parameter estimates from the model. We sought to further characterize the impact of conditional dependence on the overall misclassification rate, sensitivity, and positive predictive value in the record linkage problem when the conditional independence assumption is violated. Additionally, we evaluate various methods to account for the conditional dependence. These methods include loglinear models with appropriate interaction terms identified through the correlation residual plot as well as Gaussian random effects models. The proposed models are used to link newborn screening data obtained from a health information exchange. On the basis of simulations, loglinear models with interaction terms demonstrated the best misclassification rate, although this type of model cannot accommodate other data features such as continuous measures for agreement. Results indicate that Gaussian random effects models, which can handle additional data features, perform better than assuming conditional independence and in some situations perform as well as the loglinear model with interaction terms.
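As a rough illustration of the baseline the abstract refers to, the sketch below fits a two-class latent class model under the conditional independence assumption with a simple EM loop and then reports the three measures named above (misclassification rate, sensitivity, positive predictive value). It is not the authors' code and does not implement their loglinear or Gaussian random effects models. The function names, starting values, the 0.5 posterior threshold, and all simulation parameters are assumptions chosen for illustration only.

```python
import random


def simulate_pairs(n, p_match, m, u, seed=0):
    """Simulate binary field-agreement vectors that are independent given the class.

    m[j] / u[j] are illustrative agreement probabilities for field j among
    true matches / non-matches (values are assumptions, not from the paper).
    """
    rng = random.Random(seed)
    pairs, labels = [], []
    for _ in range(n):
        is_match = rng.random() < p_match
        probs = m if is_match else u
        pairs.append([int(rng.random() < q) for q in probs])
        labels.append(is_match)
    return pairs, labels


def clip(x, eps=1e-6):
    """Keep a probability strictly inside (0, 1) to avoid division by zero."""
    return min(max(x, eps), 1.0 - eps)


def fit_em(pairs, n_iter=100):
    """EM for a two-class latent class model assuming conditional independence."""
    n, k = len(pairs), len(pairs[0])
    p, m, u = 0.1, [0.9] * k, [0.1] * k  # assumed starting values
    post = [0.0] * n
    for _ in range(n_iter):
        # E-step: posterior probability that each pair is a true match.
        for i, g in enumerate(pairs):
            like_m, like_u = p, 1.0 - p
            for j in range(k):
                like_m *= m[j] if g[j] else 1.0 - m[j]
                like_u *= u[j] if g[j] else 1.0 - u[j]
            post[i] = like_m / (like_m + like_u)
        # M-step: posterior-weighted agreement rates per field.
        total = sum(post)
        p = clip(total / n)
        m = [clip(sum(w * g[j] for w, g in zip(post, pairs)) / total)
             for j in range(k)]
        u = [clip(sum((1.0 - w) * g[j] for w, g in zip(post, pairs)) / (n - total))
             for j in range(k)]
    # post holds the posteriors from the last E-step.
    return p, m, u, post


def evaluate(post, labels, threshold=0.5):
    """Classify pairs by posterior probability and summarize the errors."""
    tp = fp = fn = 0
    for w, is_match in zip(post, labels):
        predicted = w > threshold
        if predicted and is_match:
            tp += 1
        elif predicted and not is_match:
            fp += 1
        elif is_match:
            fn += 1
    return {
        "misclassification_rate": (fp + fn) / len(labels),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
    }


# Illustrative run on simulated data with four hypothetical fields.
pairs, labels = simulate_pairs(
    n=2000, p_match=0.1,
    m=[0.95, 0.90, 0.90, 0.85],
    u=[0.05, 0.10, 0.10, 0.15],
)
p_hat, m_hat, u_hat, post = fit_em(pairs)
metrics = evaluate(post, labels)
print(round(p_hat, 3), metrics)
```

Because the simulated fields really are independent within each class, this baseline is correctly specified here. Simulating correlated fields within a class and refitting the same model would be one way to observe the kind of degradation the abstract describes.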

Original language: English
Pages (from-to): 4250-4265
Number of pages: 16
Journal: Statistics in Medicine
Volume: 33
Issue number: 24
DOIs: 10.1002/sim.6230
State: Published - 2014

Fingerprint

Record Linkage
Latent Class Model
Log-linear Models
Conditional Independence
Misclassification Rate
Random Effects Model
Gaussian Model
Term
Residual Plots
Interaction
Latent Class
Screening
Health
Classify
Model
Evaluate
Estimate
Simulation

Keywords

  • Latent class
  • Loglinear model
  • Random effects
  • Record linkage

ASJC Scopus subject areas

  • Epidemiology
  • Statistics and Probability
  • Medicine(all)

Cite this

@article{449beb211b1a427b9ebb2a49c1a52ccd,
title = "Evaluating latent class models with conditional dependence in record linkage",
abstract = "Record linkage methods commonly use a traditional latent class model to classify record pairs from different sources as true matches or non-matches. This approach was first formally described by Fellegi and Sunter and assumes that the agreement in fields is independent conditional on the latent class. Consequences of violating the conditional independence assumption include bias in parameter estimates from the model. We sought to further characterize the impact of conditional dependence on the overall misclassification rate, sensitivity, and positive predictive value in the record linkage problem when the conditional independence assumption is violated. Additionally, we evaluate various methods to account for the conditional dependence. These methods include loglinear models with appropriate interaction terms identified through the correlation residual plot as well as Gaussian random effects models. The proposed models are used to link newborn screening data obtained from a health information exchange. On the basis of simulations, loglinear models with interaction terms demonstrated the best misclassification rate, although this type of model cannot accommodate other data features such as continuous measures for agreement. Results indicate that Gaussian random effects models, which can handle additional data features, perform better than assuming conditional independence and in some situations perform as well as the loglinear model with interaction terms.",
keywords = "Latent class, Loglinear model, Random effects, Record linkage",
author = "Joanne Daggy and Huiping Xu and Siu Hui and Shaun Grannis",
year = "2014",
doi = "10.1002/sim.6230",
language = "English",
volume = "33",
pages = "4250--4265",
journal = "Statistics in Medicine",
issn = "0277-6715",
publisher = "John Wiley and Sons Ltd",
number = "24",

}

TY - JOUR

T1 - Evaluating latent class models with conditional dependence in record linkage

AU - Daggy, Joanne

AU - Xu, Huiping

AU - Hui, Siu

AU - Grannis, Shaun

PY - 2014

Y1 - 2014

AB - Record linkage methods commonly use a traditional latent class model to classify record pairs from different sources as true matches or non-matches. This approach was first formally described by Fellegi and Sunter and assumes that the agreement in fields is independent conditional on the latent class. Consequences of violating the conditional independence assumption include bias in parameter estimates from the model. We sought to further characterize the impact of conditional dependence on the overall misclassification rate, sensitivity, and positive predictive value in the record linkage problem when the conditional independence assumption is violated. Additionally, we evaluate various methods to account for the conditional dependence. These methods include loglinear models with appropriate interaction terms identified through the correlation residual plot as well as Gaussian random effects models. The proposed models are used to link newborn screening data obtained from a health information exchange. On the basis of simulations, loglinear models with interaction terms demonstrated the best misclassification rate, although this type of model cannot accommodate other data features such as continuous measures for agreement. Results indicate that Gaussian random effects models, which can handle additional data features, perform better than assuming conditional independence and in some situations perform as well as the loglinear model with interaction terms.

KW - Latent class

KW - Loglinear model

KW - Random effects

KW - Record linkage

UR - http://www.scopus.com/inward/record.url?scp=84908071226&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84908071226&partnerID=8YFLogxK

U2 - 10.1002/sim.6230

DO - 10.1002/sim.6230

M3 - Article

VL - 33

SP - 4250

EP - 4265

JO - Statistics in Medicine

JF - Statistics in Medicine

SN - 0277-6715

IS - 24

ER -