A practical approach for incorporating dependence among fields in probabilistic record linkage

Research output: Contribution to journal › Article

4 Citations (Scopus)

Abstract

Background: Methods for linking real-world healthcare data often use a latent class model, where the latent, or unknown, class is the true match status of candidate record-pairs. This commonly used model assumes that agreement patterns among multiple fields within a latent class are independent. When this assumption is violated, various approaches, including the most commonly proposed loglinear models, have been suggested to account for conditional dependence. Methods: We present a step-by-step guide to identify important dependencies between fields through a correlation residual plot and demonstrate how they can be incorporated into loglinear models for record linkage. This method is applied to healthcare data from the patient registry for a large county health department. Results: Our method could be readily implemented using standard software (with code supplied) to produce an overall better model fit as measured by BIC and deviance. Finding the most parsimonious model is known to reduce bias in parameter estimates. Conclusions: This novel approach identifies and accommodates conditional dependence in the context of record linkage. The conditional dependence model is recommended for routine use due to its flexibility for incorporating conditional dependence and easy implementation using existing software.
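
The baseline model described in the abstract can be illustrated with a small sketch. It is not the authors' supplied code. It fits a two-class latent class model under conditional independence by EM and then compares observed and model-implied pairwise correlations between fields, which is one plausible reading of a "correlation residual". The function names, starting values, and synthetic parameters below are all assumptions made for illustration, and the loglinear extension itself is not shown.

```python
import itertools
import math


def pattern_prob(g, p, m, u):
    """Joint probabilities (pattern g, match) and (pattern g, non-match),
    assuming fields agree independently within each latent class."""
    like_m = math.prod(mj if gj else 1 - mj for gj, mj in zip(g, m))
    like_u = math.prod(uj if gj else 1 - uj for gj, uj in zip(g, u))
    return p * like_m, (1 - p) * like_u


def clip(x, eps=1e-6):
    """Keep probabilities strictly inside (0, 1)."""
    return min(max(x, eps), 1 - eps)


def fit_em(counts, n_iter=200):
    """EM fit. `counts` maps each 0/1 agreement tuple to its frequency."""
    k = len(next(iter(counts)))
    n = sum(counts.values())
    p, m, u = 0.1, [0.9] * k, [0.1] * k  # hypothetical starting values
    for _ in range(n_iter):
        # E-step: posterior probability of a true match for each pattern.
        post = {}
        for g in counts:
            a, b = pattern_prob(g, p, m, u)
            post[g] = a / (a + b)
        # M-step: count-weighted agreement proportions within each class.
        w = sum(c * post[g] for g, c in counts.items())
        m = [clip(sum(c * post[g] * g[j] for g, c in counts.items()) / w)
             for j in range(k)]
        u = [clip(sum(c * (1 - post[g]) * g[j] for g, c in counts.items()) / (n - w))
             for j in range(k)]
        p = clip(w / n)
    return p, m, u


def phi(p_ab, p_a, p_b):
    """Correlation of two binary variables from joint and marginal probabilities."""
    return (p_ab - p_a * p_b) / math.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))


def correlation_residuals(counts, p, m, u):
    """Observed minus model-implied correlation for every pair of fields."""
    n, k = sum(counts.values()), len(m)
    obs = [sum(c * g[j] for g, c in counts.items()) / n for j in range(k)]
    fit = [p * m[j] + (1 - p) * u[j] for j in range(k)]
    res = {}
    for a, b in itertools.combinations(range(k), 2):
        obs_ab = sum(c * g[a] * g[b] for g, c in counts.items()) / n
        fit_ab = p * m[a] * m[b] + (1 - p) * u[a] * u[b]
        res[(a, b)] = phi(obs_ab, obs[a], obs[b]) - phi(fit_ab, fit[a], fit[b])
    return res


# Synthetic expected counts for four fields, generated WITHOUT dependence
# (illustrative parameter values, not taken from the paper).
TRUE_P, TRUE_M, TRUE_U = 0.2, [0.95, 0.9, 0.85, 0.9], [0.05, 0.1, 0.2, 0.1]
counts = {
    g: 10000 * sum(pattern_prob(g, TRUE_P, TRUE_M, TRUE_U))
    for g in itertools.product((0, 1), repeat=4)
}
p_hat, m_hat, u_hat = fit_em(counts)
residuals = correlation_residuals(counts, p_hat, m_hat, u_hat)
```

Because the synthetic counts satisfy conditional independence, the residuals here should all be close to zero. On real linkage data, a field pair with a residual far from zero would be a candidate for an interaction term in a loglinear model, which is the kind of step the abstract describes.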

Original language: English
Article number: 97
Journal: BMC Medical Informatics and Decision Making
Volume: 13
Issue number: 1
DOIs: 10.1186/1472-6947-13-97
State: Published - 2013

Fingerprint

Software
Delivery of Health Care
Registries
Health

ASJC Scopus subject areas

  • Health Informatics
  • Health Policy

Cite this

@article{9ceb3e789b194c1f9a1a657a4ef8a0cc,
title = "A practical approach for incorporating dependence among fields in probabilistic record linkage",
abstract = "Background: Methods for linking real-world healthcare data often use a latent class model, where the latent, or unknown, class is the true match status of candidate record-pairs. This commonly used model assumes that agreement patterns among multiple fields within a latent class are independent. When this assumption is violated, various approaches, including the most commonly proposed loglinear models, have been suggested to account for conditional dependence. Methods: We present a step-by-step guide to identify important dependencies between fields through a correlation residual plot and demonstrate how they can be incorporated into loglinear models for record linkage. This method is applied to healthcare data from the patient registry for a large county health department. Results: Our method could be readily implemented using standard software (with code supplied) to produce an overall better model fit as measured by BIC and deviance. Finding the most parsimonious model is known to reduce bias in parameter estimates. Conclusions: This novel approach identifies and accommodates conditional dependence in the context of record linkage. The conditional dependence model is recommended for routine use due to its flexibility for incorporating conditional dependence and easy implementation using existing software.",
author = "Daggy, Joanne and Xu, Huiping and Hui, Siu and Gamache, {Roland E.} and Grannis, Shaun",
year = "2013",
doi = "10.1186/1472-6947-13-97",
language = "English",
volume = "13",
journal = "BMC Medical Informatics and Decision Making",
issn = "1472-6947",
publisher = "BioMed Central",
number = "1",
pages = "97",
}

TY  - JOUR
T1  - A practical approach for incorporating dependence among fields in probabilistic record linkage
AU  - Daggy, Joanne
AU  - Xu, Huiping
AU  - Hui, Siu
AU  - Gamache, Roland E.
AU  - Grannis, Shaun
PY  - 2013
Y1  - 2013
N2  - Background: Methods for linking real-world healthcare data often use a latent class model, where the latent, or unknown, class is the true match status of candidate record-pairs. This commonly used model assumes that agreement patterns among multiple fields within a latent class are independent. When this assumption is violated, various approaches, including the most commonly proposed loglinear models, have been suggested to account for conditional dependence. Methods: We present a step-by-step guide to identify important dependencies between fields through a correlation residual plot and demonstrate how they can be incorporated into loglinear models for record linkage. This method is applied to healthcare data from the patient registry for a large county health department. Results: Our method could be readily implemented using standard software (with code supplied) to produce an overall better model fit as measured by BIC and deviance. Finding the most parsimonious model is known to reduce bias in parameter estimates. Conclusions: This novel approach identifies and accommodates conditional dependence in the context of record linkage. The conditional dependence model is recommended for routine use due to its flexibility for incorporating conditional dependence and easy implementation using existing software.
UR  - http://www.scopus.com/inward/record.url?scp=84883179492&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84883179492&partnerID=8YFLogxK
U2  - 10.1186/1472-6947-13-97
DO  - 10.1186/1472-6947-13-97
M3  - Article
VL  - 13
JO  - BMC Medical Informatics and Decision Making
JF  - BMC Medical Informatics and Decision Making
SN  - 1472-6947
IS  - 1
M1  - 97
ER  - 