Automated linkage of patient records from disparate sources

Research output: Contribution to journal (Article)

Abstract

We introduce an automated method of record linkage that has two key features: automated selection of the match field interactions to include in the model for estimation, and automated threshold determination for classifying record pairs as matches or non-matches. We applied our method to two real-world examples. The first example demonstrated results consistent with our earlier work: when data quality is adequate and the discriminating power of the match fields is high, matching algorithms exhibit similar performance. The second example demonstrated that, when data quality is low, our method yields a lower false positive rate and a higher positive predictive value than the Fellegi-Sunter model. Simulation studies suggest that, compared with the Fellegi-Sunter model, our method exhibits better overall performance, as indicated by a higher area under the curve, and produces less biased estimates of both the match prevalence rate and the m- and u-probabilities over a range of data scenarios, especially when the match prevalence is extreme. Computationally, our method is as efficient as the Fellegi-Sunter model. We recommend this method in situations where an unsupervised linking algorithm is needed.
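
For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of classic Fellegi-Sunter scoring. It is not the authors' method. It assumes conditional independence between match fields (the assumption the proposed method relaxes by adding field interactions) and uses a hand-picked threshold (which the proposed method determines automatically). The function names, the m- and u-probabilities, and the threshold are all hypothetical values chosen for illustration.

```python
import math

def fellegi_sunter_weight(agreements, m_probs, u_probs):
    """Total log2 match weight for one record pair.

    agreements: one boolean per match field (True = the field agrees).
    m_probs:    P(field agrees | pair is a true match), per field.
    u_probs:    P(field agrees | pair is a non-match), per field.
    Assumes fields are conditionally independent given match status.
    """
    total = 0.0
    for agrees, m, u in zip(agreements, m_probs, u_probs):
        if agrees:
            total += math.log2(m / u)              # agreement weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight
    return total

def classify(weight, threshold):
    """Label a pair by comparing its weight with a single cut-off."""
    return "match" if weight >= threshold else "non-match"

# Hypothetical m- and u-probabilities for three illustrative fields
# (for example last name, birth year, postal code).
m_probs = [0.95, 0.90, 0.85]
u_probs = [0.01, 0.05, 0.10]
threshold = 0.0  # hand-picked for illustration only

w_all = fellegi_sunter_weight([True, True, True], m_probs, u_probs)
w_none = fellegi_sunter_weight([False, False, False], m_probs, u_probs)

print(classify(w_all, threshold), round(w_all, 2))
print(classify(w_none, threshold), round(w_none, 2))
```

In an unsupervised setting the m- and u-probabilities are not known in advance and are typically estimated from the data with a latent class model, which is where the estimation bias mentioned in the abstract arises.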

Original language: English (US)
Pages (from-to): 172-184
Number of pages: 13
Journal: Statistical Methods in Medical Research
Volume: 27
Issue number: 1
State: Published - Jan 1 2018

Keywords

  • Diagnostic tests
  • Fellegi-Sunter model
  • Latent class model
  • Log-linear model
  • Patient matching
  • Record linkage

ASJC Scopus subject areas

  • Epidemiology
  • Statistics and Probability
  • Health Information Management
