Automated linkage of patient records from disparate sources

Research output: Contribution to journal › Article

1 Scopus citation


We introduce an automated method of record linkage that has two key features: automated selection of the match field interactions to include in the estimation model, and automated determination of the threshold for classifying record pairs as matches or non-matches. We applied our method to two real-world examples. The first example demonstrated results consistent with our earlier work: when data quality is adequate and the discriminating power of the match fields is high, matching algorithms exhibit similar performance. The second example demonstrated that, in the face of low data quality, our method yields a lower false positive rate and a higher positive predictive value than the Fellegi-Sunter model. Simulation studies suggest that, compared with the Fellegi-Sunter model, our method exhibits better overall performance, as indicated by a higher area under the curve, and less biased estimates of both the match prevalence and the m- and u-probabilities over a range of data scenarios, especially when the match prevalence is extreme. Computationally, our method is as efficient as the Fellegi-Sunter model. We recommend this method in situations where an unsupervised linking algorithm is needed.
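For orientation, the Fellegi-Sunter baseline that the abstract compares against can be sketched as below. This is a minimal illustration of the standard conditional-independence form of that model, not the authors' method, which additionally selects match field interactions and a classification threshold automatically. The function names, the three example fields, and all probability values are hypothetical and chosen only for illustration.

```python
import math

def fs_match_weight(agreements, m_probs, u_probs):
    """Fellegi-Sunter log2 likelihood-ratio weight for one record pair.

    agreements: list of booleans, True where the pair agrees on a match field.
    m_probs:    P(field agrees | pair is a true match), one value per field.
    u_probs:    P(field agrees | pair is a non-match), one value per field.
    Assumes fields are conditionally independent given match status.
    """
    weight = 0.0
    for agree, m, u in zip(agreements, m_probs, u_probs):
        if agree:
            weight += math.log2(m / u)
        else:
            weight += math.log2((1.0 - m) / (1.0 - u))
    return weight

def posterior_match_probability(agreements, m_probs, u_probs, prevalence):
    """Posterior probability that a pair is a match, via Bayes' rule."""
    like_match = 1.0
    like_nonmatch = 1.0
    for agree, m, u in zip(agreements, m_probs, u_probs):
        like_match *= m if agree else (1.0 - m)
        like_nonmatch *= u if agree else (1.0 - u)
    numerator = prevalence * like_match
    return numerator / (numerator + (1.0 - prevalence) * like_nonmatch)

# Hypothetical m- and u-probabilities for three fields
# (for example last name, date of birth, ZIP code).
M_PROBS = [0.95, 0.90, 0.85]
U_PROBS = [0.01, 0.05, 0.10]

if __name__ == "__main__":
    pair = [True, True, False]  # agrees on the first two fields only
    print(fs_match_weight(pair, M_PROBS, U_PROBS))
    print(posterior_match_probability(pair, M_PROBS, U_PROBS, prevalence=0.01))
```

In the classical model a pair is classified as a match when its weight exceeds a chosen threshold. The abstract's method determines that threshold automatically, which this sketch does not attempt.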

Original language: English (US)
Pages (from-to): 172-184
Number of pages: 13
Journal: Statistical Methods in Medical Research
Issue number: 1
State: Published - Jan 1 2018


Keywords

  • Diagnostic tests
  • Fellegi-Sunter model
  • Latent class model
  • Log-linear model
  • Patient matching
  • Record linkage

ASJC Scopus subject areas

  • Epidemiology
  • Statistics and Probability
  • Health Information Management
