Real world performance of approximate string comparators for use in patient matching

Shaun Grannis, J. Marc Overhage, Clement McDonald

Research output: Chapter in Book/Report/Conference proceeding › Chapter

21 Scopus citations


Medical record linkage is becoming increasingly important as clinical data are distributed across independent sources. To improve linkage accuracy, we studied name comparison methods that establish agreement or disagreement between corresponding names. In addition to exact raw name matching and exact phonetic name matching, we tested three approximate string comparators: the modified Jaro-Winkler method, the longest common substring, and the Levenshtein edit distance. We also calculated the root-mean-square combination of all three comparator scores. We tested each name comparison method using a deterministic record linkage algorithm. Results were consistent across the data sets from both hospitals. At a comparator threshold score of 0.8, the Jaro-Winkler comparator achieved the highest linkage sensitivities, 97.4% and 97.7%. The combined root-mean-square method achieved higher sensitivities than the Levenshtein edit distance or the longest common substring while maintaining high linkage specificity. Approximate string comparators increase deterministic linkage sensitivity by up to 10% compared with exact-match comparisons and represent an accurate method of linking to vital statistics data.
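The abstract names the comparators but not their formulas. The Python sketch here shows one common way to turn each into a similarity score in [0, 1], combine them by root-mean-square, and apply a 0.8 agreement threshold. It is an illustration under stated assumptions, not the authors' implementation: it uses the standard Jaro-Winkler formula (prefix weight 0.1, at most 4 prefix characters) rather than the paper's modified variant; it normalizes Levenshtein distance by the longer name's length and longest-common-substring length by the average name length, both assumed choices; and the helper `names_agree` is hypothetical, borrowing only the 0.8 threshold reported above.

```python
import math


def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Equal characters count as matches only within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    used1, used2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: half the number of matched characters that are out of order.
    order1 = [s1[i] for i in range(len1) if used1[i]]
    order2 = [s2[j] for j in range(len2) if used2[j]]
    t = sum(a != b for a, b in zip(order1, order2)) / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro score boosted by a shared prefix of up to max_prefix characters."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def levenshtein(s1: str, s2: str) -> int:
    """Minimum number of single-character insertions, deletions, substitutions."""
    prev = list(range(len(s2) + 1))
    for i, a in enumerate(s1, start=1):
        curr = [i]
        for j, b in enumerate(s2, start=1):
            curr.append(min(prev[j] + 1,              # delete a
                            curr[j - 1] + 1,          # insert b
                            prev[j - 1] + (a != b)))  # substitute a -> b
        prev = curr
    return prev[-1]


def longest_common_substring(s1: str, s2: str) -> int:
    """Length of the longest contiguous run of characters shared by s1 and s2."""
    best, prev = 0, [0] * (len(s2) + 1)
    for a in s1:
        curr = [0] * (len(s2) + 1)
        for j, b in enumerate(s2, start=1):
            if a == b:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


# ASSUMPTION: these two normalizations are illustrative, not taken from the paper.
def levenshtein_similarity(s1: str, s2: str) -> float:
    longest = max(len(s1), len(s2))
    return 1.0 if longest == 0 else 1 - levenshtein(s1, s2) / longest


def lcs_similarity(s1: str, s2: str) -> float:
    total = len(s1) + len(s2)
    return 1.0 if total == 0 else 2 * longest_common_substring(s1, s2) / total


def rms_score(s1: str, s2: str) -> float:
    """Root-mean-square of the three comparator scores."""
    scores = (jaro_winkler(s1, s2), levenshtein_similarity(s1, s2), lcs_similarity(s1, s2))
    return math.sqrt(sum(s * s for s in scores) / len(scores))


# Hypothetical helper: two names "agree" when the chosen score reaches the threshold.
def names_agree(s1: str, s2: str, comparator=jaro_winkler, threshold: float = 0.8) -> bool:
    return comparator(s1.upper(), s2.upper()) >= threshold
```

With these assumed normalizations, the transposed pair MARTHA and MARHTA scores about 0.96 under Jaro-Winkler but only about 0.73 under the root-mean-square combination, so only Jaro-Winkler would declare agreement at 0.8. Such pairs illustrate how comparator choice can shift sensitivity; the numbers come from this sketch, not from the study's data.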

Original language: English
Title of host publication: Studies in Health Technology and Informatics
Number of pages: 5
State: Published - 2004

Keywords

  • Knowledge Management
  • Medical Record Linkage
  • Patient Matching

ASJC Scopus subject areas

  • Biomedical Engineering
  • Health Informatics
  • Health Information Management

Citation

Grannis, S., Overhage, J. M., & McDonald, C. (2004). Real world performance of approximate string comparators for use in patient matching. In Studies in Health Technology and Informatics (Vol. 107, pp. 43-47).