Real world performance of approximate string comparators for use in patient matching

Shaun Grannis, J. Marc Overhage, Clement McDonald

Research output: Chapter in Book/Report/Conference proceeding > Chapter

17 Citations (Scopus)

Abstract

Medical record linkage is becoming increasingly important as clinical data is distributed across independent sources. To improve linkage accuracy we studied different name comparison methods that establish agreement or disagreement between corresponding names. In addition to exact raw name matching and exact phonetic name matching, we tested three approximate string comparators. The approximate comparators included the modified Jaro-Winkler method, the longest common substring, and the Levenshtein edit distance. We also calculated the combined root-mean square of all three. We tested each name comparison method using a deterministic record linkage algorithm. Results were consistent across both hospitals. At a threshold comparator score of 0.8, the Jaro-Winkler comparator achieved the highest linkage sensitivities of 97.4% and 97.7%. The combined root-mean square method achieved sensitivities higher than the Levenshtein edit distance or longest common substring while sustaining high linkage specificity. Approximate string comparators increase deterministic linkage sensitivity by up to 10% compared to exact match comparisons and represent an accurate method of linking to vital statistics data.
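
The comparators named in the abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation. The abstract does not say how each score is normalized, so every function name and normalization here is an assumption: scores are scaled to the range 0 to 1, the Jaro-Winkler function is the basic published form rather than the "modified" variant the study used, and the longest common substring score is a simple single-pass version. Only the 0.8 threshold and the root-mean-square combination of three comparators come from the abstract.

```python
from math import sqrt


def levenshtein_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - edit_distance / length of the longer string."""
    if not a and not b:
        return 1.0
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            # deletion, insertion, substitution
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return 1.0 - prev[-1] / max(len(a), len(b))


def lcs_similarity(a: str, b: str) -> float:
    """Assumed normalization: longest common contiguous run / longer length."""
    if not a or not b:
        return 1.0 if a == b else 0.0
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if ca == cb else 0)
            best = max(best, curr[j])
        prev = curr
    return best / max(len(a), len(b))


def jaro_winkler(a: str, b: str, prefix_scale: float = 0.1) -> float:
    """Basic Jaro-Winkler (not the modified variant named in the abstract)."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_hit = [False] * len(a)
    b_hit = [False] * len(b)
    matches = 0
    for i, ca in enumerate(a):
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_hit[j] and b[j] == ca:
                a_hit[i] = b_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    a_seq = [c for c, hit in zip(a, a_hit) if hit]
    b_seq = [c for c, hit in zip(b, b_hit) if hit]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) // 2
    jaro = (matches / len(a) + matches / len(b)
            + (matches - transpositions) / matches) / 3
    prefix = 0
    for x, y in zip(a[:4], b[:4]):  # common prefix, capped at 4 characters
        if x != y:
            break
        prefix += 1
    return jaro + prefix * prefix_scale * (1 - jaro)


def combined_rms(a: str, b: str) -> float:
    """Root-mean-square of the three comparator scores."""
    scores = (jaro_winkler(a, b), lcs_similarity(a, b),
              levenshtein_similarity(a, b))
    return sqrt(sum(s * s for s in scores) / len(scores))


def names_agree(a: str, b: str, comparator=jaro_winkler,
                threshold: float = 0.8) -> bool:
    """Declare agreement when the score meets the threshold (0.8 in the abstract)."""
    return comparator(a.upper(), b.upper()) >= threshold
```

Under these assumptions, names_agree("Martha", "Marhta") reports agreement because the transposed letters still leave a high Jaro-Winkler score, whereas an exact raw name match would report disagreement. Passing comparator=combined_rms applies the same threshold to the combined score.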

Original language: English
Title of host publication: Studies in Health Technology and Informatics
Pages: 43-47
Number of pages: 5
Volume: 107
DOI: https://doi.org/10.3233/978-1-60750-949-3-43
State: Published - 2004

Fingerprint

Names
Medical Record Linkage
Vital Statistics
Phonetics
Speech analysis
Statistics

Keywords

  • Knowledge Management
  • Medical Record Linkage
  • Patient Matching

ASJC Scopus subject areas

  • Biomedical Engineering
  • Health Informatics
  • Health Information Management

Cite this

Grannis, S., Overhage, J. M., & McDonald, C. (2004). Real world performance of approximate string comparators for use in patient matching. In Studies in Health Technology and Informatics (Vol. 107, pp. 43-47) https://doi.org/10.3233/978-1-60750-949-3-43

Real world performance of approximate string comparators for use in patient matching. / Grannis, Shaun; Overhage, J. Marc; McDonald, Clement.

Studies in Health Technology and Informatics. Vol. 107 2004. p. 43-47.

Grannis, S, Overhage, JM & McDonald, C 2004, Real world performance of approximate string comparators for use in patient matching. in Studies in Health Technology and Informatics. vol. 107, pp. 43-47. https://doi.org/10.3233/978-1-60750-949-3-43
Grannis S, Overhage JM, McDonald C. Real world performance of approximate string comparators for use in patient matching. In Studies in Health Technology and Informatics. Vol. 107. 2004. p. 43-47 https://doi.org/10.3233/978-1-60750-949-3-43
Grannis, Shaun ; Overhage, J. Marc ; McDonald, Clement. / Real world performance of approximate string comparators for use in patient matching. Studies in Health Technology and Informatics. Vol. 107 2004. pp. 43-47
@inbook{fce24f977e4d4a38b239f7eb6ec5b6ae,
title = "Real world performance of approximate string comparators for use in patient matching",
abstract = "Medical record linkage is becoming increasingly important as clinical data is distributed across independent sources. To improve linkage accuracy we studied different name comparison methods that establish agreement or disagreement between corresponding names. In addition to exact raw name matching and exact phonetic name matching, we tested three approximate string comparators. The approximate comparators included the modified Jaro-Winkler method, the longest common substring, and the Levenshtein edit distance. We also calculated the combined root-mean square of all three. We tested each name comparison method using a deterministic record linkage algorithm. Results were consistent across both hospitals. At a threshold comparator score of 0.8, the Jaro-Winkler comparator achieved the highest linkage sensitivities of 97.4{\%} and 97.7{\%}. The combined root-mean square method achieved sensitivities higher than the Levenshtein edit distance or longest common substring while sustaining high linkage specificity. Approximate string comparators increase deterministic linkage sensitivity by up to 10{\%} compared to exact match comparisons and represent an accurate method of linking to vital statistics data.",
keywords = "Knowledge Management, Medical Record Linkage, Patient Matching",
author = "Shaun Grannis and Overhage, {J. Marc} and Clement McDonald",
year = "2004",
doi = "10.3233/978-1-60750-949-3-43",
language = "English",
volume = "107",
pages = "43--47",
booktitle = "Studies in Health Technology and Informatics",

}

TY - CHAP

T1 - Real world performance of approximate string comparators for use in patient matching

AU - Grannis, Shaun

AU - Overhage, J. Marc

AU - McDonald, Clement

PY - 2004

Y1 - 2004

N2 - Medical record linkage is becoming increasingly important as clinical data is distributed across independent sources. To improve linkage accuracy we studied different name comparison methods that establish agreement or disagreement between corresponding names. In addition to exact raw name matching and exact phonetic name matching, we tested three approximate string comparators. The approximate comparators included the modified Jaro-Winkler method, the longest common substring, and the Levenshtein edit distance. We also calculated the combined root-mean square of all three. We tested each name comparison method using a deterministic record linkage algorithm. Results were consistent across both hospitals. At a threshold comparator score of 0.8, the Jaro-Winkler comparator achieved the highest linkage sensitivities of 97.4% and 97.7%. The combined root-mean square method achieved sensitivities higher than the Levenshtein edit distance or longest common substring while sustaining high linkage specificity. Approximate string comparators increase deterministic linkage sensitivity by up to 10% compared to exact match comparisons and represent an accurate method of linking to vital statistics data.

AB - Medical record linkage is becoming increasingly important as clinical data is distributed across independent sources. To improve linkage accuracy we studied different name comparison methods that establish agreement or disagreement between corresponding names. In addition to exact raw name matching and exact phonetic name matching, we tested three approximate string comparators. The approximate comparators included the modified Jaro-Winkler method, the longest common substring, and the Levenshtein edit distance. We also calculated the combined root-mean square of all three. We tested each name comparison method using a deterministic record linkage algorithm. Results were consistent across both hospitals. At a threshold comparator score of 0.8, the Jaro-Winkler comparator achieved the highest linkage sensitivities of 97.4% and 97.7%. The combined root-mean square method achieved sensitivities higher than the Levenshtein edit distance or longest common substring while sustaining high linkage specificity. Approximate string comparators increase deterministic linkage sensitivity by up to 10% compared to exact match comparisons and represent an accurate method of linking to vital statistics data.

KW - Knowledge Management

KW - Medical Record Linkage

KW - Patient Matching

UR - http://www.scopus.com/inward/record.url?scp=69549083553&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=69549083553&partnerID=8YFLogxK

U2 - 10.3233/978-1-60750-949-3-43

DO - 10.3233/978-1-60750-949-3-43

M3 - Chapter

C2 - 15360771

AN - SCOPUS:69549083553

VL - 107

SP - 43

EP - 47

BT - Studies in Health Technology and Informatics

ER -