An Empiric Modification to the Probabilistic Record Linkage Algorithm Using Frequency-Based Weight Scaling

Vivienne J. Zhu, Marc J. Overhage, James Egg, Stephen Downs, Shaun Grannis

Research output: Contribution to journal › Article

23 Citations (Scopus)

Abstract

Objective: To incorporate value-based weight scaling into the Fellegi-Sunter (F-S) maximum likelihood linkage algorithm and evaluate the performance of the modified algorithm.

Background: Because healthcare data are fragmented across many healthcare systems, record linkage is a key component of fully functional health information exchanges. Probabilistic linkage methods produce more accurate, dynamic, and robust matching results than rule-based approaches, particularly when matching patient records that lack unique identifiers. Theoretically, the relative frequency of specific data elements can enhance the F-S method, including by minimizing false-positive or false-negative matches. However, to our knowledge, no frequency-based weight scaling modification to the F-S method has been implemented and specifically evaluated using real-world clinical data.

Methods: The authors implemented a value-based weight scaling modification using an information-theoretic model, and formally evaluated the effectiveness of this modification by linking 51,361 records from Indiana statewide newborn screening data to 80,089 HL7 registration messages from the Indiana Network for Patient Care, an operational health information exchange. In addition to applying the weight scaling modification to all fields, the authors examined the effect of selectively scaling common or uncommon field-specific values.

Results: The sensitivity, specificity, and positive predictive value for applying weight scaling to all field-specific values were 95.4%, 98.8%, and 99.9%, respectively. Compared with the unscaled algorithm, the modified F-S algorithm demonstrated a 10% increase in specificity with a 3% decrease in sensitivity.

Conclusion: By eliminating false-positive matches, the value-based weight modification can enhance the specificity of the F-S method with minimal decrease in sensitivity.
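
To make the idea of frequency-based weight scaling concrete, the following is a minimal sketch and not the authors' implementation. It assumes one simple scaling rule: the field-level u-probability in the standard F-S agreement weight, log2(m/u), is replaced by the relative frequency of the specific value that agreed. The function names, the toy last-name data, and the m-probability of 0.9 are all hypothetical; the paper's information-theoretic scaling model may differ.

```python
import math
from collections import Counter


def fs_weights(m: float, u: float) -> tuple[float, float]:
    """Return the standard F-S (agreement, disagreement) weights in bits."""
    agree = math.log2(m / u)
    disagree = math.log2((1 - m) / (1 - u))
    return agree, disagree


def value_frequencies(values: list[str]) -> dict[str, float]:
    """Relative frequency of each distinct value observed in one field."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}


def scaled_agreement_weight(m: float, value: str, freqs: dict[str, float]) -> float:
    """Agreement weight for one specific field value.

    Assumption (not taken from the paper): the field-level u is replaced by
    the value's relative frequency, so agreement on a rare value earns a
    larger weight than agreement on a common one.
    """
    return math.log2(m / freqs[value])


# Hypothetical toy data: last names observed in one file.
last_names = ["SMITH"] * 6 + ["JONES"] * 3 + ["ZHU"]
freqs = value_frequencies(last_names)

m_last_name = 0.9  # assumed m-probability for the last-name field
w_common = scaled_agreement_weight(m_last_name, "SMITH", freqs)
w_rare = scaled_agreement_weight(m_last_name, "ZHU", freqs)
```

In this toy setting, agreement on the rare value ZHU earns about 3.17 bits, while agreement on the common value SMITH earns about 0.58 bits. Down-weighting agreement on common values is one way such a scheme could reduce false-positive matches, which is consistent with the specificity gain reported above.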

Original language: English
Pages (from-to): 738-745
Number of pages: 8
Journal: Journal of the American Medical Informatics Association
Volume: 16
Issue number: 5
DOI: 10.1197/jamia.M3186
State: Published - Sep 2009

Fingerprint

Weights and Measures
Delivery of Health Care
Patient Care
Theoretical Models
Newborn Infant
Sensitivity and Specificity
Health Information Exchange

ASJC Scopus subject areas

  • Health Informatics

Cite this

An Empiric Modification to the Probabilistic Record Linkage Algorithm Using Frequency-Based Weight Scaling. / Zhu, Vivienne J.; Overhage, Marc J.; Egg, James; Downs, Stephen; Grannis, Shaun.

In: Journal of the American Medical Informatics Association, Vol. 16, No. 5, 09.2009, p. 738-745.

@article{98b09ae9205448e98c254a671a7d17e7,
title = "An Empiric Modification to the Probabilistic Record Linkage Algorithm Using Frequency-Based Weight Scaling",
abstract = "Objective: To incorporate value-based weight scaling into the Fellegi-Sunter (F-S) maximum likelihood linkage algorithm and evaluate the performance of the modified algorithm. Background: Because healthcare data are fragmented across many healthcare systems, record linkage is a key component of fully functional health information exchanges. Probabilistic linkage methods produce more accurate, dynamic, and robust matching results than rule-based approaches, particularly when matching patient records that lack unique identifiers. Theoretically, the relative frequency of specific data elements can enhance the F-S method, including minimizing the false-positive or false-negative matches. However, to our knowledge, no frequency-based weight scaling modification to the F-S method has been implemented and specifically evaluated using real-world clinical data. Methods: The authors implemented a value-based weight scaling modification using an information theoretical model, and formally evaluated the effectiveness of this modification by linking 51,361 records from Indiana statewide newborn screening data to 80,089 HL7 registration messages from the Indiana Network for Patient Care, an operational health information exchange. In addition to applying the weight scaling modification to all fields, we examined the effect of selectively scaling common or uncommon field-specific values. Results: The sensitivity, specificity, and positive predictive value for applying weight scaling to all field-specific values were 95.4, 98.8, and 99.9{\%}, respectively. Compared with nonweight scaling, the modified F-S algorithm demonstrated a 10{\%} increase in specificity with a 3{\%} decrease in sensitivity. Conclusion: By eliminating false-positive matches, the value-based weight modification can enhance the specificity of the F-S method with minimal decrease in sensitivity.",
author = "Zhu, {Vivienne J.} and Overhage, {Marc J.} and James Egg and Stephen Downs and Shaun Grannis",
year = "2009",
month = "9",
doi = "10.1197/jamia.M3186",
language = "English",
volume = "16",
pages = "738--745",
journal = "Journal of the American Medical Informatics Association : JAMIA",
issn = "1067-5027",
publisher = "Oxford University Press",
number = "5",
}

TY - JOUR
T1 - An Empiric Modification to the Probabilistic Record Linkage Algorithm Using Frequency-Based Weight Scaling
AU - Zhu, Vivienne J.
AU - Overhage, Marc J.
AU - Egg, James
AU - Downs, Stephen
AU - Grannis, Shaun
PY - 2009/9
Y1 - 2009/9

AB - Objective: To incorporate value-based weight scaling into the Fellegi-Sunter (F-S) maximum likelihood linkage algorithm and evaluate the performance of the modified algorithm. Background: Because healthcare data are fragmented across many healthcare systems, record linkage is a key component of fully functional health information exchanges. Probabilistic linkage methods produce more accurate, dynamic, and robust matching results than rule-based approaches, particularly when matching patient records that lack unique identifiers. Theoretically, the relative frequency of specific data elements can enhance the F-S method, including minimizing the false-positive or false-negative matches. However, to our knowledge, no frequency-based weight scaling modification to the F-S method has been implemented and specifically evaluated using real-world clinical data. Methods: The authors implemented a value-based weight scaling modification using an information theoretical model, and formally evaluated the effectiveness of this modification by linking 51,361 records from Indiana statewide newborn screening data to 80,089 HL7 registration messages from the Indiana Network for Patient Care, an operational health information exchange. In addition to applying the weight scaling modification to all fields, we examined the effect of selectively scaling common or uncommon field-specific values. Results: The sensitivity, specificity, and positive predictive value for applying weight scaling to all field-specific values were 95.4, 98.8, and 99.9%, respectively. Compared with nonweight scaling, the modified F-S algorithm demonstrated a 10% increase in specificity with a 3% decrease in sensitivity. Conclusion: By eliminating false-positive matches, the value-based weight modification can enhance the specificity of the F-S method with minimal decrease in sensitivity.

UR - http://www.scopus.com/inward/record.url?scp=69549097741&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=69549097741&partnerID=8YFLogxK
U2 - 10.1197/jamia.M3186
DO - 10.1197/jamia.M3186
M3 - Article
C2 - 19567789
AN - SCOPUS:69549097741
VL - 16
SP - 738
EP - 745
JO - Journal of the American Medical Informatics Association : JAMIA
JF - Journal of the American Medical Informatics Association : JAMIA
SN - 1067-5027
IS - 5
ER -