Analysis of a probabilistic record linkage technique without human review.

Shaun Grannis, J. Marc Overhage, Siu Hui, Clement J. McDonald

Research output: Contribution to journal (Article)

62 Citations (Scopus)

Abstract

We previously developed a deterministic record linkage algorithm demonstrating sensitivities approaching 90% while maintaining 100% specificity. Substantially better performance has been reported using probabilistic linkage techniques; however, such methods often incorporate human review into the process. To avoid human review, we employed an estimator function using the Expectation Maximization (EM) algorithm to establish a single true-link threshold. We compared the unsupervised probabilistic results against the manually reviewed gold standard for two hospital registries, as well as against our previous deterministic results. At an estimated specificity of 99.95%, actual specificities were 99.43% and 99.42% for registries A and B, respectively. At an estimated sensitivity of 99.95%, actual sensitivities were 99.19% and 98.99% for registries A and B, respectively. The EM algorithm estimated linkage parameters with acceptable accuracy, and was an improvement over the deterministic algorithm. Such a methodology may be used where record linkage is required, but human intervention is not possible or practical.
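The abstract does not give the estimator's details, so the following is only a minimal sketch of how EM is commonly used for unsupervised linkage, assuming the standard Fellegi-Sunter model with binary field agreement and conditional independence between fields. It is not the authors' implementation. The function names (`em_estimate`, `match_score`, `estimated_specificity`), the four simulated fields, the true rates, and the threshold value are all hypothetical and chosen for illustration.

```python
import itertools
import math
import random

EPS = 1e-6  # keeps probabilities away from 0 and 1 so ratios and logs stay finite


def pattern_prob(gamma, probs):
    """Probability of an agreement pattern if fields agree independently."""
    return math.prod(q if a else 1.0 - q for a, q in zip(gamma, probs))


def em_estimate(patterns, n_iter=50, p=0.1, m_init=0.9, u_init=0.1):
    """Estimate m_j = P(agree | link), u_j = P(agree | non-link), p = P(link)."""
    k, n = len(patterns[0]), len(patterns)
    m, u = [m_init] * k, [u_init] * k
    for _ in range(n_iter):
        # E-step: posterior probability that each record pair is a true link.
        g = []
        for gamma in patterns:
            pm = p * pattern_prob(gamma, m)
            pu = (1.0 - p) * pattern_prob(gamma, u)
            g.append(pm / (pm + pu))
        # M-step: re-estimate the parameters from the posterior weights.
        g_sum = sum(g)
        for j in range(k):
            agree_link = sum(gi for gi, gamma in zip(g, patterns) if gamma[j])
            agree_all = sum(1 for gamma in patterns if gamma[j])
            m[j] = min(max(agree_link / g_sum, EPS), 1.0 - EPS)
            u[j] = min(max((agree_all - agree_link) / (n - g_sum), EPS), 1.0 - EPS)
        p = min(max(g_sum / n, EPS), 1.0 - EPS)
    return m, u, p


def match_score(gamma, m, u):
    """Sum of log2 agreement/disagreement weights for one record pair."""
    return sum(
        math.log2(m[j] / u[j]) if a else math.log2((1.0 - m[j]) / (1.0 - u[j]))
        for j, a in enumerate(gamma)
    )


def estimated_specificity(threshold, m, u):
    """Model-based P(score < threshold | non-link), summed over all patterns."""
    return sum(
        pattern_prob(gamma, u)
        for gamma in itertools.product((0, 1), repeat=len(m))
        if match_score(gamma, m, u) < threshold
    )


# Simulated agreement vectors (assumed values, not data from the study).
random.seed(0)
TRUE_M = [0.95, 0.90, 0.85, 0.90]
TRUE_U = [0.05, 0.10, 0.02, 0.10]
pairs = []
for _ in range(5000):
    probs = TRUE_M if random.random() < 0.1 else TRUE_U
    pairs.append(tuple(int(random.random() < q) for q in probs))

m_hat, u_hat, p_hat = em_estimate(pairs)
threshold = 5.0  # arbitrary illustrative cutoff on the log2 score
print("estimated m:", [round(x, 3) for x in m_hat])
print("estimated u:", [round(x, 3) for x in u_hat])
print("estimated specificity at threshold:", estimated_specificity(threshold, m_hat, u_hat))
```

Under these assumptions, a single true-link threshold could be chosen without human review by scanning candidate cutoffs and keeping the lowest one whose estimated specificity meets a target such as 99.95%. The estimated figure can then be checked against a manually reviewed gold standard, as the study did.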

Original language: English
Pages (from-to): 259-263
Number of pages: 5
Journal: AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium
State: Published - 2003

Cite this

Analysis of a probabilistic record linkage technique without human review. / Grannis, Shaun; Overhage, J. Marc; Hui, Siu; McDonald, Clement J.

In: AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium, 2003, p. 259-263.

@article{e99e8296d06e4602bdecb76e92c13ef9,
title = "Analysis of a probabilistic record linkage technique without human review.",
abstract = "We previously developed a deterministic record linkage algorithm demonstrating sensitivities approaching 90{\%} while maintaining 100{\%} specificity. Substantially better performance has been reported using probabilistic linkage techniques; however, such methods often incorporate human review into the process. To avoid human review, we employed an estimator function using the Expectation Maximization (EM) algorithm to establish a single true-link threshold. We compared the unsupervised probabilistic results against the manually reviewed gold standard for two hospital registries, as well as against our previous deterministic results. At an estimated specificity of 99.95{\%}, actual specificities were 99.43{\%} and 99.42{\%} for registries A and B, respectively. At an estimated sensitivity of 99.95{\%}, actual sensitivities were 99.19{\%} and 98.99{\%} for registries A and B, respectively. The EM algorithm estimated linkage parameters with acceptable accuracy, and was an improvement over the deterministic algorithm. Such a methodology may be used where record linkage is required, but human intervention is not possible or practical.",
author = "Shaun Grannis and Overhage, {J. Marc} and Siu Hui and McDonald, {Clement J.}",
year = "2003",
language = "English",
pages = "259--263",
journal = "AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium",
issn = "1559-4076",
publisher = "American Medical Informatics Association",
}

TY - JOUR

T1 - Analysis of a probabilistic record linkage technique without human review.

AU - Grannis, Shaun

AU - Overhage, J. Marc

AU - Hui, Siu

AU - McDonald, Clement J.

PY - 2003

Y1 - 2003

N2 - We previously developed a deterministic record linkage algorithm demonstrating sensitivities approaching 90% while maintaining 100% specificity. Substantially better performance has been reported using probabilistic linkage techniques; however, such methods often incorporate human review into the process. To avoid human review, we employed an estimator function using the Expectation Maximization (EM) algorithm to establish a single true-link threshold. We compared the unsupervised probabilistic results against the manually reviewed gold standard for two hospital registries, as well as against our previous deterministic results. At an estimated specificity of 99.95%, actual specificities were 99.43% and 99.42% for registries A and B, respectively. At an estimated sensitivity of 99.95%, actual sensitivities were 99.19% and 98.99% for registries A and B, respectively. The EM algorithm estimated linkage parameters with acceptable accuracy, and was an improvement over the deterministic algorithm. Such a methodology may be used where record linkage is required, but human intervention is not possible or practical.

UR - http://www.scopus.com/inward/record.url?scp=16544379030&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=16544379030&partnerID=8YFLogxK

M3 - Article

C2 - 14728174

AN - SCOPUS:16544379030

SP - 259

EP - 263

JO - AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium

JF - AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium

SN - 1559-4076

ER -