Analysis of identifier performance using a deterministic linkage algorithm.

Shaun Grannis, J. Marc Overhage, Clement J. McDonald

Research output: Contribution to journal › Article

61 Citations (Scopus)

Abstract

As part of developing a record linkage algorithm using de-identified patient data, we analyzed the performance of several demographic variables for making linkages between patient registry records from two hospital registries and the Social Security Death Master File. We analyzed samples from each registry totaling 6,000 record-pairs to establish a linkage gold-standard. Using Social Security Number as the exclusive linkage variable resulted in substantial linkage error rates of 4.7% and 9.2%. The best single variable combination for finding links was Social Security Number, phonetically compressed first name, birth month, and gender. This found 87% and 88% of the links without any false links. We achieved sensitivities of 90% to 92% while maintaining 100% specificity using combinations of social security number, gender, name, and birth date fields. This represents an accurate method for linking patient records to death data and is the basis for a more generalized de-identified linkage algorithm.
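The deterministic rule the abstract describes (two records link only when Social Security Number, phonetically compressed first name, birth month, and gender all agree exactly) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say which phonetic compression was used, so American Soundex stands in for it here, and the `Record` type, the `match_key` and `link` helpers, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    # Hypothetical record layout; field names are assumptions for illustration.
    ssn: str
    first_name: str
    birth_month: int
    gender: str


# Soundex digit for each coded consonant; vowels, H, W, and Y carry no code.
_CODES = {
    ch: digit
    for letters, digit in (
        ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
        ("L", "4"), ("MN", "5"), ("R", "6"),
    )
    for ch in letters
}


def soundex(name: str) -> str:
    """American Soundex, used as a stand-in for the unspecified phonetic compression."""
    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate consonants that share a code
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code after it is kept
    return (result + "000")[:4]


def match_key(record: Record) -> tuple:
    """Build the exact-match key: SSN digits, Soundex of first name, birth month, gender."""
    ssn_digits = "".join(ch for ch in record.ssn if ch.isdigit())
    return (ssn_digits, soundex(record.first_name), record.birth_month, record.gender.upper())


def link(registry: list, death_file: list) -> list:
    """Return every (registry record, death record) pair whose match keys are identical."""
    index = {}
    for death in death_file:
        index.setdefault(match_key(death), []).append(death)
    return [(patient, death) for patient in registry for death in index.get(match_key(patient), [])]
```

With these assumptions, a registry record for "Stephen" links to a death record for "Steven" when the SSN digits, birth month, and gender also agree, while a record that shares only the SSN does not link. Requiring agreement on every field is what trades some sensitivity for the absence of false links reported in the abstract.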

Original language: English
Pages (from-to): 305-309
Number of pages: 5
Journal: Proceedings / AMIA ... Annual Symposium. AMIA Symposium
State: Published - 2002

Analysis of identifier performance using a deterministic linkage algorithm. / Grannis, Shaun; Overhage, J. Marc; McDonald, Clement J.

In: Proceedings / AMIA ... Annual Symposium. AMIA Symposium, 2002, p. 305-309.

@article{ad36a94bb9094a1cb71e9be61fcf1066,
title = "Analysis of identifier performance using a deterministic linkage algorithm",
abstract = "As part of developing a record linkage algorithm using de-identified patient data, we analyzed the performance of several demographic variables for making linkages between patient registry records from two hospital registries and the Social Security Death Master File. We analyzed samples from each registry totaling 6,000 record-pairs to establish a linkage gold-standard. Using Social Security Number as the exclusive linkage variable resulted in substantial linkage error rates of 4.7{\%} and 9.2{\%}. The best single variable combination for finding links was Social Security Number, phonetically compressed first name, birth month, and gender. This found 87{\%} and 88{\%} of the links without any false links. We achieved sensitivities of 90{\%} to 92{\%} while maintaining 100{\%} specificity using combinations of social security number, gender, name, and birth date fields. This represents an accurate method for linking patient records to death data and is the basis for a more generalized de-identified linkage algorithm.",
author = "Grannis, Shaun and Overhage, {J. Marc} and McDonald, {Clement J.}",
year = "2002",
language = "English",
pages = "305--309",
journal = "Proceedings / AMIA ... Annual Symposium. AMIA Symposium",
issn = "1531-605X",
publisher = "Hanley \& Belfus",
}

TY - JOUR

T1 - Analysis of identifier performance using a deterministic linkage algorithm.

AU - Grannis, Shaun

AU - Overhage, J. Marc

AU - McDonald, Clement J.

PY - 2002

Y1 - 2002

AB - As part of developing a record linkage algorithm using de-identified patient data, we analyzed the performance of several demographic variables for making linkages between patient registry records from two hospital registries and the Social Security Death Master File. We analyzed samples from each registry totaling 6,000 record-pairs to establish a linkage gold-standard. Using Social Security Number as the exclusive linkage variable resulted in substantial linkage error rates of 4.7% and 9.2%. The best single variable combination for finding links was Social Security Number, phonetically compressed first name, birth month, and gender. This found 87% and 88% of the links without any false links. We achieved sensitivities of 90% to 92% while maintaining 100% specificity using combinations of social security number, gender, name, and birth date fields. This represents an accurate method for linking patient records to death data and is the basis for a more generalized de-identified linkage algorithm.

UR - http://www.scopus.com/inward/record.url?scp=0036371278&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0036371278&partnerID=8YFLogxK

M3 - Article

C2 - 12463836

AN - SCOPUS:0036371278

SP - 305

EP - 309

JO - Proceedings / AMIA ... Annual Symposium. AMIA Symposium

JF - Proceedings / AMIA ... Annual Symposium. AMIA Symposium

SN - 1531-605X

ER -