A context-sensitive approach to anonymizing spatial surveillance data

Impact on outbreak detection

Christopher A. Cassa, Shaun Grannis, J. Marc Overhage, Kenneth D. Mandl

Research output: Contribution to journal › Article

58 Citations (Scopus)

Abstract

Objective: The use of spatially based methods and algorithms in epidemiology and surveillance presents privacy challenges for researchers and public health agencies. We describe a novel method for anonymizing individuals in public health data sets by transposing their spatial locations through a process informed by the underlying population density. Further, we measure the impact of the skew on detection of spatial clustering as measured by a spatial scanning statistic.

Design: Cases were emergency department (ED) visits for respiratory illness. Baseline ED visit data were injected with artificially created clusters ranging in magnitude, shape, and location. The geocoded locations were then transformed using a de-identification algorithm that accounts for the local underlying population density.

Measurements: A total of 12,600 separate weeks of case data with artificially created clusters were combined with control data and the impact on detection of spatial clustering identified by a spatial scan statistic was measured.

Results: The anonymization algorithm produced an expected skew of cases that resulted in high values of data set k-anonymity. De-identification that moves points an average distance of 0.25 km lowers the spatial cluster detection sensitivity by less than 4% and lowers the detection specificity less than 1%.

Conclusion: A population-density-based Gaussian spatial blurring markedly decreases the ability to identify individuals in a data set while only slightly decreasing the performance of a standardly used outbreak detection tool. These findings suggest new approaches to anonymizing data for spatial epidemiology and surveillance.
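As a rough illustration of the idea in the abstract, the following Python sketch displaces a point with an isotropic Gaussian whose spread shrinks as local population density rises. It is a minimal sketch under stated assumptions, not the authors' algorithm: the function name `blur_point`, the parameter `k_target`, and the rule that sets sigma to the radius of a disc expected to contain `k_target` residents are all hypothetical choices made for this example. The abstract does not give the actual scaling formula.

```python
import math
import random


def blur_point(x_km, y_km, density_per_km2, k_target=20, rng=None):
    """Return a blurred (x, y) in km and the sigma used.

    ASSUMPTION (not from the paper): sigma is the radius of a disc that
    would hold `k_target` residents at the given local density, so dense
    areas get small displacements and sparse areas get large ones.
    """
    if density_per_km2 <= 0:
        raise ValueError("density_per_km2 must be positive")
    if rng is None:
        rng = random.Random()

    # Disc area needed for k_target people is k_target / density;
    # the radius of that disc is used as the Gaussian sigma.
    sigma_km = math.sqrt(k_target / (math.pi * density_per_km2))

    # Independent Gaussian offsets on each axis give an isotropic blur.
    dx = rng.gauss(0.0, sigma_km)
    dy = rng.gauss(0.0, sigma_km)
    return x_km + dx, y_km + dy, sigma_km


if __name__ == "__main__":
    rng = random.Random(42)
    for density in (100.0, 1000.0, 10000.0):
        x, y, sigma = blur_point(0.0, 0.0, density, rng=rng)
        print(f"density={density:>8.0f}/km^2  sigma={sigma:.3f} km  "
              f"moved={math.hypot(x, y):.3f} km")
```

For an isotropic two-dimensional Gaussian, the displacement distance follows a Rayleigh distribution with mean sigma·sqrt(π/2), which gives one way to relate a chosen sigma to an average displacement such as the 0.25 km reported in the Results.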

Original language: English
Pages (from-to): 160-165
Number of pages: 6
Journal: Journal of the American Medical Informatics Association
Volume: 13
Issue number: 2
DOI: 10.1197/jamia.M1920
State: Published - Mar 2006

Fingerprint

Population Density
Disease Outbreaks
Cluster Analysis
Hospital Emergency Service
Epidemiology
Public Health
Geographic Mapping
Privacy
Research Personnel
Datasets

ASJC Scopus subject areas

  • Medicine(all)

Cite this

A context-sensitive approach to anonymizing spatial surveillance data: Impact on outbreak detection. / Cassa, Christopher A.; Grannis, Shaun; Overhage, J. Marc; Mandl, Kenneth D.

In: Journal of the American Medical Informatics Association, Vol. 13, No. 2, 03.2006, p. 160-165.

@article{84635a2f90224585a3cffa97860e9663,
title = "A context-sensitive approach to anonymizing spatial surveillance data: Impact on outbreak detection",
abstract = "Objective: The use of spatially based methods and algorithms in epidemiology and surveillance presents privacy challenges for researchers and public health agencies. We describe a novel method for anonymizing individuals in public health data sets by transposing their spatial locations through a process informed by the underlying population density. Further, we measure the impact of the skew on detection of spatial clustering as measured by a spatial scanning statistic. Design: Cases were emergency department (ED) visits for respiratory illness. Baseline ED visit data were injected with artificially created clusters ranging in magnitude, shape, and location. The geocoded locations were then transformed using a de-identification algorithm that accounts for the local underlying population density. Measurements: A total of 12,600 separate weeks of case data with artificially created clusters were combined with control data and the impact on detection of spatial clustering identified by a spatial scan statistic was measured. Results: The anonymization algorithm produced an expected skew of cases that resulted in high values of data set k-anonymity. De-identification that moves points an average distance of 0.25 km lowers the spatial cluster detection sensitivity by less than 4{\%} and lowers the detection specificity less than 1{\%}. Conclusion: A population-density-based Gaussian spatial blurring markedly decreases the ability to identify individuals in a data set while only slightly decreasing the performance of a standardly used outbreak detection tool. These findings suggest new approaches to anonymizing data for spatial epidemiology and surveillance.",
author = "Cassa, {Christopher A.} and Grannis, Shaun and Overhage, {J. Marc} and Mandl, {Kenneth D.}",
year = "2006",
month = "3",
doi = "10.1197/jamia.M1920",
language = "English",
volume = "13",
pages = "160--165",
journal = "Journal of the American Medical Informatics Association : JAMIA",
issn = "1067-5027",
publisher = "Oxford University Press",
number = "2",

}

TY - JOUR

T1 - A context-sensitive approach to anonymizing spatial surveillance data

T2 - Impact on outbreak detection

AU - Cassa, Christopher A.

AU - Grannis, Shaun

AU - Overhage, J. Marc

AU - Mandl, Kenneth D.

PY - 2006/3

Y1 - 2006/3

N2 - Objective: The use of spatially based methods and algorithms in epidemiology and surveillance presents privacy challenges for researchers and public health agencies. We describe a novel method for anonymizing individuals in public health data sets by transposing their spatial locations through a process informed by the underlying population density. Further, we measure the impact of the skew on detection of spatial clustering as measured by a spatial scanning statistic. Design: Cases were emergency department (ED) visits for respiratory illness. Baseline ED visit data were injected with artificially created clusters ranging in magnitude, shape, and location. The geocoded locations were then transformed using a de-identification algorithm that accounts for the local underlying population density. Measurements: A total of 12,600 separate weeks of case data with artificially created clusters were combined with control data and the impact on detection of spatial clustering identified by a spatial scan statistic was measured. Results: The anonymization algorithm produced an expected skew of cases that resulted in high values of data set k-anonymity. De-identification that moves points an average distance of 0.25 km lowers the spatial cluster detection sensitivity by less than 4% and lowers the detection specificity less than 1%. Conclusion: A population-density-based Gaussian spatial blurring markedly decreases the ability to identify individuals in a data set while only slightly decreasing the performance of a standardly used outbreak detection tool. These findings suggest new approaches to anonymizing data for spatial epidemiology and surveillance.

UR - http://www.scopus.com/inward/record.url?scp=33644669418&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33644669418&partnerID=8YFLogxK

U2 - 10.1197/jamia.M1920

DO - 10.1197/jamia.M1920

M3 - Article

VL - 13

SP - 160

EP - 165

JO - Journal of the American Medical Informatics Association : JAMIA

JF - Journal of the American Medical Informatics Association : JAMIA

SN - 1067-5027

IS - 2

ER -