Preparing a collection of radiology examinations for distribution and retrieval

Dina Demner-Fushman, Marc D. Kohli, Marc Rosenman, Sonya E. Shooshan, Laritza Rodriguez, Sameer Antani, George R. Thoma, Clement J. McDonald

Research output: Contribution to journal › Article

54 Citations (Scopus)

Abstract

Objective: Clinical documents made available for secondary use play an increasingly important role in the discovery of clinical knowledge, the development of research methods, and education. An important step in facilitating secondary use of clinical document collections is easy access to descriptions and samples that represent the content of the collections. This paper presents an approach to developing a collection of radiology examinations, including both the images and the radiologists' narrative reports, and making them publicly available in a searchable database.

Materials and Methods: The authors collected 3996 radiology reports from the Indiana Network for Patient Care and 8121 associated images from the hospitals' picture archiving systems. The images and reports were de-identified automatically, and the automatic de-identification was then verified manually. The authors coded the key findings of the reports and empirically assessed the benefits of manual coding on retrieval.

Results: The automatic de-identification of the narrative was aggressive and achieved 100% precision at the cost of rendering a few findings uninterpretable. Automatic de-identification of images was less complete: images for two of 3996 patients (0.05%) still showed protected health information. Manual coding of findings improved retrieval precision.

Conclusion: Stringent de-identification methods can remove all identifiers from text radiology reports. DICOM de-identification of images does not remove all identifying information and requires special attention for images scanned from film. Adding manual coding to the radiologists' narrative reports significantly improved the relevance of the retrieved clinical documents. The de-identified Indiana chest X-ray collection is available for searching and downloading from the National Library of Medicine (http://openi.nlm.nih.gov/).

Original language: English (US)
Pages (from-to): 304-310
Number of pages: 7
Journal: Journal of the American Medical Informatics Association
Volume: 23
Issue number: 2
DOI: 10.1093/jamia/ocv080
State: Published - Mar 1 2016

Fingerprint

  • Radiology
  • National Library of Medicine (U.S.)
  • Patient Care
  • Thorax
  • X-Rays
  • Databases
  • Education
  • Health
  • Research
  • Radiologists

Keywords

  • Abstracting and indexing
  • Biometric identification
  • Information storage and retrieval
  • Medical records
  • Radiography

ASJC Scopus subject areas

  • Health Informatics

Cite this

Demner-Fushman, D., Kohli, M. D., Rosenman, M., Shooshan, S. E., Rodriguez, L., Antani, S., ... McDonald, C. J. (2016). Preparing a collection of radiology examinations for distribution and retrieval. Journal of the American Medical Informatics Association, 23(2), 304-310. https://doi.org/10.1093/jamia/ocv080

Preparing a collection of radiology examinations for distribution and retrieval. / Demner-Fushman, Dina; Kohli, Marc D.; Rosenman, Marc; Shooshan, Sonya E.; Rodriguez, Laritza; Antani, Sameer; Thoma, George R.; McDonald, Clement J.

In: Journal of the American Medical Informatics Association, Vol. 23, No. 2, 01.03.2016, p. 304-310.


Demner-Fushman, D, Kohli, MD, Rosenman, M, Shooshan, SE, Rodriguez, L, Antani, S, Thoma, GR & McDonald, CJ 2016, 'Preparing a collection of radiology examinations for distribution and retrieval', Journal of the American Medical Informatics Association, vol. 23, no. 2, pp. 304-310. https://doi.org/10.1093/jamia/ocv080

Demner-Fushman, Dina ; Kohli, Marc D. ; Rosenman, Marc ; Shooshan, Sonya E. ; Rodriguez, Laritza ; Antani, Sameer ; Thoma, George R. ; McDonald, Clement J. / Preparing a collection of radiology examinations for distribution and retrieval. In: Journal of the American Medical Informatics Association. 2016 ; Vol. 23, No. 2. pp. 304-310.

@article{a026bb0c6ba8446baee7f14f1d963ebf,
title = "Preparing a collection of radiology examinations for distribution and retrieval",
abstract = "Objective Clinical documents made available for secondary use play an increasingly important role in discovery of clinical knowledge, development of research methods, and education. An important step in facilitating secondary use of clinical document collections is easy access to descriptions and samples that represent the content of the collections. This paper presents an approach to developing a collection of radiology examinations, including both the images and radiologist narrative reports, and making them publicly available in a searchable database. Materials and Methods The authors collected 3996 radiology reports from the Indiana Network for Patient Care and 8121 associated images from the hospitals' picture archiving systems. The images and reports were de-identified automatically and then the automatic de-identification was manually verified. The authors coded the key findings of the reports and empirically assessed the benefits of manual coding on retrieval. Results The automatic de-identification of the narrative was aggressive and achieved 100{\%} precision at the cost of rendering a few findings uninterpretable. Automatic de-identification of images was not quite as perfect. Images for two of 3996 patients (0.05{\%}) showed protected health information. Manual encoding of findings improved retrieval precision. Conclusion Stringent de-identification methods can remove all identifiers from text radiology reports. DICOM de-identification of images does not remove all identifying information and needs special attention to images scanned from film. Adding manual coding to the radiologist narrative reports significantly improved relevancy of the retrieved clinical documents. The de-identified Indiana chest X-ray collection is available for searching and downloading from the National Library of Medicine (http://openi.nlm.nih.gov/).",
keywords = "Abstracting and indexing, Biometric identification, Information storage and retrieval, Medical records, Radiography",
author = "Dina Demner-Fushman and Kohli, {Marc D.} and Marc Rosenman and Shooshan, {Sonya E.} and Laritza Rodriguez and Sameer Antani and Thoma, {George R.} and McDonald, {Clement J.}",
year = "2016",
month = "3",
day = "1",
doi = "10.1093/jamia/ocv080",
language = "English (US)",
volume = "23",
pages = "304--310",
journal = "Journal of the American Medical Informatics Association : JAMIA",
issn = "1067-5027",
publisher = "Oxford University Press",
number = "2",

}

TY - JOUR

T1 - Preparing a collection of radiology examinations for distribution and retrieval

AU - Demner-Fushman, Dina

AU - Kohli, Marc D.

AU - Rosenman, Marc

AU - Shooshan, Sonya E.

AU - Rodriguez, Laritza

AU - Antani, Sameer

AU - Thoma, George R.

AU - McDonald, Clement J.

PY - 2016/3/1

Y1 - 2016/3/1

AB - Objective Clinical documents made available for secondary use play an increasingly important role in discovery of clinical knowledge, development of research methods, and education. An important step in facilitating secondary use of clinical document collections is easy access to descriptions and samples that represent the content of the collections. This paper presents an approach to developing a collection of radiology examinations, including both the images and radiologist narrative reports, and making them publicly available in a searchable database. Materials and Methods The authors collected 3996 radiology reports from the Indiana Network for Patient Care and 8121 associated images from the hospitals' picture archiving systems. The images and reports were de-identified automatically and then the automatic de-identification was manually verified. The authors coded the key findings of the reports and empirically assessed the benefits of manual coding on retrieval. Results The automatic de-identification of the narrative was aggressive and achieved 100% precision at the cost of rendering a few findings uninterpretable. Automatic de-identification of images was not quite as perfect. Images for two of 3996 patients (0.05%) showed protected health information. Manual encoding of findings improved retrieval precision. Conclusion Stringent de-identification methods can remove all identifiers from text radiology reports. DICOM de-identification of images does not remove all identifying information and needs special attention to images scanned from film. Adding manual coding to the radiologist narrative reports significantly improved relevancy of the retrieved clinical documents. The de-identified Indiana chest X-ray collection is available for searching and downloading from the National Library of Medicine (http://openi.nlm.nih.gov/).

KW - Abstracting and indexing

KW - Biometric identification

KW - Information storage and retrieval

KW - Medical records

KW - Radiography

UR - http://www.scopus.com/inward/record.url?scp=84963729804&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84963729804&partnerID=8YFLogxK

U2 - 10.1093/jamia/ocv080

DO - 10.1093/jamia/ocv080

M3 - Article

C2 - 26133894

AN - SCOPUS:84963729804

VL - 23

SP - 304

EP - 310

JO - Journal of the American Medical Informatics Association : JAMIA

JF - Journal of the American Medical Informatics Association : JAMIA

SN - 1067-5027

IS - 2

ER -