Natural language processing for the development of a clinical registry: A validation study in intraductal papillary mucinous neoplasms

Mohammad A. Al-Haddad, Jeff Friedlin, Joe Kesterson, Joshua A. Waters, Juan R. Aguilar-Saavedra, C. Schmidt

Research output: Contribution to journal › Article

19 Citations (Scopus)

Abstract

Background: Medical natural language processing (NLP) systems have been developed to identify, extract and encode information within clinical narrative text. However, the role of NLP in clinical research and patient care remains limited. Pancreatic cysts are common. Some pancreatic cysts, such as intraductal papillary mucinous neoplasms (IPMNs), have malignant potential and require extended periods of surveillance. We seek to develop a novel NLP system that could be applied in our clinical network to develop a functional registry of IPMN patients. Objectives: This study aims to validate the accuracy of our novel NLP system in the identification of surgical patients with pathologically confirmed IPMN in comparison with our pre-existing manually created surgical database (standard reference). Methods: The Regenstrief EXtraction Tool (REX) was used to extract pancreatic cyst patient data from medical text files from Indiana University Health. The system was assessed periodically by direct sampling and review of medical records. Results were compared with the standard reference. Results: Natural language processing detected 5694 unique patients with pancreas cysts, in 215 of whom surgical pathology had confirmed IPMN. The NLP software identified all but seven patients present in the surgical database and identified an additional 37 IPMN patients not previously included in the surgical database. Using the standard reference, the sensitivity of the NLP program was 97.5% (95% confidence interval [CI] 94.8-98.9%) and its positive predictive value was 95.5% (95% CI 92.3-97.5%). Conclusions: Natural language processing is a reliable and accurate method for identifying selected patient cohorts and may facilitate the identification and follow-up of patients with IPMN.
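The sensitivity and positive predictive value reported above are ratios of true positives (TP), false negatives (FN) and false positives (FP), each counted against the standard reference. The abstract does not say how its 95% confidence intervals were computed. The sketch below assumes a Wilson score interval, one common choice for binomial proportions. The function names (`sensitivity`, `positive_predictive_value`, `wilson_interval`) and the counts are hypothetical and for illustration only. They are not the study's data or the authors' code.

```python
import math


def sensitivity(tp: int, fn: int) -> float:
    """Share of reference-positive patients that the tool also flagged."""
    return tp / (tp + fn)


def positive_predictive_value(tp: int, fp: int) -> float:
    """Share of tool-flagged patients that are positive in the reference."""
    return tp / (tp + fp)


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion; z = 1.96 gives ~95%.

    Assumption: the source does not name its CI method; Wilson is one option.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical counts for illustration only (not taken from the study).
tp, fn, fp = 90, 10, 5
sens = sensitivity(tp, fn)
sens_ci = wilson_interval(tp, tp + fn)
ppv = positive_predictive_value(tp, fp)
ppv_ci = wilson_interval(tp, tp + fp)
print(f"sensitivity {sens:.1%} (95% CI {sens_ci[0]:.1%}-{sens_ci[1]:.1%})")
print(f"PPV {ppv:.1%} (95% CI {ppv_ci[0]:.1%}-{ppv_ci[1]:.1%})")
```

The Wilson interval is used here rather than the simpler normal (Wald) approximation because it stays within [0, 1] and behaves better for proportions near 100%, such as the high sensitivity and PPV values reported in the abstract.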

Original language: English (US)
Pages (from-to): 688-695
Number of pages: 8
Journal: HPB
Volume: 12
Issue number: 10
DOI: 10.1111/j.1477-2574.2010.00235.x
State: Published - Dec 2010

Keywords

  • cystic neoplasm
  • data mining
  • Intraductal papillary mucinous neoplasm
  • natural language processing
  • pancreatic cancer
  • precancerous
  • prevention

ASJC Scopus subject areas

  • Gastroenterology
  • Hepatology

Cite this

Natural language processing for the development of a clinical registry : A validation study in intraductal papillary mucinous neoplasms. / Al-Haddad, Mohammad A.; Friedlin, Jeff; Kesterson, Joe; Waters, Joshua A.; Aguilar-Saavedra, Juan R.; Schmidt, C.

In: HPB, Vol. 12, No. 10, 12.2010, p. 688-695.

Al-Haddad, Mohammad A. ; Friedlin, Jeff ; Kesterson, Joe ; Waters, Joshua A. ; Aguilar-Saavedra, Juan R. ; Schmidt, C. / Natural language processing for the development of a clinical registry : A validation study in intraductal papillary mucinous neoplasms. In: HPB. 2010 ; Vol. 12, No. 10. pp. 688-695.

@article{40ad55b639564c99b57d983b0f694d5b,
title = "Natural language processing for the development of a clinical registry: A validation study in intraductal papillary mucinous neoplasms",
abstract = "Background: Medical natural language processing (NLP) systems have been developed to identify, extract and encode information within clinical narrative text. However, the role of NLP in clinical research and patient care remains limited. Pancreatic cysts are common. Some pancreatic cysts, such as intraductal papillary mucinous neoplasms (IPMNs), have malignant potential and require extended periods of surveillance. We seek to develop a novel NLP system that could be applied in our clinical network to develop a functional registry of IPMN patients. Objectives: This study aims to validate the accuracy of our novel NLP system in the identification of surgical patients with pathologically confirmed IPMN in comparison with our pre-existing manually created surgical database (standard reference). Methods: The Regenstrief EXtraction Tool (REX) was used to extract pancreatic cyst patient data from medical text files from Indiana University Health. The system was assessed periodically by direct sampling and review of medical records. Results were compared with the standard reference. Results: Natural language processing detected 5694 unique patients with pancreas cysts, in 215 of whom surgical pathology had confirmed IPMN. The NLP software identified all but seven patients present in the surgical database and identified an additional 37 IPMN patients not previously included in the surgical database. Using the standard reference, the sensitivity of the NLP program was 97.5{\%} (95{\%} confidence interval [CI] 94.8-98.9{\%}) and its positive predictive value was 95.5{\%} (95{\%} CI 92.3-97.5{\%}). Conclusions: Natural language processing is a reliable and accurate method for identifying selected patient cohorts and may facilitate the identification and follow-up of patients with IPMN.",
keywords = "cystic neoplasm, data mining, Intraductal papillary mucinous neoplasm, natural language processing, pancreatic cancer, precancerous, prevention",
author = "Al-Haddad, {Mohammad A.} and Jeff Friedlin and Joe Kesterson and Waters, {Joshua A.} and Aguilar-Saavedra, {Juan R.} and C. Schmidt",
year = "2010",
month = "12",
doi = "10.1111/j.1477-2574.2010.00235.x",
language = "English (US)",
volume = "12",
pages = "688--695",
journal = "HPB",
issn = "1365-182X",
publisher = "John Wiley and Sons Inc.",
number = "10",

}

TY  - JOUR
T1  - Natural language processing for the development of a clinical registry
T2  - A validation study in intraductal papillary mucinous neoplasms
AU  - Al-Haddad, Mohammad A.
AU  - Friedlin, Jeff
AU  - Kesterson, Joe
AU  - Waters, Joshua A.
AU  - Aguilar-Saavedra, Juan R.
AU  - Schmidt, C.
PY  - 2010/12
Y1  - 2010/12
N2  - Background: Medical natural language processing (NLP) systems have been developed to identify, extract and encode information within clinical narrative text. However, the role of NLP in clinical research and patient care remains limited. Pancreatic cysts are common. Some pancreatic cysts, such as intraductal papillary mucinous neoplasms (IPMNs), have malignant potential and require extended periods of surveillance. We seek to develop a novel NLP system that could be applied in our clinical network to develop a functional registry of IPMN patients. Objectives: This study aims to validate the accuracy of our novel NLP system in the identification of surgical patients with pathologically confirmed IPMN in comparison with our pre-existing manually created surgical database (standard reference). Methods: The Regenstrief EXtraction Tool (REX) was used to extract pancreatic cyst patient data from medical text files from Indiana University Health. The system was assessed periodically by direct sampling and review of medical records. Results were compared with the standard reference. Results: Natural language processing detected 5694 unique patients with pancreas cysts, in 215 of whom surgical pathology had confirmed IPMN. The NLP software identified all but seven patients present in the surgical database and identified an additional 37 IPMN patients not previously included in the surgical database. Using the standard reference, the sensitivity of the NLP program was 97.5% (95% confidence interval [CI] 94.8-98.9%) and its positive predictive value was 95.5% (95% CI 92.3-97.5%). Conclusions: Natural language processing is a reliable and accurate method for identifying selected patient cohorts and may facilitate the identification and follow-up of patients with IPMN.
AB  - Background: Medical natural language processing (NLP) systems have been developed to identify, extract and encode information within clinical narrative text. However, the role of NLP in clinical research and patient care remains limited. Pancreatic cysts are common. Some pancreatic cysts, such as intraductal papillary mucinous neoplasms (IPMNs), have malignant potential and require extended periods of surveillance. We seek to develop a novel NLP system that could be applied in our clinical network to develop a functional registry of IPMN patients. Objectives: This study aims to validate the accuracy of our novel NLP system in the identification of surgical patients with pathologically confirmed IPMN in comparison with our pre-existing manually created surgical database (standard reference). Methods: The Regenstrief EXtraction Tool (REX) was used to extract pancreatic cyst patient data from medical text files from Indiana University Health. The system was assessed periodically by direct sampling and review of medical records. Results were compared with the standard reference. Results: Natural language processing detected 5694 unique patients with pancreas cysts, in 215 of whom surgical pathology had confirmed IPMN. The NLP software identified all but seven patients present in the surgical database and identified an additional 37 IPMN patients not previously included in the surgical database. Using the standard reference, the sensitivity of the NLP program was 97.5% (95% confidence interval [CI] 94.8-98.9%) and its positive predictive value was 95.5% (95% CI 92.3-97.5%). Conclusions: Natural language processing is a reliable and accurate method for identifying selected patient cohorts and may facilitate the identification and follow-up of patients with IPMN.
KW  - cystic neoplasm
KW  - data mining
KW  - Intraductal papillary mucinous neoplasm
KW  - natural language processing
KW  - pancreatic cancer
KW  - precancerous
KW  - prevention
UR  - http://www.scopus.com/inward/record.url?scp=78649683705&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=78649683705&partnerID=8YFLogxK
U2  - 10.1111/j.1477-2574.2010.00235.x
DO  - 10.1111/j.1477-2574.2010.00235.x
M3  - Article
C2  - 21083794
AN  - SCOPUS:78649683705
VL  - 12
SP  - 688
EP  - 695
JO  - HPB
JF  - HPB
SN  - 1365-182X
IS  - 10
ER  -