Evaluating Methods for Identifying Cancer in Free-Text Pathology Reports Using Various Machine Learning and Data Preprocessing Approaches

Suranga Nath Kasthurirathne, Brian Dixon, Shaun Grannis

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

1 Citation (Scopus)

Abstract

Automated detection methods can address delays and incompleteness in cancer case reporting. Existing automated efforts are largely dependent on complex dictionaries and coded data. Using a gold standard of manually reviewed pathology reports, we evaluated the performance of alternative input formats and decision models on a convenience sample of free-text pathology reports. Results showed that the input format significantly impacted performance, and specific algorithms yielded better results for precision, recall and accuracy. We conclude that our approach is sufficiently accurate for practical purposes and represents a generalized process.
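
The abstract reports precision, recall and accuracy against a manually reviewed gold standard but does not spell out the computation. As a minimal sketch, assuming binary labels (1 = report indicates cancer, 0 = it does not), the three measures could be computed as below. The function names and the toy labels are hypothetical illustrations and are not taken from the paper.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(gold: Sequence[int], predicted: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
    tn = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 0)
    return tp, fp, fn, tn


def evaluate(gold: Sequence[int], predicted: Sequence[int]) -> Dict[str, float]:
    """Compute precision, recall and accuracy against the gold standard."""
    tp, fp, fn, tn = confusion_counts(gold, predicted)
    total = tp + fp + fn + tn
    return {
        # Share of reports flagged as cancer that really are cancer.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Share of true cancer reports that were flagged.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        # Share of all reports classified correctly.
        "accuracy": (tp + tn) / total if total else 0.0,
    }


# Toy data (hypothetical): manual review labels vs. one decision model's output.
gold_labels = [1, 1, 1, 1, 0, 0, 0, 0]
model_output = [1, 1, 0, 0, 0, 0, 0, 0]
scores = evaluate(gold_labels, model_output)
print(scores)
```

In this toy case the model flags no false positives but misses half of the true cancer reports, which is why the three measures need to be read together rather than in isolation.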

Original language: English (US)
Title of host publication: Studies in Health Technology and Informatics
Publisher: IOS Press
Pages: 1070
Number of pages: 1
Volume: 216
ISBN (Print): 9781614995630
DOIs: https://doi.org/10.3233/978-1-61499-564-7-1070
State: Published - 2015
Event: 15th World Congress on Health and Biomedical Informatics, MEDINFO 2015 - Sao Paulo, Brazil
Duration: Aug 19 2015 to Aug 23 2015

Publication series

Name: Studies in Health Technology and Informatics
Volume: 216
ISSN (Print): 0926-9630
ISSN (Electronic): 1879-8365

Other

Other: 15th World Congress on Health and Biomedical Informatics, MEDINFO 2015
Country: Brazil
City: Sao Paulo
Period: 8/19/15 to 8/23/15

Fingerprint

Pathology
Learning systems
Glossaries
Neoplasms
Machine Learning

Keywords

  • cancer
  • data preprocessing
  • decision models
  • ontologies
  • pathology
  • Public health reporting

ASJC Scopus subject areas

  • Biomedical Engineering
  • Health Informatics
  • Health Information Management

Cite this

Kasthurirathne, S. N., Dixon, B., & Grannis, S. (2015). Evaluating Methods for Identifying Cancer in Free-Text Pathology Reports Using Various Machine Learning and Data Preprocessing Approaches. In Studies in Health Technology and Informatics (Vol. 216, p. 1070). IOS Press. https://doi.org/10.3233/978-1-61499-564-7-1070

@inproceedings{1166d93f92574771b252bd9ccfd492f2,
title = "Evaluating Methods for Identifying Cancer in Free-Text Pathology Reports Using Various Machine Learning and Data Preprocessing Approaches",
abstract = "Automated detection methods can address delays and incompleteness in cancer case reporting. Existing automated efforts are largely dependent on complex dictionaries and coded data. Using a gold standard of manually reviewed pathology reports, we evaluated the performance of alternative input formats and decision models on a convenience sample of free-text pathology reports. Results showed that the input format significantly impacted performance, and specific algorithms yielded better results for precision, recall and accuracy. We conclude that our approach is sufficiently accurate for practical purposes and represents a generalized process.",
keywords = "cancer, data preprocessing, decision models, ontologies, pathology, Public health reporting",
author = "Kasthurirathne, {Suranga Nath} and Brian Dixon and Shaun Grannis",
year = "2015",
doi = "10.3233/978-1-61499-564-7-1070",
language = "English (US)",
isbn = "9781614995630",
volume = "216",
series = "Studies in Health Technology and Informatics",
publisher = "IOS Press",
pages = "1070",
booktitle = "Studies in Health Technology and Informatics",

}

TY  - GEN
T1  - Evaluating Methods for Identifying Cancer in Free-Text Pathology Reports Using Various Machine Learning and Data Preprocessing Approaches
AU  - Kasthurirathne, Suranga Nath
AU  - Dixon, Brian
AU  - Grannis, Shaun
PY  - 2015
Y1  - 2015
N2  - Automated detection methods can address delays and incompleteness in cancer case reporting. Existing automated efforts are largely dependent on complex dictionaries and coded data. Using a gold standard of manually reviewed pathology reports, we evaluated the performance of alternative input formats and decision models on a convenience sample of free-text pathology reports. Results showed that the input format significantly impacted performance, and specific algorithms yielded better results for precision, recall and accuracy. We conclude that our approach is sufficiently accurate for practical purposes and represents a generalized process.
KW  - cancer
KW  - data preprocessing
KW  - decision models
KW  - ontologies
KW  - pathology
KW  - Public health reporting
UR  - http://www.scopus.com/inward/record.url?scp=84951931781&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84951931781&partnerID=8YFLogxK
U2  - 10.3233/978-1-61499-564-7-1070
DO  - 10.3233/978-1-61499-564-7-1070
M3  - Conference contribution
SN  - 9781614995630
VL  - 216
T3  - Studies in Health Technology and Informatics
SP  - 1070
BT  - Studies in Health Technology and Informatics
PB  - IOS Press
ER  -