Evaluating Methods for Identifying Cancer in Free-Text Pathology Reports Using Various Machine Learning and Data Preprocessing Approaches

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations


Automated detection methods can address delays and incompleteness in cancer case reporting. Existing automated efforts depend largely on complex dictionaries and coded data. Using a gold standard of manually reviewed pathology reports, we evaluated the performance of alternative input formats and decision models on a convenience sample of free-text pathology reports. Results showed that the input format significantly affected performance and that certain algorithms yielded better precision, recall, and accuracy than others. We conclude that our approach is sufficiently accurate for practical purposes and represents a generalized process.
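As a minimal sketch of how precision, recall, and accuracy can be computed against a gold standard of manually reviewed reports, the Python below assumes each report carries a single binary label (cancer / no cancer). The function name `evaluate` and the boolean labelling scheme are illustrative assumptions, not the authors' implementation.

```python
from typing import Dict, Sequence


def evaluate(gold: Sequence[bool], predicted: Sequence[bool]) -> Dict[str, float]:
    """Compare model output with gold-standard (manually reviewed) labels.

    Assumption: True means "report indicates cancer", False means it does not.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = list(zip(gold, predicted))
    # Confusion-matrix counts for the positive ("cancer") class.
    tp = sum(1 for g, p in pairs if g and p)
    fp = sum(1 for g, p in pairs if not g and p)
    fn = sum(1 for g, p in pairs if g and not p)
    tn = sum(1 for g, p in pairs if not g and not p)

    # Guard against division by zero when a class never appears.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy}


# Hypothetical example: five reports, gold labels vs. one model's predictions.
scores = evaluate([True, True, False, False, True], [True, False, False, True, True])
print(scores)
```

Here two true positives, one false positive, one false negative, and one true negative give precision 2/3, recall 2/3, and accuracy 0.6. Running the same function once per input format and decision model yields directly comparable scores for each combination.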

Original language: English (US)
Title of host publication: Studies in Health Technology and Informatics
Publisher: IOS Press
Number of pages: 1
ISBN (Print): 9781614995630
State: Published - 2015
Event: 15th World Congress on Health and Biomedical Informatics, MEDINFO 2015 - Sao Paulo, Brazil
Duration: Aug 19 2015 to Aug 23 2015

Publication series

Name: Studies in Health Technology and Informatics
ISSN (Print): 0926-9630
ISSN (Electronic): 1879-8365


Other: 15th World Congress on Health and Biomedical Informatics, MEDINFO 2015
City: Sao Paulo



Keywords

  • cancer
  • data preprocessing
  • decision models
  • ontologies
  • pathology
  • public health reporting

ASJC Scopus subject areas

  • Biomedical Engineering
  • Health Informatics
  • Health Information Management
