A corpus-based approach for automated LOINC mapping

Mustafa Fidahussein, Daniel Vreeman

Research output: Contribution to journalArticle

6 Citations (Scopus)

Abstract

Objective: To determine whether the knowledge contained in a rich corpus of local terms mapped to LOINC (Logical Observation Identifiers Names and Codes) could be leveraged to help map local terms from other institutions. Methods: We developed two models to test our hypothesis. The first based on supervised machine learning was created using Apache's OpenNLP Maxent and the second based on information retrieval was created using Apache's Lucene. The models were validated by a random subsampling method that was repeated 20 times and that used 80/20 splits for training and testing, respectively. We also evaluated the performance of these models on all laboratory terms from three test institutions. Results: For the 20 iterations used for validation of our 80/20 splits Maxent and Lucene ranked the correct LOINC code first for between 70.5% and 71.4% and between 63.7% and 65.0% of local terms, respectively. For all laboratory terms from the three test institutions Maxent ranked the correct LOINC code first for between 73.5% and 84.6% (mean 78.9%) of local terms, whereas Lucene's performance was between 66.5% and 76.6% (mean 71.9%). Using a cut-off score of 0.46 Maxent always ranked the correct LOINC code first for over 57% of local terms. Conclusions: This study showed that a rich corpus of local terms mapped to LOINC contains collective knowledge that can help map terms from other institutions. Using freely available software tools, we developed a data-driven automated approach that operates on term descriptions from existing mappings in the corpus. Accurate and efficient automated mapping methods can help to accelerate adoption of vocabulary standards and promote widespread health information exchange.

Original languageEnglish
Pages (from-to)64-72
Number of pages9
JournalJournal of the American Medical Informatics Association
Volume21
Issue number1
DOIs
StatePublished - 2014

Fingerprint

Logical Observation Identifiers Names and Codes
Vocabulary
Information Storage and Retrieval
Software

ASJC Scopus subject areas

  • Health Informatics

Cite this

A corpus-based approach for automated LOINC mapping. / Fidahussein, Mustafa; Vreeman, Daniel.

In: Journal of the American Medical Informatics Association, Vol. 21, No. 1, 2014, p. 64-72.

Research output: Contribution to journalArticle

@article{82ab50143c3a4cd0a78ff3fe9500fd56,
title = "A corpus-based approach for automated LOINC mapping",
abstract = "Objective: To determine whether the knowledge contained in a rich corpus of local terms mapped to LOINC (Logical Observation Identifiers Names and Codes) could be leveraged to help map local terms from other institutions. Methods: We developed two models to test our hypothesis. The first based on supervised machine learning was created using Apache's OpenNLP Maxent and the second based on information retrieval was created using Apache's Lucene. The models were validated by a random subsampling method that was repeated 20 times and that used 80/20 splits for training and testing, respectively. We also evaluated the performance of these models on all laboratory terms from three test institutions. Results: For the 20 iterations used for validation of our 80/20 splits Maxent and Lucene ranked the correct LOINC code first for between 70.5{\%} and 71.4{\%} and between 63.7{\%} and 65.0{\%} of local terms, respectively. For all laboratory terms from the three test institutions Maxent ranked the correct LOINC code first for between 73.5{\%} and 84.6{\%} (mean 78.9{\%}) of local terms, whereas Lucene's performance was between 66.5{\%} and 76.6{\%} (mean 71.9{\%}). Using a cut-off score of 0.46 Maxent always ranked the correct LOINC code first for over 57{\%} of local terms. Conclusions: This study showed that a rich corpus of local terms mapped to LOINC contains collective knowledge that can help map terms from other institutions. Using freely available software tools, we developed a data-driven automated approach that operates on term descriptions from existing mappings in the corpus. Accurate and efficient automated mapping methods can help to accelerate adoption of vocabulary standards and promote widespread health information exchange.",
author = "Mustafa Fidahussein and Daniel Vreeman",
year = "2014",
doi = "10.1136/amiajnl-2012-001159",
language = "English",
volume = "21",
pages = "64--72",
journal = "Journal of the American Medical Informatics Association : JAMIA",
issn = "1067-5027",
publisher = "Oxford University Press",
number = "1",

}

TY - JOUR

T1 - A corpus-based approach for automated LOINC mapping

AU - Fidahussein, Mustafa

AU - Vreeman, Daniel

PY - 2014

Y1 - 2014

N2 - Objective: To determine whether the knowledge contained in a rich corpus of local terms mapped to LOINC (Logical Observation Identifiers Names and Codes) could be leveraged to help map local terms from other institutions. Methods: We developed two models to test our hypothesis. The first based on supervised machine learning was created using Apache's OpenNLP Maxent and the second based on information retrieval was created using Apache's Lucene. The models were validated by a random subsampling method that was repeated 20 times and that used 80/20 splits for training and testing, respectively. We also evaluated the performance of these models on all laboratory terms from three test institutions. Results: For the 20 iterations used for validation of our 80/20 splits Maxent and Lucene ranked the correct LOINC code first for between 70.5% and 71.4% and between 63.7% and 65.0% of local terms, respectively. For all laboratory terms from the three test institutions Maxent ranked the correct LOINC code first for between 73.5% and 84.6% (mean 78.9%) of local terms, whereas Lucene's performance was between 66.5% and 76.6% (mean 71.9%). Using a cut-off score of 0.46 Maxent always ranked the correct LOINC code first for over 57% of local terms. Conclusions: This study showed that a rich corpus of local terms mapped to LOINC contains collective knowledge that can help map terms from other institutions. Using freely available software tools, we developed a data-driven automated approach that operates on term descriptions from existing mappings in the corpus. Accurate and efficient automated mapping methods can help to accelerate adoption of vocabulary standards and promote widespread health information exchange.

AB - Objective: To determine whether the knowledge contained in a rich corpus of local terms mapped to LOINC (Logical Observation Identifiers Names and Codes) could be leveraged to help map local terms from other institutions. Methods: We developed two models to test our hypothesis. The first based on supervised machine learning was created using Apache's OpenNLP Maxent and the second based on information retrieval was created using Apache's Lucene. The models were validated by a random subsampling method that was repeated 20 times and that used 80/20 splits for training and testing, respectively. We also evaluated the performance of these models on all laboratory terms from three test institutions. Results: For the 20 iterations used for validation of our 80/20 splits Maxent and Lucene ranked the correct LOINC code first for between 70.5% and 71.4% and between 63.7% and 65.0% of local terms, respectively. For all laboratory terms from the three test institutions Maxent ranked the correct LOINC code first for between 73.5% and 84.6% (mean 78.9%) of local terms, whereas Lucene's performance was between 66.5% and 76.6% (mean 71.9%). Using a cut-off score of 0.46 Maxent always ranked the correct LOINC code first for over 57% of local terms. Conclusions: This study showed that a rich corpus of local terms mapped to LOINC contains collective knowledge that can help map terms from other institutions. Using freely available software tools, we developed a data-driven automated approach that operates on term descriptions from existing mappings in the corpus. Accurate and efficient automated mapping methods can help to accelerate adoption of vocabulary standards and promote widespread health information exchange.

UR - http://www.scopus.com/inward/record.url?scp=84890503518&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84890503518&partnerID=8YFLogxK

U2 - 10.1136/amiajnl-2012-001159

DO - 10.1136/amiajnl-2012-001159

M3 - Article

VL - 21

SP - 64

EP - 72

JO - Journal of the American Medical Informatics Association : JAMIA

JF - Journal of the American Medical Informatics Association : JAMIA

SN - 1067-5027

IS - 1

ER -