Predicting dementia with routine care EMR data

Zina Ben Miled, Kyle Haas, Christopher M. Black, Rezaul Karim Khandker, Vasu Chandrasekaran, Richard Lipton, Malaz A. Boustani

Research output: Contribution to journal > Article

Abstract

Our aim is to develop a machine learning (ML) model that can predict dementia in a general patient population from multiple health care institutions one year and three years prior to the onset of the disease without any additional monitoring or screening. The purpose of the model is to automate the cost-effective, non-invasive, digital pre-screening of patients at risk for dementia. To this end, routine care data, which are widely available through Electronic Medical Record (EMR) systems, are used as a data source. These data embody rich knowledge and make related medical applications easy to deploy at scale in a cost-effective manner. Specifically, the model is trained using structured and unstructured data from three EMR data sets: diagnosis, prescriptions, and medical notes. Each of these three data sets is used to construct an individual model, and a combined model is derived using all three data sets. Human-interpretable data processing and ML techniques are selected in order to facilitate adoption of the proposed model by health care providers from multiple institutions. The results show that the combined model generalizes across multiple institutions and is able to predict dementia within one year of its onset with an accuracy of nearly 80%, even though it was trained using routine care data. Moreover, the analysis of the models identified important predictors for dementia. Some of these predictors (e.g., age and hypertensive disorders) are already confirmed by the literature, while others, especially the ones derived from the unstructured medical notes, require further clinical analysis.
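
One way to picture how a combined model can draw on all three data sets is to merge each patient's diagnosis, prescription, and medical-note terms into a single feature set before training. The following is a minimal sketch under that assumption, not the authors' pipeline: the function name `combine_features`, the `dx`/`rx`/`note` prefixes, and the sample values are all hypothetical.

```python
from collections import Counter


def combine_features(diagnoses, prescriptions, note_terms):
    """Merge three per-patient sources into one sparse feature dict.

    Hypothetical helper: each key is prefixed with its source so that a
    diagnosis and a note term with the same spelling stay distinct.
    """
    features = Counter()
    sources = (("dx", diagnoses), ("rx", prescriptions), ("note", note_terms))
    for prefix, items in sources:
        for item in items:
            # Count occurrences, normalizing case within each source.
            features[f"{prefix}:{item.lower()}"] += 1
    return dict(features)


# Illustrative (made-up) patient record.
patient = combine_features(
    diagnoses=["Hypertension", "Hypertension"],
    prescriptions=["Lisinopril"],
    note_terms=["forgetful", "hypertension"],
)
print(patient)
```

A dictionary like this could then be vectorized and passed to an interpretable classifier such as a random forest, which the keywords below mention; the source-prefixed keys keep it visible which data set each predictor came from.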

Original language: English (US)
Article number: 101771
Journal: Artificial Intelligence in Medicine
Volume: 102
DOIs: https://doi.org/10.1016/j.artmed.2019.101771
State: Published - Jan 2020

Fingerprint

Electronic medical equipment
Electronic Health Records
Dementia
Costs and Cost Analysis
Information Storage and Retrieval
Health care
Health Personnel
Prescriptions
Learning systems
Screening
Delivery of Health Care
Medical applications
Population
Datasets
Costs
Machine Learning
Monitoring

Keywords

  • Dementia
  • EMR
  • Machine learning
  • Prediction
  • Random forest

ASJC Scopus subject areas

  • Medicine (miscellaneous)
  • Artificial Intelligence

Cite this

Ben Miled, Z., Haas, K., Black, C. M., Khandker, R. K., Chandrasekaran, V., Lipton, R., & Boustani, M. A. (2020). Predicting dementia with routine care EMR data. Artificial Intelligence in Medicine, 102, [101771]. https://doi.org/10.1016/j.artmed.2019.101771

Predicting dementia with routine care EMR data. / Ben Miled, Zina; Haas, Kyle; Black, Christopher M.; Khandker, Rezaul Karim; Chandrasekaran, Vasu; Lipton, Richard; Boustani, Malaz A.

In: Artificial Intelligence in Medicine, Vol. 102, 101771, 01.2020.

Ben Miled, Z, Haas, K, Black, CM, Khandker, RK, Chandrasekaran, V, Lipton, R & Boustani, MA 2020, 'Predicting dementia with routine care EMR data', Artificial Intelligence in Medicine, vol. 102, 101771. https://doi.org/10.1016/j.artmed.2019.101771
Ben Miled Z, Haas K, Black CM, Khandker RK, Chandrasekaran V, Lipton R et al. Predicting dementia with routine care EMR data. Artificial Intelligence in Medicine. 2020 Jan;102. 101771. https://doi.org/10.1016/j.artmed.2019.101771
Ben Miled, Zina ; Haas, Kyle ; Black, Christopher M. ; Khandker, Rezaul Karim ; Chandrasekaran, Vasu ; Lipton, Richard ; Boustani, Malaz A. / Predicting dementia with routine care EMR data. In: Artificial Intelligence in Medicine. 2020 ; Vol. 102.
@article{4d51ba4a9b9547279a43e40b344478fe,
title = "Predicting dementia with routine care EMR data",
abstract = "Our aim is to develop a machine learning (ML) model that can predict dementia in a general patient population from multiple health care institutions one year and three years prior to the onset of the disease without any additional monitoring or screening. The purpose of the model is to automate the cost-effective, non-invasive, digital pre-screening of patients at risk for dementia. Towards this purpose, routine care data, which is widely available through Electronic Medical Record (EMR) systems is used as a data source. These data embody a rich knowledge and make related medical applications easy to deploy at scale in a cost-effective manner. Specifically, the model is trained by using structured and unstructured data from three EMR data sets: diagnosis, prescriptions, and medical notes. Each of these three data sets is used to construct an individual model along with a combined model which is derived by using all three data sets. Human-interpretable data processing and ML techniques are selected in order to facilitate adoption of the proposed model by health care providers from multiple institutions. The results show that the combined model is generalizable across multiple institutions and is able to predict dementia within one year of its onset with an accuracy of nearly 80{\%} despite the fact that it was trained using routine care data. Moreover, the analysis of the models identified important predictors for dementia. Some of these predictors (e.g., age and hypertensive disorders) are already confirmed by the literature while others, especially the ones derived from the unstructured medical notes, require further clinical analysis.",
keywords = "Dementia, EMR, Machine learning, Prediction, Random forest",
author = "{Ben Miled}, Zina and Haas, Kyle and Black, {Christopher M.} and Khandker, {Rezaul Karim} and Chandrasekaran, Vasu and Lipton, Richard and Boustani, {Malaz A.}",
year = "2020",
month = "1",
doi = "10.1016/j.artmed.2019.101771",
language = "English (US)",
volume = "102",
journal = "Artificial Intelligence in Medicine",
issn = "0933-3657",
publisher = "Elsevier",

}

TY - JOUR

T1 - Predicting dementia with routine care EMR data

AU - Ben Miled, Zina

AU - Haas, Kyle

AU - Black, Christopher M.

AU - Khandker, Rezaul Karim

AU - Chandrasekaran, Vasu

AU - Lipton, Richard

AU - Boustani, Malaz A.

PY - 2020/1

Y1 - 2020/1

N2 - Our aim is to develop a machine learning (ML) model that can predict dementia in a general patient population from multiple health care institutions one year and three years prior to the onset of the disease without any additional monitoring or screening. The purpose of the model is to automate the cost-effective, non-invasive, digital pre-screening of patients at risk for dementia. Towards this purpose, routine care data, which is widely available through Electronic Medical Record (EMR) systems is used as a data source. These data embody a rich knowledge and make related medical applications easy to deploy at scale in a cost-effective manner. Specifically, the model is trained by using structured and unstructured data from three EMR data sets: diagnosis, prescriptions, and medical notes. Each of these three data sets is used to construct an individual model along with a combined model which is derived by using all three data sets. Human-interpretable data processing and ML techniques are selected in order to facilitate adoption of the proposed model by health care providers from multiple institutions. The results show that the combined model is generalizable across multiple institutions and is able to predict dementia within one year of its onset with an accuracy of nearly 80% despite the fact that it was trained using routine care data. Moreover, the analysis of the models identified important predictors for dementia. Some of these predictors (e.g., age and hypertensive disorders) are already confirmed by the literature while others, especially the ones derived from the unstructured medical notes, require further clinical analysis.

KW - Dementia

KW - EMR

KW - Machine learning

KW - Prediction

KW - Random forest

UR - http://www.scopus.com/inward/record.url?scp=85076191922&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85076191922&partnerID=8YFLogxK

U2 - 10.1016/j.artmed.2019.101771

DO - 10.1016/j.artmed.2019.101771

M3 - Article

AN - SCOPUS:85076191922

VL - 102

JO - Artificial Intelligence in Medicine

JF - Artificial Intelligence in Medicine

SN - 0933-3657

M1 - 101771

ER -