Leveraging Electronic Dental Record Data to Classify Patients Based on Their Smoking Intensity

J. Patel, Z. Siddiqui, A. Krishnan, Thankam Paul Thyvalikakath

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

Background: Smoking is an established risk factor for oral diseases; therefore, dental clinicians routinely assess and record their patients' detailed smoking status. Researchers have successfully extracted smoking history from electronic health records (EHRs) using text mining methods. However, they could not retrieve patients' smoking intensity because of its limited availability in the EHR. In the electronic dental record (EDR), detailed smoking information is often recorded in a separate section, which allows it to be retrieved with less preprocessing.

Objective: To determine patients' detailed smoking status, based on smoking intensity, from the EDR.

Methods: First, the authors created a reference standard of smoking histories from the EDR for 3,296 unique patients, classifying each patient by smoking intensity. Next, they trained three machine learning classifiers (support vector machine, random forest, and naïve Bayes) on a training set (n = 2,176) and evaluated their performance on a test set (n = 1,120) using precision (P), recall (R), and F-measure (F). Finally, they applied the best classifier to the smoking histories of an additional 3,114 patients.

Results: The support vector machine performed best at classifying patients as smokers, nonsmokers, or unknown (P, R, F: 98%); intermittent smokers (P: 95%, R: 98%, F: 96%); past smokers (P, R, F: 89%); light smokers (P, R, F: 87%); smokers with unknown intensity (P: 76%, R: 86%, F: 81%); and intermediate smokers (P: 90%, R: 88%, F: 89%). It performed only moderately at identifying heavy smokers (P: 90%, R: 44%, F: 60%), largely because of low recall.

Conclusion: EDR data could serve as a valuable source of patients' detailed smoking information based on smoking intensity, information that may not be readily available in the EHR.
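The Results report per-class precision (P), recall (R), and F-measure (F) but do not say which F-measure was used. The sketch below assumes it is the balanced F1, the harmonic mean of P and R, computes per-class scores from parallel gold and predicted label lists, and recomputes F from the rounded P and R given for three categories. The function names and category keys are illustrative, not taken from the study.

```python
# Per-class precision, recall, and F-measure, assuming F is the balanced F1.
from collections import Counter


def f1_from_pr(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def per_class_prf(gold, predicted):
    """Return {label: (precision, recall, F1)} from parallel gold/predicted label lists."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[g] += 1  # gold label g was missed
    scores = {}
    for label in sorted(set(gold) | set(predicted)):
        prec = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        rec = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        scores[label] = (prec, rec, f1_from_pr(prec, rec))
    return scores


# Rounded P and R copied from the Results; the category keys are illustrative.
reported = {
    "intermittent smoker": (0.95, 0.98),
    "smoker, unknown intensity": (0.76, 0.86),
    "heavy smoker": (0.90, 0.44),
}
for category, (p, r) in reported.items():
    print(f"{category}: F = {f1_from_pr(p, r):.1%}")
# intermittent smoker: F = 96.5%
# smoker, unknown intensity: F = 80.7%
# heavy smoker: F = 59.1%
```

Recomputing from the rounded inputs reproduces the reported 96% and 81%; for heavy smokers it gives about 59% against the reported 60%, a gap presumably due to rounding of P and R. It also shows why that class scores poorly: with P at 90% and R at 44%, the harmonic mean is pulled toward the lower value.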

Original language: English (US)
Pages (from-to): 253-260
Number of pages: 8
Journal: Methods of Information in Medicine
Volume: 57
Issue number: 5-6
DOI: 10.1055/s-0039-1681088
State: Published - Jan 1 2018

Fingerprint

Dental Records
Smoking
Electronic Health Records
Mouth Diseases
Data Mining

Keywords

  • dental informatics
  • electronic dental record
  • electronic health record
  • information extraction
  • machine learning classifiers
  • smoking intensity
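The keywords pair information extraction with machine learning classifiers, and the abstract names three classifiers without describing their input features. Below is a minimal, stdlib-only sketch of one of them, multinomial naive Bayes over bag-of-words tokens, applied to free-text smoking-history notes. The notes, labels, tokenizer, and smoothing parameter `alpha` are hypothetical illustrations, not the study's data or configuration, and the sketch does not implement the support vector machine the study found best.

```python
# Minimal multinomial naive Bayes text classifier (stdlib only).
# Hypothetical sketch: tokenizer, smoothing, notes, and labels are illustrative assumptions.
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens (a deliberately simple, assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class MultinomialNB:
    """Multinomial naive Bayes over bag-of-words counts with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class, for the priors
        self.word_counts = defaultdict(Counter)  # class -> token counts
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_tokens = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, text: str) -> str:
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            score = math.log(n_label / n_docs)  # log prior
            denom = self.total_tokens[label] + self.alpha * v
            for token in tokenize(text):
                if token in self.vocab:  # tokens unseen in training are ignored
                    score += math.log((self.word_counts[label][token] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical smoking-history notes, not data from the study.
notes = [
    ("smokes 1 pack per day", "current smoker"),
    ("current smoker half pack daily", "current smoker"),
    ("never smoked", "nonsmoker"),
    ("denies tobacco use never", "nonsmoker"),
    ("quit smoking 10 years ago", "past smoker"),
    ("former smoker quit in 2005", "past smoker"),
]
model = MultinomialNB().fit([t for t, _ in notes], [y for _, y in notes])
print(model.predict("patient never smoked"))        # nonsmoker
print(model.predict("quit smoking 5 years ago"))    # past smoker
print(model.predict("smokes half a pack per day"))  # current smoker
```

Separating light, intermediate, and heavy smokers would likely need features that capture quantities such as cigarettes per day, which a plain bag of words represents only loosely.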

ASJC Scopus subject areas

  • Health Informatics
  • Advanced and Specialized Nursing
  • Health Information Management

Cite this

Leveraging Electronic Dental Record Data to Classify Patients Based on Their Smoking Intensity. / Patel, J.; Siddiqui, Z.; Krishnan, A.; Thyvalikakath, Thankam Paul.

In: Methods of Information in Medicine, Vol. 57, No. 5-6, 01.01.2018, p. 253-260.

@article{2d977de7a5bd45f8acbf083089eefa0f,
title = "Leveraging Electronic Dental Record Data to Classify Patients Based on Their Smoking Intensity",
abstract = "Background: Smoking is an established risk factor for oral diseases; therefore, dental clinicians routinely assess and record their patients' detailed smoking status. Researchers have successfully extracted smoking history from electronic health records (EHRs) using text mining methods. However, they could not retrieve patients' smoking intensity because of its limited availability in the EHR. In the electronic dental record (EDR), detailed smoking information is often recorded in a separate section, which allows it to be retrieved with less preprocessing. Objective: To determine patients' detailed smoking status, based on smoking intensity, from the EDR. Methods: First, the authors created a reference standard of smoking histories from the EDR for 3,296 unique patients, classifying each patient by smoking intensity. Next, they trained three machine learning classifiers (support vector machine, random forest, and na{\"i}ve Bayes) on a training set (n = 2,176) and evaluated their performance on a test set (n = 1,120) using precision (P), recall (R), and F-measure (F). Finally, they applied the best classifier to the smoking histories of an additional 3,114 patients. Results: The support vector machine performed best at classifying patients as smokers, nonsmokers, or unknown (P, R, F: 98{\%}); intermittent smokers (P: 95{\%}, R: 98{\%}, F: 96{\%}); past smokers (P, R, F: 89{\%}); light smokers (P, R, F: 87{\%}); smokers with unknown intensity (P: 76{\%}, R: 86{\%}, F: 81{\%}); and intermediate smokers (P: 90{\%}, R: 88{\%}, F: 89{\%}). It performed only moderately at identifying heavy smokers (P: 90{\%}, R: 44{\%}, F: 60{\%}), largely because of low recall. Conclusion: EDR data could serve as a valuable source of patients' detailed smoking information based on smoking intensity, information that may not be readily available in the EHR.",
keywords = "dental informatics, electronic dental record, electronic health record, information extraction, machine learning classifiers, smoking intensity",
author = "J. Patel and Z. Siddiqui and A. Krishnan and Thyvalikakath, {Thankam Paul}",
year = "2018",
month = "1",
day = "1",
doi = "10.1055/s-0039-1681088",
language = "English (US)",
volume = "57",
pages = "253--260",
journal = "Methods of Information in Medicine",
issn = "0026-1270",
publisher = "Schattauer GmbH",
number = "5-6",

}

TY  - JOUR
T1  - Leveraging Electronic Dental Record Data to Classify Patients Based on Their Smoking Intensity
AU  - Patel, J.
AU  - Siddiqui, Z.
AU  - Krishnan, A.
AU  - Thyvalikakath, Thankam Paul
PY  - 2018/1/1
Y1  - 2018/1/1
N2  - Background: Smoking is an established risk factor for oral diseases; therefore, dental clinicians routinely assess and record their patients' detailed smoking status. Researchers have successfully extracted smoking history from electronic health records (EHRs) using text mining methods. However, they could not retrieve patients' smoking intensity because of its limited availability in the EHR. In the electronic dental record (EDR), detailed smoking information is often recorded in a separate section, which allows it to be retrieved with less preprocessing. Objective: To determine patients' detailed smoking status, based on smoking intensity, from the EDR. Methods: First, the authors created a reference standard of smoking histories from the EDR for 3,296 unique patients, classifying each patient by smoking intensity. Next, they trained three machine learning classifiers (support vector machine, random forest, and naïve Bayes) on a training set (n = 2,176) and evaluated their performance on a test set (n = 1,120) using precision (P), recall (R), and F-measure (F). Finally, they applied the best classifier to the smoking histories of an additional 3,114 patients. Results: The support vector machine performed best at classifying patients as smokers, nonsmokers, or unknown (P, R, F: 98%); intermittent smokers (P: 95%, R: 98%, F: 96%); past smokers (P, R, F: 89%); light smokers (P, R, F: 87%); smokers with unknown intensity (P: 76%, R: 86%, F: 81%); and intermediate smokers (P: 90%, R: 88%, F: 89%). It performed only moderately at identifying heavy smokers (P: 90%, R: 44%, F: 60%), largely because of low recall. Conclusion: EDR data could serve as a valuable source of patients' detailed smoking information based on smoking intensity, information that may not be readily available in the EHR.
AB  - Background: Smoking is an established risk factor for oral diseases; therefore, dental clinicians routinely assess and record their patients' detailed smoking status. Researchers have successfully extracted smoking history from electronic health records (EHRs) using text mining methods. However, they could not retrieve patients' smoking intensity because of its limited availability in the EHR. In the electronic dental record (EDR), detailed smoking information is often recorded in a separate section, which allows it to be retrieved with less preprocessing. Objective: To determine patients' detailed smoking status, based on smoking intensity, from the EDR. Methods: First, the authors created a reference standard of smoking histories from the EDR for 3,296 unique patients, classifying each patient by smoking intensity. Next, they trained three machine learning classifiers (support vector machine, random forest, and naïve Bayes) on a training set (n = 2,176) and evaluated their performance on a test set (n = 1,120) using precision (P), recall (R), and F-measure (F). Finally, they applied the best classifier to the smoking histories of an additional 3,114 patients. Results: The support vector machine performed best at classifying patients as smokers, nonsmokers, or unknown (P, R, F: 98%); intermittent smokers (P: 95%, R: 98%, F: 96%); past smokers (P, R, F: 89%); light smokers (P, R, F: 87%); smokers with unknown intensity (P: 76%, R: 86%, F: 81%); and intermediate smokers (P: 90%, R: 88%, F: 89%). It performed only moderately at identifying heavy smokers (P: 90%, R: 44%, F: 60%), largely because of low recall. Conclusion: EDR data could serve as a valuable source of patients' detailed smoking information based on smoking intensity, information that may not be readily available in the EHR.
KW  - dental informatics
KW  - electronic dental record
KW  - electronic health record
KW  - information extraction
KW  - machine learning classifiers
KW  - smoking intensity
UR  - http://www.scopus.com/inward/record.url?scp=85062985660&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85062985660&partnerID=8YFLogxK
U2  - 10.1055/s-0039-1681088
DO  - 10.1055/s-0039-1681088
M3  - Article
C2  - 30875704
AN  - SCOPUS:85062985660
VL  - 57
SP  - 253
EP  - 260
JO  - Methods of Information in Medicine
JF  - Methods of Information in Medicine
SN  - 0026-1270
IS  - 5-6
ER  - 