Retrieving a corpus of literature on a broad topic can be difficult. This study demonstrates a method for retrieving the dental and craniofacial research literature. We manually explored MeSH for dental or craniofacial indexing terms. MEDLINE was searched using these terms, and a random sample of references was extracted from the resulting set. Sixteen dental research experts, reading only the title and abstract, categorized each article as (1) dental research, (2) dental non-research, (3) non-dental, or (4) not sure. Identify Patient Sets (IPS), a probabilistic text classifier, created models based on the presence or absence of words or UMLS phrases that distinguished dental research articles from all others. These models were applied to a test set with one of three inputs for each article: (1) title and abstract only, (2) MeSH terms only, or (3) both. Using title and abstract only, IPS correctly classified 64% of all dental research articles present in the test set (recall). Of the articles in the resulting retrieved set, 71% were correctly classified dental research articles (precision). Including MeSH terms decreased performance. Computer programs that use text input to categorize articles may aid retrieval of a broad corpus of literature better than indexing terms or key words alone.
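
The two percentages reported above measure different things: the 64% figure is taken over all dental research articles in the test set, while the 71% figure is taken over the articles the classifier retrieved. The following minimal Python sketch shows how such figures are computed from classification counts. The counts are hypothetical, chosen only so that the results round to the reported percentages; they are not the study's data, and the function names are illustrative.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Share of all truly relevant articles that the classifier retrieved."""
    return true_positives / (true_positives + false_negatives)


def precision(true_positives: int, false_positives: int) -> float:
    """Share of retrieved articles that are truly relevant."""
    return true_positives / (true_positives + false_positives)


# Hypothetical counts (assumption, NOT taken from the study):
tp = 64  # dental research articles correctly retrieved
fn = 36  # dental research articles the classifier missed
fp = 26  # other articles wrongly retrieved as dental research

print(f"recall    = {recall(tp, fn):.2f}")
print(f"precision = {precision(tp, fp):.2f}")
```

With these assumed counts, the retrieved set holds 90 articles, of which 64 are dental research, and 36 dental research articles are left unretrieved.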