Analyzing Script Concordance Test Scoring Methods and Items by Difficulty and Type

Adam B. Wilson, Gary R. Pike, Aloysius Humbert

Research output: Contribution to journal › Article

10 Citations (Scopus)

Abstract

Background: A battery of psychometric assessments has been conducted on script concordance tests (SCTs), which are purported to measure data interpretation, an essential component of clinical reasoning. Although the body of published SCT research is broad, best-practice controversies and evidentiary gaps remain. Purposes: In this study, SCT data were used to test the psychometric properties of 6 scoring methods. In addition, this study explored whether SCT items clustered by difficulty and type were able to discriminate between medical training levels. Methods: SCT scores from a problem-solving SCT (SCT-PS; n = 522) and an emergency medicine SCT (SCT-EM; n = 1,040) were collected at a large medical institution. Item analyses were performed to optimize each dataset. Items were categorized into difficulty levels and organized into types. Correlational analyses, one-way multivariate analysis of variance (MANOVA), repeated-measures analysis of variance (ANOVA), and one-way ANOVA were conducted to explore the study aims. Results: All 6 scoring methods differentiated between training levels. Longitudinal analysis of the SCT-PS data showed that fourth-year medical students (MS4s) significantly (p < .001) outperformed their own scores as second-year students (MS2s) in all difficulty categories. Cross-sectional analysis of the SCT-EM data showed significant differences (p < .001) between experienced EM physicians, EM residents, and MS4s at each level of difficulty. Items categorized by type were also able to detect training-level disparities. Conclusions: Of the 6 scoring methods, the 5-point scoring solutions generated more reliable measures of data interpretation than the 3-point scoring methods. Data interpretation abilities were a function of experience at every level of item difficulty. Items categorized by type exhibited discriminatory power, providing modest evidence toward the construct validity of SCTs.
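The abstract does not spell out the 6 scoring methods under comparison, but SCTs are conventionally scored with aggregate (partial-credit) scoring against an expert reference panel, and group differences of the kind reported here are typically tested with a one-way ANOVA. The Python sketch below illustrates both under stated assumptions: the panel responses, group means, and sample sizes are invented for illustration, and the scoring rule shown is the conventional aggregate method, not necessarily any of the six variants the study actually evaluated.

# Illustrative sketch only: conventional SCT aggregate (partial-credit)
# scoring, plus a one-way ANOVA across training levels on simulated
# scores. All data below are hypothetical, not the study's data.

from collections import Counter
import numpy as np
from scipy import stats

def aggregate_item_scores(panel_responses, options=range(-2, 3)):
    """Partial credit per response option on one SCT item.

    Credit for an option = (panelists choosing it) / (panelists
    choosing the modal option), so the modal answer earns 1.0 and
    unendorsed answers earn 0.0.
    """
    counts = Counter(panel_responses)
    modal = max(counts.values())
    return {opt: counts.get(opt, 0) / modal for opt in options}

# Hypothetical 10-member expert panel on a 5-point Likert scale (-2..+2).
panel = [-1, 0, 0, 0, 1, 1, 0, -1, 0, 1]
print(aggregate_item_scores(panel))
# {-2: 0.0, -1: 0.4, 0: 1.0, 1: 0.6, 2: 0.0}

# One-way ANOVA across training levels, mirroring the cross-sectional
# comparison (MS4s vs. EM residents vs. experienced EM physicians).
rng = np.random.default_rng(0)
ms4s       = rng.normal(0.60, 0.08, 100)  # simulated mean test scores
residents  = rng.normal(0.68, 0.08, 100)
attendings = rng.normal(0.75, 0.08, 100)
f_stat, p_value = stats.f_oneway(ms4s, residents, attendings)
print(f"F = {f_stat:.1f}, p = {p_value:.3g}")

Collapsing the 5-point scale to 3 points would merge the -2/-1 and +1/+2 options before credit is computed; per the abstract's conclusion, the 5-point solutions yielded the more reliable measures.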

Original language: English (US)
Pages (from-to): 135-145
Number of pages: 11
Journal: Teaching and Learning in Medicine
Volume: 26
Issue number: 2
DOI: 10.1080/10401334.2014.884464
State: Published - April 2014

Keywords

  • clinical reasoning
  • psychometrics
  • script concordance tests

ASJC Scopus subject areas

  • Medicine (all)
  • Education

Cite this

Analyzing Script Concordance Test Scoring Methods and Items by Difficulty and Type. / Wilson, Adam B.; Pike, Gary R.; Humbert, Aloysius.

In: Teaching and Learning in Medicine, Vol. 26, No. 2, 04.2014, p. 135-145.

Research output: Contribution to journal › Article

@article{eea7fb9cd755494ba51dc31f9ae2ba1a,
title = "Analyzing Script Concordance Test Scoring Methods and Items by Difficulty and Type",
abstract = "Background: A battery of various psychometric assessments has been conducted on script concordance tests (SCTs) that are purported to measure data interpretation, an essential component of clinical reasoning. Although the breadth of published SCT research is broad, best practice controversies and evidentiary gaps remain. Purposes: In this study, SCT data were used to test the psychometric properties of 6 scoring methods. In addition, this study explored whether SCT items clustered by difficulty and type were able to discriminate between medical training levels. Methods: SCT scores from a problem-solving SCT (SCT-PS; n = 522) and emergency medicine SCT (SCT-EM; n = 1,040) were collected at a large institution of medicine. Item analyses were performed to optimize each dataset. Items were categorized into difficulty levels and organized into types. Correlational analyses, one-way multivariate analysis of variance (MANOVA), repeated measures analysis of variance (ANOVA), and one-way ANOVA were conducted to explore study aims. Results: All 6 scoring methods differentiated between training levels. Longitudinal analysis of SCT-PS data reported that MS4s significantly (p <.001) outperformed their scores as MS2s in all difficulty categories. Cross-sectional analysis of SCT-EM data reported significant differences (p <.001) between experienced EM physicians, EM residents, and MS4s at each level of difficulty. Items categorized by type were also able to detect training level disparities. Conclusions: Of the 6 scoring methods, 5-point scoring solutions generated more reliable measures of data interpretation than 3-point scoring methods. Data interpretation abilities were a function of experience at every level of item difficulty. Items categorized by type exhibited discriminatory power providing modest evidence toward the construct validity of SCTs.",
keywords = "clinical reasoning, psychometrics, script concordance tests",
author = "Wilson, {Adam B.} and Pike, {Gary R.} and Aloysius Humbert",
year = "2014",
month = "4",
doi = "10.1080/10401334.2014.884464",
language = "English (US)",
volume = "26",
pages = "135--145",
journal = "Teaching and Learning in Medicine",
issn = "1040-1334",
publisher = "Routledge",
number = "2",

}
