Alternative matching scores to control type I error of the Mantel-Haenszel procedure for DIF in dichotomously scored items conforming to 3PL IRT and nonparametric 4PBCB models

Patrick Monahan, Robert D. Ankenmann

Research output: Contribution to journal (Article)

4 Citations (Scopus)

Abstract

When the matching score is either less than perfectly reliable or not a sufficient statistic for determining latent proficiency in data conforming to item response theory (IRT) models, Type I error (TIE) inflation may occur for the Mantel-Haenszel (MH) procedure or any differential item functioning (DIF) procedure that matches on summed-item score, but primarily on short tests. Alternative matching scores were developed based on sufficient statistics, reliability, and explicit corrections for measurement error. Manipulated factors were tests (20, 24, 26 items), reference/focal sample sizes (1,000/1,000, 800/200), proficiency distributions (identical, means differed, variances differed, means and variances differed), and simulation technique (three-parameter logistic IRT model and four-parameter beta compound-binomial model with nonparametric nonmonotonic item-true score step functions). Outcomes were as follows: TIE of MH chi-square test at the .05 nominal level; and the bias, standard error, and root mean square error of the MH delta-DIF statistic under null-DIF conditions. Of eight categorized alternative matching scores, four scores controlled TIE as well as or better than traditional summed-item score in almost all items for all conditions: (a) estimated latent proficiency from a 3PL IRT model, (b) the sum of weighted item scores where the weight was the item-total score biserial correlation coefficient excluding the item from total score, (c) the sum of weighted item scores where the weight was the item loading on the single common factor from factor analysis of tetrachoric correlation coefficients, and (d) Kelley's linear regressed true score estimate.
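As a rough illustration of the quantities named in the abstract, the sketch below computes the standard textbook forms of the MH common odds ratio, the MH delta-DIF statistic, and the continuity-corrected MH chi-square for one studied item, plus Kelley's regressed true score estimate as one possible matching score. It is not the authors' code: the function names, argument layout, group labels "R"/"F", and the choice of which mean and reliability to pass to Kelley's formula are all assumptions made for this example.

```python
import math
from collections import defaultdict


def mantel_haenszel(item, group, match):
    """Textbook MH statistics for one dichotomous item (illustrative only).

    item  : 0/1 responses to the studied item
    group : "R" (reference) or "F" (focal) for each examinee (labels assumed)
    match : matching score for each examinee; equal values form a stratum
    """
    # Per stratum counts: A = R correct, B = R incorrect, C = F correct, D = F incorrect
    strata = defaultdict(lambda: [0, 0, 0, 0])
    for u, g, s in zip(item, group, match):
        idx = (0 if u == 1 else 1) + (0 if g == "R" else 2)
        strata[s][idx] += 1

    num = den = sum_a = sum_e = sum_v = 0.0
    for a, b, c, d in strata.values():
        n = a + b + c + d
        n_r, n_f = a + b, c + d      # group sizes in this stratum
        m1, m0 = a + c, b + d        # correct / incorrect totals
        if n < 2 or n_r == 0 or n_f == 0:
            continue                 # stratum carries no information
        num += a * d / n
        den += b * c / n
        sum_a += a
        sum_e += n_r * m1 / n                            # E(A) under no DIF
        sum_v += n_r * n_f * m1 * m0 / (n * n * (n - 1))  # Var(A) under no DIF

    # Note: den or sum_v can be zero for degenerate data; not guarded here.
    alpha = num / den                                    # MH common odds ratio
    delta = -2.35 * math.log(alpha)                      # MH delta-DIF
    chi2 = (abs(sum_a - sum_e) - 0.5) ** 2 / sum_v       # continuity-corrected
    return alpha, delta, chi2


def kelley_true_score(x, reliability, mean_x):
    """Kelley's regressed true score: shrink observed x toward mean_x."""
    return reliability * x + (1.0 - reliability) * mean_x
```

To match on Kelley's estimate rather than the summed score, one would pass the (possibly rounded or binned) regressed scores as `match`; how to bin a continuous matching score into strata is a design choice this sketch leaves open.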

Original language: English
Pages (from-to): 193-210
Number of pages: 18
Journal: Applied Psychological Measurement
Volume: 34
Issue number: 3
DOI: 10.1177/0146621609359283
State: Published - 2010

Keywords

  • Beta binomial models
  • Classical test theory
  • Differential item functioning
  • Item response theory
  • Mantel-Haenszel procedure
  • Monte Carlo simulation
  • Type I error

ASJC Scopus subject areas

  • Psychology (miscellaneous)
  • Social Sciences (miscellaneous)

Cite this

@article{7bd239649e76412d85026e56611caa5e,
title = "Alternative matching scores to control type I error of the Mantel-Haenszel procedure for DIF in dichotomously scored items conforming to 3PL IRT and nonparametric 4PBCB models",
abstract = "When the matching score is either less than perfectly reliable or not a sufficient statistic for determining latent proficiency in data conforming to item response theory (IRT) models, Type I error (TIE) inflation may occur for the Mantel-Haenszel (MH) procedure or any differential item functioning (DIF) procedure that matches on summed-item score, but primarily on short tests. Alternative matching scores were developed based on sufficient statistics, reliability, and explicit corrections for measurement error. Manipulated factors were tests (20, 24, 26 items), reference/focal sample sizes (1,000/1,000, 800/200), proficiency distributions (identical, means differed, variances differed, means and variances differed), and simulation technique (three-parameter logistic IRT model and four-parameter beta compound-binomial model with nonparametric nonmonotonic item-true score step functions). Outcomes were as follows: TIE of MH chi-square test at the .05 nominal level; and the bias, standard error, and root mean square error of the MH delta-DIF statistic under null-DIF conditions. Of eight categorized alternative matching scores, four scores controlled TIE as well as or better than traditional summed-item score in almost all items for all conditions: (a) estimated latent proficiency from a 3PL IRT model, (b) the sum of weighted item scores where the weight was the item-total score biserial correlation coefficient excluding the item from total score, (c) the sum of weighted item scores where the weight was the item loading on the single common factor from factor analysis of tetrachoric correlation coefficients, and (d) Kelley's linear regressed true score estimate.",
keywords = "Beta binomial models, Classical test theory, Differential item functioning, Item response theory, Mantel-Haenszel procedure, Monte Carlo simulation, Type I error",
author = "Patrick Monahan and Ankenmann, {Robert D.}",
year = "2010",
doi = "10.1177/0146621609359283",
language = "English",
volume = "34",
pages = "193--210",
journal = "Applied Psychological Measurement",
issn = "0146-6216",
publisher = "SAGE Publications Inc.",
number = "3",

}

TY - JOUR

T1 - Alternative matching scores to control type I error of the Mantel-Haenszel procedure for DIF in dichotomously scored items conforming to 3PL IRT and nonparametric 4PBCB models

AU - Monahan, Patrick

AU - Ankenmann, Robert D.

PY - 2010

Y1 - 2010

N2 - When the matching score is either less than perfectly reliable or not a sufficient statistic for determining latent proficiency in data conforming to item response theory (IRT) models, Type I error (TIE) inflation may occur for the Mantel-Haenszel (MH) procedure or any differential item functioning (DIF) procedure that matches on summed-item score, but primarily on short tests. Alternative matching scores were developed based on sufficient statistics, reliability, and explicit corrections for measurement error. Manipulated factors were tests (20, 24, 26 items), reference/focal sample sizes (1,000/1,000, 800/200), proficiency distributions (identical, means differed, variances differed, means and variances differed), and simulation technique (three-parameter logistic IRT model and four-parameter beta compound-binomial model with nonparametric nonmonotonic item-true score step functions). Outcomes were as follows: TIE of MH chi-square test at the .05 nominal level; and the bias, standard error, and root mean square error of the MH delta-DIF statistic under null-DIF conditions. Of eight categorized alternative matching scores, four scores controlled TIE as well as or better than traditional summed-item score in almost all items for all conditions: (a) estimated latent proficiency from a 3PL IRT model, (b) the sum of weighted item scores where the weight was the item-total score biserial correlation coefficient excluding the item from total score, (c) the sum of weighted item scores where the weight was the item loading on the single common factor from factor analysis of tetrachoric correlation coefficients, and (d) Kelley's linear regressed true score estimate.

KW - Beta binomial models

KW - Classical test theory

KW - Differential item functioning

KW - Item response theory

KW - Mantel-Haenszel procedure

KW - Monte Carlo simulation

KW - Type I error

UR - http://www.scopus.com/inward/record.url?scp=77952006713&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77952006713&partnerID=8YFLogxK

U2 - 10.1177/0146621609359283

DO - 10.1177/0146621609359283

M3 - Article

AN - SCOPUS:77952006713

VL - 34

SP - 193

EP - 210

JO - Applied Psychological Measurement

JF - Applied Psychological Measurement

SN - 0146-6216

IS - 3

ER -