Effect of unequal variances in proficiency distributions on type-I error of the Mantel-Haenszel chi-square test for differential item functioning

Patrick Monahan, Robert D. Ankenmann

Research output: Contribution to journal (Article)

9 Citations (Scopus)

Abstract

Empirical studies demonstrated Type-I error (TIE) inflation (especially for highly discriminating easy items) of the Mantel-Haenszel chi-square test for differential item functioning (DIF), when data conformed to item response theory (IRT) models more complex than Rasch, and when IRT proficiency distributions differed only in means. However, no published study manipulated proficiency variance ratio (VR). Data were generated with the three-parameter logistic (3PL) IRT model. Proficiency VRs were 1, 2, 3, and 4. The present study suggests inflation may be greater, and may affect all highly discriminating items (low, moderate, and high difficulty), when IRT proficiency distributions of reference and focal groups differ also in variances. Inflation was greatest on the 21-item test (vs. 41) and 2,000 total sample size (vs. 1,000). Previous studies had not systematically examined sample size ratio. Sample size ratio of 1:1 produced greater TIE inflation than 3:1, but primarily for total sample size of 2,000.
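
As a rough illustration of the kind of simulation the abstract describes, the following Python sketch generates 3PL responses for a reference and a focal group whose proficiency variances differ, then computes the Mantel-Haenszel chi-square for one item with no DIF built in. It is not the authors' code. The item parameters, the 1.7 scaling constant, the choice of the focal group as the higher-variance group, matching on a total score that includes the studied item, and the 0.5 continuity correction are all assumptions made here for illustration.

```python
import math
import numpy as np

def simulate_3pl(theta, a, b, c, rng):
    """Draw 0/1 responses under the 3PL model.

    P(correct) = c + (1 - c) / (1 + exp(-1.7 * a * (theta - b)))
    The 1.7 scaling constant is an assumption, not taken from the abstract.
    """
    z = 1.7 * a * (theta[:, None] - b)
    p = c + (1.0 - c) / (1.0 + np.exp(-z))
    return (rng.random(p.shape) < p).astype(int)

def mh_chi_square(item, total, is_focal):
    """Continuity-corrected Mantel-Haenszel chi-square (1 df) for one item.

    Examinees are stratified on `total`; strata missing either group are skipped.
    Returns (statistic, p_value).
    """
    is_focal = np.asarray(is_focal, dtype=bool)
    sum_a = sum_e = sum_v = 0.0
    for k in np.unique(total):
        in_k = total == k
        ref = in_k & ~is_focal
        foc = in_k & is_focal
        n_r, n_f = float(ref.sum()), float(foc.sum())
        if n_r == 0 or n_f == 0:
            continue
        n = n_r + n_f
        m1 = float(item[in_k].sum())   # correct responses in the stratum
        m0 = n - m1                    # incorrect responses in the stratum
        sum_a += float(item[ref].sum())            # observed reference-correct count
        sum_e += n_r * m1 / n                      # its expectation under no DIF
        sum_v += n_r * n_f * m1 * m0 / (n * n * (n - 1.0))  # hypergeometric variance
    if sum_v == 0.0:
        return 0.0, 1.0
    stat = (abs(sum_a - sum_e) - 0.5) ** 2 / sum_v
    # Upper-tail probability of a chi-square with 1 df.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

rng = np.random.default_rng(0)
n_items = 21                                   # shorter test length from the abstract
a = rng.uniform(0.5, 2.0, n_items)             # hypothetical discriminations
b = rng.normal(0.0, 1.0, n_items)              # hypothetical difficulties
c = np.full(n_items, 0.2)                      # hypothetical guessing parameters

# 1:1 sample size ratio, 2,000 examinees in total, variance ratio 4 (SD 2 vs. 1).
theta_ref = rng.normal(0.0, 1.0, 1000)
theta_foc = rng.normal(0.0, 2.0, 1000)
theta = np.concatenate([theta_ref, theta_foc])
is_focal = np.concatenate([np.zeros(1000, dtype=bool), np.ones(1000, dtype=bool)])

# Both groups share the same item parameters, so any rejection is a Type-I error.
responses = simulate_3pl(theta, a, b, c, rng)
total = responses.sum(axis=1)
stat, p_value = mh_chi_square(responses[:, 0], total, is_focal)
print(f"MH chi-square = {stat:.3f}, p = {p_value:.3f}")
```

Repeating the last block over many replications and counting how often the p-value falls below the nominal level would give an empirical Type-I error rate for a given condition.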

Original language: English
Pages (from-to): 101-131
Number of pages: 31
Journal: Journal of Educational Measurement
Volume: 42
Issue number: 2
DOI: 10.1111/j.1745-3984.2005.00006
State: Published - Jun 2005

Fingerprint

Economic Inflation
Chi-Square Distribution
Sample Size
inflation
distribution theory
model theory
logistics
Group

ASJC Scopus subject areas

  • Psychology(all)
  • Applied Psychology
  • Developmental and Educational Psychology
  • Psychology (miscellaneous)

Cite this

@article{d86376258dce43ddb726d3613695e5f6,
title = "Effect of unequal variances in proficiency distributions on type-I error of the Mantel-Haenszel chi-square test for differential item functioning",
abstract = "Empirical studies demonstrated Type-I error (TIE) inflation (especially for highly discriminating easy items) of the Mantel-Haenszel chi-square test for differential item functioning (DIF), when data conformed to item response theory (IRT) models more complex than Rasch, and when IRT proficiency distributions differed only in means. However, no published study manipulated proficiency variance ratio (VR). Data were generated with the three-parameter logistic (3PL) IRT model. Proficiency VRs were 1, 2, 3, and 4. The present study suggests inflation may be greater, and may affect all highly discriminating items (low, moderate, and high difficulty), when IRT proficiency distributions of reference and focal groups differ also in variances. Inflation was greatest on the 21-item test (vs. 41) and 2,000 total sample size (vs. 1,000). Previous studies had not systematically examined sample size ratio. Sample size ratio of 1:1 produced greater TIE inflation than 3:1, but primarily for total sample size of 2,000.",
author = "Monahan, Patrick and Ankenmann, {Robert D.}",
year = "2005",
month = "6",
doi = "10.1111/j.1745-3984.2005.00006",
language = "English",
volume = "42",
pages = "101--131",
journal = "Journal of Educational Measurement",
issn = "0022-0655",
publisher = "Wiley-Blackwell",
number = "2",

}

TY - JOUR

T1 - Effect of unequal variances in proficiency distributions on type-I error of the Mantel-Haenszel chi-square test for differential item functioning

AU - Monahan, Patrick

AU - Ankenmann, Robert D.

PY - 2005/6

Y1 - 2005/6

AB - Empirical studies demonstrated Type-I error (TIE) inflation (especially for highly discriminating easy items) of the Mantel-Haenszel chi-square test for differential item functioning (DIF), when data conformed to item response theory (IRT) models more complex than Rasch, and when IRT proficiency distributions differed only in means. However, no published study manipulated proficiency variance ratio (VR). Data were generated with the three-parameter logistic (3PL) IRT model. Proficiency VRs were 1, 2, 3, and 4. The present study suggests inflation may be greater, and may affect all highly discriminating items (low, moderate, and high difficulty), when IRT proficiency distributions of reference and focal groups differ also in variances. Inflation was greatest on the 21-item test (vs. 41) and 2,000 total sample size (vs. 1,000). Previous studies had not systematically examined sample size ratio. Sample size ratio of 1:1 produced greater TIE inflation than 3:1, but primarily for total sample size of 2,000.

UR - http://www.scopus.com/inward/record.url?scp=17144419380&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=17144419380&partnerID=8YFLogxK

U2 - 10.1111/j.1745-3984.2005.00006

DO - 10.1111/j.1745-3984.2005.00006

M3 - Article

AN - SCOPUS:17144419380

VL - 42

SP - 101

EP - 131

JO - Journal of Educational Measurement

JF - Journal of Educational Measurement

SN - 0022-0655

IS - 2

ER -