Previous methodological and applied studies that used binary logistic regression (LR) for detection of differential item functioning (DIF) in dichotomously scored items either did not report an effect size or did not employ several useful measures of DIF magnitude derived from the LR model. Equations are provided for these effect size indices. Using two large data sets, the authors demonstrate the usefulness of these effect sizes for judging practical importance: the LR adjusted odds ratio and its conversions to the delta metric, the Educational Testing Service (ETS) classification system, and the p metric; the LR model-based standardization indices, using various weights for averaging stratum-specific differences in fitted probabilities; and a p metric classification system. The pros and cons of these effect sizes are discussed, and recommendations are offered. These LR effect sizes will be valuable to practitioners, particularly for preventing flagging of statistically significant but practically unimportant DIF in large samples.
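The conversion from the LR adjusted odds ratio to the delta metric and an ETS-style category can be illustrated with a minimal sketch. It is not taken from the article, whose equations are not reproduced in this abstract. It assumes the widely used delta transformation (delta = -2.35 times the log odds ratio) and one common statement of the ETS A/B/C rules; the function name, the sign convention for the group coefficient, and the critical values 1.96 and 1.645 are assumptions made for illustration.

```python
import math


def lr_dif_effect_sizes(beta_group, se_beta):
    """Convert an LR group coefficient into DIF effect sizes (illustrative sketch).

    beta_group: estimated log odds ratio for the group term, adjusted for the
                matching variable (assumed coding; the sign depends on how
                reference and focal groups are coded).
    se_beta:    standard error of beta_group (must be > 0).
    """
    odds_ratio = math.exp(beta_group)   # LR adjusted odds ratio
    delta = -2.35 * beta_group          # delta-metric conversion (assumed standard form)
    se_delta = 2.35 * se_beta           # standard error on the delta scale
    abs_delta = abs(delta)

    # Assumed significance rules: two-sided test against 0,
    # one-sided test of |delta| against 1, both at the .05 level.
    differs_from_zero = abs_delta / se_delta > 1.96
    exceeds_one = (abs_delta - 1.0) / se_delta > 1.645

    if abs_delta < 1.0 or not differs_from_zero:
        category = "A"   # negligible DIF
    elif abs_delta >= 1.5 and exceeds_one:
        category = "C"   # large DIF
    else:
        category = "B"   # moderate DIF

    return {"odds_ratio": odds_ratio, "delta": delta, "ets_category": category}


if __name__ == "__main__":
    # A small coefficient estimated very precisely, as can happen in large samples:
    # statistically significant, yet still classified as negligible.
    print(lr_dif_effect_sizes(beta_group=0.10, se_beta=0.01))
```

In this sketch a coefficient of 0.10 with a standard error of 0.01 is highly significant, yet its magnitude on the delta scale stays below 1 and the item falls in category A, which is the large-sample situation the abstract describes.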
- Differential item functioning
- Effect sizes
- Logistic regression
ASJC Scopus subject areas
- Social Sciences (miscellaneous)