A systematic review of differential item functioning in second language assessment
Type
Article
Citation
Chen, X., Aryadoust, V., & Zhang, W. (2024). A systematic review of differential item functioning in second language assessment. Language Testing. Advance online publication. https://doi.org/10.1177/02655322241290188
Abstract
The growing diversity of test takers in second or foreign language (L2) assessments puts fairness front and center. This systematic review examined how fairness in L2 assessments has been evaluated through differential item functioning (DIF) analysis. A total of 83 articles from 27 journals were included in the review. The findings suggested that classical DIF techniques dominated, particularly Rasch-based methods, the Mantel–Haenszel procedure, item response theory (IRT) approaches, logistic regression, and SIBTEST, although emerging methods such as DIF analysis based on cognitive diagnostic models were also identified. Most DIF studies examined manifest grouping variables such as gender and language background and were based on assessments of receptive language skills such as reading and listening comprehension. DIF analyses were mostly conducted in an exploratory fashion, and causes of DIF were often justified on speculative rather than empirical grounds. In addition, the quality of DIF analyses was undermined by suboptimal reporting practices. Our results suggest the need to improve current DIF practices, to consider alternative DIF detection methods that align with emerging views of measurement bias, and to adequately account for the heterogeneity of L2 test takers. The findings have implications for test design and use, fairness, and validity in L2 assessments.
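To illustrate one of the classical techniques the review identifies as dominant, the sketch below computes the Mantel–Haenszel common odds ratio and the associated ETS delta statistic for a single item. The function names and the stratified counts are hypothetical, and the delta thresholds follow the commonly cited ETS classification rather than anything specific to the reviewed studies.

```python
import math

def mantel_haenszel_or(strata):
    """Mantel-Haenszel common odds ratio across matched score strata.

    Each stratum is a tuple (A, B, C, D) of counts for one total-score level:
      A = reference group correct,  B = reference group incorrect,
      C = focal group correct,      D = focal group incorrect.
    """
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

def ets_delta(alpha_mh):
    # ETS delta scale: values near 0 suggest negligible DIF; |delta| >= 1.5
    # (with statistical significance) is conventionally flagged as sizable DIF.
    # Negative delta indicates DIF disadvantaging the focal group.
    return -2.35 * math.log(alpha_mh)

# Hypothetical 2x2 tables for three score strata on one item.
strata = [(40, 10, 35, 15), (30, 20, 20, 30), (15, 35, 10, 40)]
alpha = mantel_haenszel_or(strata)
print(f"alpha_MH = {alpha:.3f}, delta = {ets_delta(alpha):.2f}")
```

In this toy example the odds of a correct response are higher in the reference group at every score level, so the common odds ratio exceeds 1 and the delta is negative, i.e., the item would be flagged as potentially favoring the reference group.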
Date Issued
2024
Publisher
Sage
Journal
Language Testing