An investigation of differential item functioning in the computerized PTE academic reading test : searching for fairness evidence
Author
Zhu, Xuelian
Supervisor
Aryadoust, Vahid
Abstract
The Pearson Test of English (PTE) Academic is a high-stakes, computer-assisted language assessment that was developed and put into operation fairly recently. Despite the importance of the test, empirical research on PTE's fairness has been scant, likely because of its recent introduction to the assessment arena and its unique approach to language assessment. To gather evidence of test fairness, the present study investigated whether PTE Academic Reading is a fair measure of the English reading proficiency of international test takers from different backgrounds. The study focuses on differential item functioning (DIF) across two factors: gender (male vs. female) and mother tongue (Indo-European, Dravidian, and Sino-Tibetan).
Test data from 783 test takers were provided by Pearson Education. The data were analyzed using the partial credit model. The unidimensionality and local independence of the test were confirmed, and uniform differential item functioning (UDIF) and non-uniform differential item functioning (NUDIF) were examined. The results showed four pairs of statistically significant NUDIF (p < 0.05) across genders, seven pairs of significant UDIF, and fifteen pairs of significant DIF (p < 0.05) across the Indo-European, Dravidian, and Sino-Tibetan subgroups of readers. These results indicate that the fairness of the PTE Academic Reading test may be affected and that caution should therefore be exercised when interpreting its scores. On the other hand, it is also argued that if DIF favoring the counterpart subgroups is present in other subtests (e.g., listening), the DIF items can cancel each other out, thereby nullifying the effect of DIF on the fairness of the test as a whole.
The study draws theoretical and practical implications for both test developers and the language assessment community, and it proposes recommendations for incorporating other social and contextual factors.
Date Issued
2018
Call Number
PE1128.3 Zhu