A meta-analysis of the reliability of L2 reading comprehension assessments
Score reliability is one of the major facets of modern validity frameworks in language assessment. Within argument-based validation in particular, reliability provides indispensable evidence for the generalization inference and, through it, for higher-level validity inferences. The present study aims to estimate the average reliability of L2 reading comprehension tests, identify potential moderators of reliability in such tests, and examine whether reliability predicts the relationship between the generalization and explanation inferences.
A reliability generalization (RG) meta-analysis was conducted to compute the average reliability coefficient of L2 reading comprehension tests and to identify predictor variables that moderate reliability. I screened 1,883 individual studies from the Scopus, Web of Science, ERIC, and LLBA databases for possible inclusion and judged 266 of them eligible under the inclusion criteria. From these, I extracted 85 Cronbach's alpha estimates from 60 studies (2002-2023) that reported the coefficient appropriately and coded 28 potential predictors comprising characteristics of the study, the test, and the test-takers. A linear mixed-effects model (LMEM) analysis was subsequently conducted to test the predictive power of the reliability coefficient in the relationship between the generalization and explanation inferences. Specifically, I examined the impact of the Cronbach's alpha coefficient on the correlation between L2 reading comprehension tests and various measures of language proficiency, drawing on reliability estimates of reading comprehension tests from 24 studies and 189 correlation data points between those tests and proficiency measures categorized into 11 groups.
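To make the pooling step concrete, the sketch below shows how an average alpha and an I-squared statistic could be computed from a set of extracted coefficients. It is an illustrative reconstruction, not the study's actual analysis code: it assumes Bonett-transformed alphas and a simple two-level DerSimonian-Laird random-effects model rather than the multilevel model used in the study, and the function name pool_alphas and its inputs are hypothetical.

import numpy as np

def pool_alphas(alphas, ns, items):
    """Pool Cronbach's alpha estimates with a two-level random-effects model.

    alphas : observed alpha coefficients (one per sample)
    ns     : sample sizes
    items  : number of test items in each sample

    Applies Bonett's (2002) transformation T = ln(1 - alpha), estimates the
    between-study variance with the DerSimonian-Laird method, and
    back-transforms the pooled value to the alpha metric.
    """
    alphas, ns, items = map(np.asarray, (alphas, ns, items))
    t = np.log(1.0 - alphas)                        # Bonett transformation
    v = 2.0 * items / ((items - 1.0) * (ns - 2.0))  # sampling variance of T
    w = 1.0 / v
    t_fixed = np.sum(w * t) / np.sum(w)             # fixed-effect mean of T
    q = np.sum(w * (t - t_fixed) ** 2)              # Cochran's Q
    df = len(t) - 1
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c)                   # DL between-study variance
    w_star = 1.0 / (v + tau2)
    t_pooled = np.sum(w_star * t) / np.sum(w_star)
    se = np.sqrt(1.0 / np.sum(w_star))
    i2 = max(0.0, (q - df) / q) * 100.0             # % non-sampling variance
    lo, hi = t_pooled - 1.96 * se, t_pooled + 1.96 * se
    # back-transform: alpha = 1 - exp(T); CI bounds swap because of the sign
    return 1.0 - np.exp(t_pooled), (1.0 - np.exp(hi), 1.0 - np.exp(lo)), i2

# hypothetical usage with three made-up primary studies
mean_alpha, ci, i2 = pool_alphas([0.72, 0.81, 0.78], [120, 85, 200], [30, 40, 25])

A full RG analysis would instead fit a three-level model so that within-study and between-study heterogeneity can be separated, as reported in the results below.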
The RG meta-analysis found an average reliability of 0.78 (95% CI [0.76, 0.80]), with 40% of the Cronbach's alpha coefficients falling below the lower bound of this interval. A heterogeneity test indicated significant heterogeneity in Cronbach's alphas across studies (I² = 97.58%), with the variance partitioned into sampling error (2.42%), within-study differences (24.64%), and between-study differences (72.95%). The number of test items and test-takers' L1 explained 19.65% and 13.70% of the variation in reliability coefficients across studies, respectively. The LMEM analysis showed that alpha coefficients did not predict the correlation between reading comprehension tests and other measures of language proficiency. The implications of the study, its limitations, and directions for future research are discussed.
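For readers unfamiliar with the three-way variance partition reported above, the display below shows one common way of decomposing heterogeneity in a three-level meta-analysis; it follows Cheung's (2014) multilevel I² formulation and is an illustrative reconstruction, not necessarily the exact estimator used in this study.

\[
I^2_{\text{within}} = \frac{\hat{\sigma}^2_{(2)}}{\hat{\sigma}^2_{(2)} + \hat{\sigma}^2_{(3)} + \tilde{v}}, \qquad
I^2_{\text{between}} = \frac{\hat{\sigma}^2_{(3)}}{\hat{\sigma}^2_{(2)} + \hat{\sigma}^2_{(3)} + \tilde{v}},
\]

where \(\hat{\sigma}^2_{(2)}\) and \(\hat{\sigma}^2_{(3)}\) are the within-study (level-2) and between-study (level-3) variance components and \(\tilde{v}\) is the typical sampling variance. The reported shares are consistent with this kind of decomposition: 24.64% + 72.95% ≈ 97.58% = I², leaving 2.42% attributable to sampling error.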