Item response theory : item calibration and sample size
Author
Cheong, Ka Myn
Supervisor
Cheng, Yuan Shan
Abstract
Test developers commonly face the question of what sample size is suitable for calibrating stable item parameters. Item parameters that are unstable, or that vary from sample to sample, affect the reliability and validity of the test. Recommendations from previous studies on suitable sample size were specific but not consistent. The main objective of this study was to identify a sample size suitable for calibrating the item parameters of a 70-item, multiple-choice, English achievement test. BILOG-MG, a statistical package based on Item Response Theory (IRT) models, was used to perform item calibration and differential item functioning (DIF) analyses. A sample size of 1000 was found to yield stable item parameters for the 70-item test under a 3-parameter logistic (3PL) model. A secondary objective was to examine whether the number of item parameters calibrated would influence the suitable sample size for the 70-item test. Results showed no difference: the suitable sample size identified under a 1-parameter logistic (1PL) model was the same as that identified under the 3PL model. The final objective was to examine whether the number of items in a test would influence the suitable sample size. Results showed that a smaller sample size of 500 was suitable for calibrating a 27-item test. Additional analyses were then performed to evaluate whether the percentage of correct responses to items with DIF, and the samples’ frequency distribution of total scores, influenced the DIF results. These results were mixed and inconclusive. Together, the findings suggest that sample sizes of at least 1000 and 500 are suitable for calibrating a 70-item test and a 25-item test respectively. The implications and details of the results are discussed.
Date Issued
2012
Call Number
BF39.2.I84 Che
Date Submitted
2012