Investigating content-based subscore structure of science tests
Author
Fong, Yick Chee
Supervisor
Chew, Lee Chin
Abstract
There is growing worldwide interest in test subscores and their reporting, which provides diagnostic information on students’ learning performance and achievement to stakeholders such as educational researchers, school administrators, classroom teachers, parents, and students. Testing agencies can report test subscores without additional testing by analysing data from tests already administered. Yet in Singapore, results of high-stakes assessments are typically based on total test scores, without subscore information. For instance, PSLE Science and GCE O-level Science are not reported with subscore information even though the subjects have content areas akin to distinct science disciplines at higher (tertiary or professional) levels. Generating subscores from test data appears to be a simple process of summing the scores on test items associated with a content area. However, there is no guarantee that such subscores will be trustworthy and meaningful. Test subscores must have good psychometric qualities, i.e. they must be valid and reliable, before reporting them is justifiable and beneficial.
The purpose of this study was to investigate whether a content-based subscore structure is present in primary and secondary level Science tests conducted in Singapore and, if present, to explore the potential of test subscore reporting in the Singapore context. The data source, the TIMSS 2015 Science tests restricted to the Singapore samples, was deemed appropriate for this investigation. The Singapore Science curriculum for primary and lower secondary levels aligns well with the TIMSS Grade 4 and Grade 8 Science assessment frameworks. TIMSS Grade 4 Science includes the content areas of life science, physical science, and earth science; Grade 8 Science includes biology, chemistry, physics, and earth science. A set of 14 Science tests was used for data collection at each grade level. The two datasets, one comprising 6517 students for Grade 4 and the other 6116 students for Grade 8, were analysed. The validity of the content-based subscores was evaluated using parallel analysis (Horn, 1965), and their reliability was evaluated using three methods based on Classical Test Theory: Feinberg’s (2012) value added ratio (VAR), Haberman’s (2008a) proportional reduction in mean squared error (PRMSE), and Brennan’s (2012) utility index.
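As an illustration of the dimensionality check named above, the following is a minimal sketch of parallel analysis, not the procedure or software used in the thesis. It compares the eigenvalues of the observed correlation matrix against eigenvalues obtained from random normal data of the same shape. The function name `parallel_analysis`, the 95th-percentile threshold (a common variant; Horn's original proposal used the mean of the random eigenvalues), and the simulated continuous scores are all assumptions made for the example; real item responses are typically dichotomous or polytomous and may call for a different correlation type.

```python
import numpy as np


def parallel_analysis(data, n_sims=200, percentile=95, seed=0):
    """Count leading components whose eigenvalue exceeds the random-data threshold.

    Hypothetical helper: returns (n_retained, observed_eigenvalues, thresholds).
    """
    rng = np.random.default_rng(seed)
    n, p = data.shape

    # Eigenvalues of the observed correlation matrix, largest first.
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]

    # Eigenvalues from uncorrelated normal data with the same n and p.
    simulated = np.empty((n_sims, p))
    for i in range(n_sims):
        noise = rng.standard_normal((n, p))
        eig = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))
        simulated[i] = np.sort(eig)[::-1]
    thresholds = np.percentile(simulated, percentile, axis=0)

    # Retain components in order until one fails to beat its threshold.
    n_retained = 0
    for obs, thr in zip(observed, thresholds):
        if obs <= thr:
            break
        n_retained += 1
    return n_retained, observed, thresholds


# Simulated (not real) data: 500 examinees, 10 items driven by one common factor.
rng = np.random.default_rng(42)
ability = rng.standard_normal((500, 1))
scores = 0.7 * ability + np.sqrt(1 - 0.7**2) * rng.standard_normal((500, 10))

n_factors, observed, thresholds = parallel_analysis(scores)
print("components retained:", n_factors)
```

With data generated from a single common factor, only the first eigenvalue should clear its threshold, which is the pattern the abstract describes as unidimensionality.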
Results from the parallel analysis showed unidimensionality in each Science test at both grade levels and thus no support for a content-based subscore structure. Taking into account the results from the three methods of reliability evaluation, there seemed to be no added value in reporting subscores for primary and secondary Science. The analyses also yielded some insights into the methods used for evaluating reliability. For instance, Haberman’s method evaluated the added value of weighted averages, a special type of augmented subscore, but the results showed little promise, as only 10 out of 98 augmented subscores showed significant added value. Brennan’s method estimated the increase in subtest length needed to facilitate reliable subscore reporting, but this method was shown to be limited, as it gave unrealistically high estimates when the subscores had poor initial reliability (Cronbach’s alpha of 0.4 or less). In light of there being no evidence of a content-based subscore structure to support test subscore reporting for either primary or secondary Science, implications of this deficiency for Science teaching and testing in the Singapore context are discussed. The limitations of the research are also explained and suggestions for future research are offered.
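To see why length estimates grow quickly when initial reliability is low, the sketch below uses two standard Classical Test Theory relations: Cronbach's alpha and the Spearman-Brown prophecy formula. This is an illustration only and is not claimed to be the computation behind Brennan's utility index. The function names `cronbach_alpha` and `lengthening_factor` and the target reliability of 0.8 are assumptions introduced for the example.

```python
import numpy as np


def cronbach_alpha(items):
    """Cronbach's alpha for an (examinees x items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    sum_item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - sum_item_var / total_var)


def lengthening_factor(reliability, target=0.8):
    """Spearman-Brown: factor by which a test must be lengthened
    (with parallel items) to move from `reliability` to `target`.
    The default target of 0.8 is an assumed value for illustration."""
    if not (0 < reliability < 1 and 0 < target < 1):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    return target * (1 - reliability) / (reliability * (1 - target))


# How the required lengthening grows as the starting reliability falls.
for r in (0.2, 0.4, 0.6, 0.8):
    print(f"alpha = {r:.1f} -> lengthen by a factor of {lengthening_factor(r):.2f}")
```

Under these assumptions, a subtest with alpha of 0.4 would need roughly six times as many parallel items to reach 0.8, which is consistent with the abstract's observation that low initial reliabilities lead to unrealistic length estimates.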
Date Issued
2018
Call Number
LB3060.77 Fon
Date Submitted
2018