Building a validity argument for a listening test of academic proficiency
Abstract
The purpose of the present study is to build a validity argument for a listening test of academic proficiency. Various models of validity have evolved throughout the history of psychological and educational assessment; these fall into two broad classes: traditional models comprising several kinds of validity (for example, content, construct, and predictive) (Cronbach, 1969; Cronbach & Meehl, 1955) and modern models conceptualizing validity as a unitary construct that is evaluated through an argumentation process (Messick, 1989; Kane, 2002, 2004, 2006). I have adopted the modern concept of validity in the present study because of the advantages it confers over the traditional model: it characterizes validity as a unitary concept; it has a clear start and end point; and it incorporates both supporting and rebutting evidence of validity.
It is argued that the IELTS listening test is a while-listening-performance (abbreviated here as WLP) test: test takers read and answer test items while they listen to oral stimuli, and thus engage in the following simultaneous activities: a) reading test items; b) listening to the oral text; c) writing or choosing the answer; and d) following the oral text to move to the next test item. The simultaneity of test performance raises a number of theoretical questions. For example, previous IELTS listening studies indicate: a) a role for guessing even in test items that invite word-level comprehension (despite the conventional wisdom that guessing is a confounding factor only in multiple-choice questions, MCQs), so that failure to answer an item correctly does not necessarily imply a lack of comprehension; b) differential item functioning (DIF) in multiple test items; c) that the items of the IELTS listening module primarily evaluate test takers' understanding of details and explicitly stated information and their ability to paraphrase, whereas the abilities to make inferences, interpret illocutionary meaning, and draw conclusions are not tested; and d) weak associations with external criterion measures of listening comprehension.
To address these issues, the study proposes five main research questions concerning the meaningfulness of scores and the bias of the test. Specifically, the questions address the listening sub-skills that the test taps, the dimensionality of the test, the variables that predict item difficulty parameters, bias across age, nationality, previous exposure to the test, and gender, and predictive-referenced evidence of validity. The study then reviews the relevant literature for each research question in Chapters One and Two and connects each question to the corresponding validity inferences along with their underlying warrants, assumptions, and backings.
It was found that: a) the cognitive sub-skills measured by the test are critically narrow, thereby under-representing the listening construct; b) both construct-relevant and construct-irrelevant factors predict item difficulty; c) construct-irrelevant factors seem to have contaminated the test structure and scores; d) differential item functioning seems to have affected performance on many, if not most, test items; and e) test scores correlate moderately with the ETS listening test and weakly with a self-assessment listening questionnaire developed in the study. By using the findings from the investigation of each research question as backings for one or more validity inferences, the study builds a validity argument framework that organizes these thematically related findings into a coherent treatment of the validity of the IELTS listening test.
The resulting validity argument is not well supported and is, in most cases, attenuated by the findings of the research studies. Future research directions are discussed in the final chapter.
Date Issued: 2012
Call Number: BF323.L5 Sey
Date Submitted: 2012