  • Publication
    Metadata only
    Investigating differential item functioning across interaction variables in listening comprehension assessment
    (Elsevier, 2024) Min, Shangchao; Xueliang Chen

    Differential item functioning (DIF) analysis is essential to ensuring the equity of measurement for different subgroups at the item level and is an integral part of validity. However, existing DIF research often overlooks within-group heterogeneity, commonly assuming that test takers from different subgroups comprise a homogeneous population. This study investigated DIF across gender, academic background, and their interaction in listening comprehension assessment using Rasch measurement. It found that ignoring within-group heterogeneity would lead to the under-detection of DIF, likely due to the cancellation of DIF at broader group levels. In addition, the study is the first to investigate DIF in a linked test, a scenario more prevalent in practical testing. The findings of the study highlight the importance of accounting for within-group heterogeneity in test fairness investigations in language assessment research and point to the potential effect of test linking and equating on DIF analysis and interpretation.
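The abstract above does not specify its detection procedure beyond Rasch measurement, but a common, generic way to screen a single item for uniform DIF is the Mantel-Haenszel common odds ratio over ability-matched strata. The sketch below is a hedged, hypothetical illustration (function and variable names are my own, not the study's):

```python
import numpy as np

def mantel_haenszel_odds(responses, item, group, strata):
    """Mantel-Haenszel common odds ratio for one item.

    responses : (n_persons, n_items) 0/1 response matrix
    item      : column index of the studied item
    group     : 0 = reference, 1 = focal, per person
    strata    : matched-ability stratum per person (e.g. total score)

    A value near 1 suggests no uniform DIF; values far from 1 suggest
    the item favors one group even among equal-ability test takers.
    """
    num = den = 0.0
    for s in np.unique(strata):
        m = strata == s
        y, g = responses[m, item], group[m]
        a = np.sum((g == 0) & (y == 1))   # reference, correct
        b = np.sum((g == 0) & (y == 0))   # reference, incorrect
        c = np.sum((g == 1) & (y == 1))   # focal, correct
        d = np.sum((g == 1) & (y == 0))   # focal, incorrect
        t = a + b + c + d
        if t > 0:
            num += a * d / t
            den += b * c / t
    return num / den if den > 0 else float("nan")

# Single stratum: reference group answers 30/40 correct, focal only 10/40
resp = np.array([[1]] * 30 + [[0]] * 10 + [[1]] * 10 + [[0]] * 30)
grp = np.array([0] * 40 + [1] * 40)
strat = np.zeros(80, dtype=int)
mantel_haenszel_odds(resp, 0, grp, strat)  # → 9.0, a large-DIF flag
```

The study's point about within-group heterogeneity maps onto the `group` vector here: collapsing distinct subgroups (e.g. gender × academic background cells) into one focal group can let opposite-direction DIF cancel out in this statistic.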

  • Publication
    Unknown
    Developing and validating an academic listening questionnaire
    (Pabst Science, 2012) Lee, Ong Kim
    This article reports on the development and administration of the Academic Listening Self-rating Questionnaire (ALSA). The ALSA was developed on the basis of a proposed model of academic listening comprising six related components. The researchers operationalized the model, subjected items to iterative rounds of content analysis, and administered the finalized questionnaire to international ESL (English as a second language) students in Malaysian and Australian universities. Structural equation modeling and rating scale modeling of data provided content-related, substantive, and structural validity evidence for the instrument. The researchers explain the utility of the questionnaire for educational and assessment purposes.
  • Publication
    Restricted
    Building a validity argument for a listening test of academic proficiency
    The purpose of the present study is to build a validity argument for a listening test of academic proficiency. Various models of validity have evolved throughout the history of psychological and educational assessment; they can be classified into two major classes: traditional models comprising several kinds of validity (for example, content, construct, and predictive) (Cronbach, 1969; Cronbach & Meehl, 1955) and modern models conceptualizing validity as a unitary construct that is evaluated through an argumentation process (Messick, 1989; Kane, 2002, 2004, 2006). I have adopted the modern concept of validity in the present study due to the advantages it confers over the traditional model: it characterizes validity as a unitary concept; it has a clear start and end point; and it involves both supporting and rebutting evidence of validity.

    It is argued that the IELTS listening test is a while-listening-performance (abbreviated here as WLP) test; test takers read and answer test items while they listen to oral stimuli, and thus engage in the following simultaneous activities: a) reading test items; b) listening to the oral text; c) writing or choosing the answer; and d) following the oral text to move to the next test item. The simultaneity of test performance raises a number of theoretical questions. For example, previous IELTS listening studies indicate: a) a role for guessing in test items that invite word-level comprehension, despite the conventional wisdom that guessing confounds only multiple-choice questions (MCQs), so that failure to answer an item correctly does not necessarily imply a lack of comprehension; b) differential item functioning (DIF) in multiple test items; c) that the test items of the IELTS listening module primarily evaluate test takers’ understanding of details and explicitly stated information and their ability to paraphrase, while the abilities to make inferences, interpret illocutionary meaning, and draw conclusions are not tested; and d) weak association with external criterion measures of listening comprehension.

    To address these issues, the study proposes five main research questions addressing the meaningfulness of scores and bias of the test. Specifically, the questions concern the listening sub-skills that the test taps, the dimensionality of the test, the variables that predict item difficulty parameters, bias across age, nationality, previous exposure to the test, and gender, and predictive-referenced evidence of validity. The study then reviews relevant literature for each research question in Chapters One and Two and connects each question to the corresponding validity inferences, along with their underlying warrants, assumptions, and backings.

    It was found that: a) the cognitive sub-skills measured by the test are critically narrow, thereby under-representing the listening construct; b) both construct-relevant and -irrelevant factors can predict item difficulty; c) construct-irrelevant factors seem to have contaminated the test structure and scores; d) differential item functioning seems to have affected performance on many, if not most, test items; and e) test scores correlate moderately with the ETS listening test, and weakly with a self-assessment listening questionnaire developed in the study. By using findings from the investigation of each research question as backings for one or more validity inferences, the study builds a validity argument framework which organizes these thematically related findings into a coherent treatment of the validity of the IELTS listening test.

    This validity argument is not well-supported and is attenuated in most cases by the findings of the research studies. Future research directions are discussed in the final chapter.
  • Publication
    Open Access
    The role of mental imagery elaborations in listening comprehension: Application of forensic arts and cognitive interviews
    (Office of Education Research, National Institute of Education, Singapore, 2020)
    To investigate visual mental imagery generated during listening comprehension
  • Publication
    Open Access
    A Rasch analysis of an International English Language Testing System listening sample test
    This study reports on an investigation of the construct validity of an International English Language Testing System (IELTS) Listening sample test. The test was administered to 148 multinational participants, and Rasch modeling of the data was used to fulfill the research objectives. Four major conclusions were drawn: 1) the Rasch differential item functioning analysis revealed that limited-production items behave differently across test taker groups, suggesting the presence of construct-irrelevancies; 2) multiple-choice questions (MCQs) do not cause construct-irrelevancies unless test takers need to make ‘close paraphrases’ to comprehend the item stem or the question demands more than one answer, which nominates the short MCQ as a preferable item format in listening tests; 3) evidence was found for ‘lexical processing’, which is distinct from top-down/bottom-up processing; and 4) the Wright map provided evidence for construct under-representation of the test. Findings from this study provide different sorts of evidence supporting and rebutting the claim of construct validity, although they should be further investigated in future studies with different samples. Implications of the findings for IELTS and item writers are also discussed.
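Several abstracts in this list rely on the Rasch model. As a hedged illustration of the underlying formula (not any particular study's estimation procedure; both function names are my own), the dichotomous Rasch model and a rough PROX-style first pass at item difficulties can be sketched as:

```python
import numpy as np

def rasch_prob(theta, b):
    """Dichotomous Rasch model: probability of a correct response
    given person ability theta and item difficulty b, both on the
    same logit scale."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def prox_item_difficulties(responses):
    """Rough first-pass item difficulties from a 0/1 response matrix,
    via the logit of each item's proportion correct (a PROX-style
    starting value, not a full Rasch estimation)."""
    p = responses.mean(axis=0)          # proportion correct per item
    b = np.log((1.0 - p) / p)           # harder items -> higher logit
    return b - b.mean()                 # center to fix the scale origin

# When ability equals difficulty, the model gives a 50% chance:
# rasch_prob(0.0, 0.0) -> 0.5
```

A Wright map, as referenced in the abstract above, places the estimated `theta` values and the centered `b` values on this shared logit scale so that gaps between item difficulties and person abilities (construct under-representation, ceiling effects) become visible.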
  • Publication
    Open Access
    Investigating the construct validity of the MELAB Listening Test through the Rasch analysis and correlated uniqueness modeling
    (University of Michigan, 2010)
    This article evaluates the construct validity of the Michigan English Language Assessment Battery (MELAB) listening test by investigating the underpinning structure of the test (or construct map), possible construct under-representation, and construct-irrelevant threats. Data for the study, from the administration of a form of the MELAB listening test to 916 international test takers, were provided by the English Language Institute of the University of Michigan. The researchers sought evidence of construct validity primarily through correlated uniqueness models (CUM) and the Rasch model. A five-factor CUM was fitted to the data but did not display acceptable measurement properties. The researchers then evaluated a three-trait confirmatory factor analysis (CFA) model that fitted the data sufficiently well. This model was further evaluated with parcel items, which supported the proposed CFA structure. Accordingly, the underlying structure of the test was mapped out as three factors: ability to understand minimal-context stimuli, short interactions, and long-stretch discourse. The researchers propose this model as the tentative construct map of this form of the test. To investigate construct under-representation and construct-irrelevant threats, the Rasch model was used. This analysis showed that the test was relatively easy for the sample and that the listening ability of several higher-ability test takers was not sufficiently tested by the items, which is interpreted as a sign of test ceiling effects and minor construct under-representation, although the researchers argue that the test is intended to distinguish students who have the minimum listening ability to enter a program from those who do not. The Rasch model supported the absence of construct-irrelevant threats by showing the adherence of the data to unidimensionality and local independence, and the good measurement properties of the items. The final assessment of the observed results showed that the generated evidence supported the construct validity of the test.
  • Publication
    Open Access
    A corpus study of language simplification and grammar in graded readers
    (2023) Azrifah Zakaria
    Studies on graded readers used in extensive reading have tended to focus on vocabulary. This study set out to investigate the linguistic profile of graded readers, taking into account both grammar and lexis. A corpus of 90 readers was tagged according to the variables in Biber’s Multidimensional (MD) analysis, using the Multidimensional Analysis Tagger (MAT). These variables were analysed using latent class cluster analysis to determine whether the graded readers can be grouped by similarity in linguistic features. While the MAT analysis surfaced more similarities than differences within the corpus, latent class clustering produced an optimal 3-class model. Post-hoc concordance analyses showed that graded readers may be categorised as having three classes of complexity: beginner, transitional, and advanced. The findings of the study suggest that selection of reading materials for extensive reading should take into consideration grammatical complexity as well as lexis. The linguistic profiles compiled in this study detail the grammatical structures, and the associated lexical items within those structures, that teachers may expect their students to encounter when reading graded readers. In addition, the profiles may be of benefit to teachers seeking to supplement extensive reading with form-focused instruction.
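Latent class cluster analysis, as used above, fits a mixture model and selects the number of classes by fit indices; that machinery is beyond a short sketch. As a loose, hypothetical stand-in for the grouping step only (not the authors' latent-class procedure), a minimal k-means over per-text linguistic feature rates shows how texts cluster by feature similarity:

```python
import numpy as np

def kmeans(X, k, iters=50):
    """Minimal k-means sketch: a rough stand-in for model-based
    clustering. X is an (n_texts, n_features) matrix of linguistic
    feature rates; initializes from the first k rows for
    reproducibility."""
    centers = X[:k].astype(float)
    for _ in range(iters):
        # assign each text to the nearest center (Euclidean distance)
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each center to the mean of its assigned texts
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# Six hypothetical texts described by two feature rates form
# three clear groups, analogous to the study's three classes:
X = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0],
              [0.1, 0.0], [5.1, 5.0], [10.1, 0.0]])
labels, centers = kmeans(X, 3)
```

Unlike k-means, a true latent class model handles categorical indicators probabilistically and lets criteria such as BIC choose the number of classes, which is how the 3-class solution above would have been selected.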
  • Publication
    Open Access
    A functional analysis of the dialogues in the new interchange intro textbook
    This study investigates language functions in the New Interchange Intro textbook. It is observed that more difficult grammar structures, e.g., wh-questions, appear with gradually increasing frequency throughout the dialogues, while simple yes/no questions and statements are more pronounced in the opening lessons. Also, declarative sentences outnumber other structures, and wh-questions rank second. Grammar structures realize three major macro-pragmatic functions: representative, directive, and expressive. However, commissive and declaration functions are not observed in the dialogues, which may be because these functions are difficult for beginning learners to communicate. Among the micro-functions, stating greetings, requesting, and expressing gratitude have the lowest frequency, whereas exchanging information has the highest. Research on language functions in English textbooks helps teachers select suitable materials.
  • Publication
    Open Access
    An introduction to the cognitive diagnostic assessment system
    (National Institute of Education (Singapore), 2020)