  • Publication
    Restricted
    Cognitive Diagnostic Assessment System (CoDiAS) for Singapore’s secondary schools: Toward individualized learning and assessment in language education
    (Office of Education Research, National Institute of Education, Singapore, 2023)
    To date, several computerized diagnostic systems have been developed. These systems are limited in their feedback delivery and assessment scope, as well as in the delivery of remedial programs. For example, the Diagnostic English Language Needs Assessment (DELNA) developed by the University of Auckland and the Diagnostic English Language Assessment (DELA) designed by the University of Melbourne function primarily as placement tests: feedback is delivered to the learners, but the tests aim chiefly to place students in different language learning programs (one could argue, of course, that this placement is the treatment that follows the diagnosis, albeit not highly differentiated at the individual level). Similarly, the Diagnostic English Language Tracking Assessment (DELTA) designed by Hong Kong Polytechnic University and the Diagnostic Language Assessment (DIALANG) produced by Lancaster University provide feedback to learners but without specifying skill mastery profiles, differentiated remedial programs, or actionable plans (Harding, Alderson, & Brunfaut, 2015). These systems are also limited by their inability to provide fine-grained information on learners’ growth over time.
  • Publication
    Open Access
    A corpus study of language simplification and grammar in graded readers
    (Language Institute of Thammasat University, 2023)
    Azrifah Zakaria
    Studies on graded readers used in extensive reading have tended to focus on vocabulary. This study set out to investigate the linguistic profile of graded readers, taking into account both grammar and lexis. A corpus of 90 readers was tagged according to the variables in Biber’s Multidimensional (MD) analysis, using the Multidimensional Analysis Tagger (MAT). These variables were analysed using latent class cluster analysis to determine whether the graded readers could be grouped by similarity in linguistic features. While the MAT analysis surfaced more similarities than differences within the corpus, latent class clustering produced an optimal 3-class model. Post-hoc concordance analyses showed that graded readers may be categorised into three classes of complexity: beginner, transitional, and advanced. The findings of the study suggest that the selection of reading materials for extensive reading should take into consideration grammatical complexity as well as lexis. The linguistic profiles compiled in this study detail the grammatical structures, and the associated lexical items within those structures, that teachers may expect their students to encounter when reading graded readers. In addition, the profiles may be of benefit to teachers seeking to supplement extensive reading with form-focused instruction.
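    As a point of reference for the latent class cluster analysis described in the entry above, the following is a minimal sketch of the general finite-mixture formulation such an analysis estimates; the notation is conventional and is not drawn from the study itself.
    \[
    f(\mathbf{y}_i) \;=\; \sum_{k=1}^{K} \pi_k \, f_k(\mathbf{y}_i \mid \boldsymbol{\theta}_k), \qquad \sum_{k=1}^{K} \pi_k = 1,
    \]
    where \(\mathbf{y}_i\) is the vector of tagged linguistic features for reader \(i\), \(\pi_k\) is the proportion of readers in latent class \(k\), and \(f_k\) is the class-specific distribution of the features. The number of classes \(K\) is chosen by comparing candidate models; the study above reports a three-class solution as optimal.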
  • Publication
    Metadata only
    Exploring the relative merits of cognitive diagnostic models and confirmatory factor analysis for assessing listening comprehension
    (Cambridge University Press, 2013)
    A number of scaling models – developed originally for psychological and educational studies – have been adapted into language assessment. Although their application has been promising, they have not yet been validated in language assessment contexts. This study discusses the relative merits of two such models in the context of second language (L2) listening comprehension tests: confirmatory factor analysis (CFA) and cognitive diagnostic models (CDMs). Both CFA and CDMs can model multidimensionality in assessment tools, whereas other models force the data to be statistically unidimensional. The two models were applied to the listening test of the Michigan English Language Assessment Battery (MELAB). CFA was found to impose more restrictions on the data than CDMs. It is suggested that CFA might not be suitable for modelling dichotomously scored data of L2 listening tests, whereas the CDM used in the study (the Fusion Model) appeared to successfully portray the listening sub-skills tapped by the MELAB listening test. The paper concludes with recommendations about how to use each of these models in modelling L2 listening.
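    For readers comparing the two approaches discussed in the entry above, a minimal sketch of the standard CFA measurement model is given below; the symbols follow textbook convention and are not the paper’s own notation (the item response function of the Fusion Model, the CDM used in the paper, is more elaborate and is not reproduced here).
    \[
    \mathbf{x}_i = \boldsymbol{\Lambda}\,\boldsymbol{\eta}_i + \boldsymbol{\varepsilon}_i, \qquad \operatorname{Cov}(\boldsymbol{\eta}_i) = \boldsymbol{\Phi}, \quad \operatorname{Cov}(\boldsymbol{\varepsilon}_i) = \boldsymbol{\Theta},
    \]
    where \(\mathbf{x}_i\) holds test taker \(i\)’s item scores, \(\boldsymbol{\Lambda}\) the loadings on the hypothesized listening subskills \(\boldsymbol{\eta}_i\), and \(\boldsymbol{\varepsilon}_i\) the residuals. CDMs, in contrast, model the probability of a correct response conditional on a binary profile of mastered versus non-mastered subskills rather than on continuous factors.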
  • Publication
    Open Access
    Using a two-parameter logistic item response theory model to validate the IELTS listening test
    To date, many test designers have relied heavily on either the interpretation or the uses of scores in high-stakes tests as evidence of validity. As a case in point, in the listening section of the International English Language Testing System (IELTS), the consequences of the scores and their correlation with other measures, such as students’ academic performance, have been extensively researched in pursuit of consequential and criterion validity. While these research inquiries are valuable, especially for researching test usefulness, the test should first be validated for its main objectives. Consequential and criterion validity studies into the IELTS listening module have yielded divergent and controversial evidence. We argue that if construct validity is not established, supportive evidence of usefulness is either very difficult or impossible to find. The main purpose of this study is to investigate the construct validity of the IELTS listening test. We will employ a two-parameter logistic Item Response Theory (IRT) model to investigate construct representation and construct-irrelevant factors (Messick, 1988, 1989).
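    A minimal sketch of the two-parameter logistic IRT model named in the entry above; the notation is the textbook convention rather than the authors’ own.
    \[
    P(X_{ij} = 1 \mid \theta_i) \;=\; \frac{1}{1 + e^{-a_j(\theta_i - b_j)}},
    \]
    where \(\theta_i\) is test taker \(i\)’s latent listening ability, \(b_j\) the difficulty of item \(j\), and \(a_j\) its discrimination. Construct representation can then be examined through the estimated item parameters and the fit of the observed responses to the model.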
  • Publication
    Open Access
    A Rasch analysis of an international English language testing system listening sample test
    This study reports on an investigation of the construct validity of an International English Language Testing System (IELTS) Listening sample test. The test was administered to 148 multinational participants. Rasch modeling of the data was used to fulfill the research objectives. Four major conclusions were drawn: 1) the Rasch differential item functioning analysis revealed that limited-production items behave differently across different test taker groups, suggesting the presence of construct-irrelevant variance; 2) multiple choice questions (MCQs) do not introduce construct-irrelevant variance unless testees need to make ‘close paraphrases’ to comprehend the item stem or the question demands more than one answer, which nominates the short MCQ as the best item format in listening tests; 3) evidence was found for ‘lexical processing’, which is distinct from top-down/bottom-up processing; and 4) the Wright map provided evidence of construct under-representation in the test. Findings from this study provide different sorts of evidence supporting and disproving the claim of construct validity for the test, although they should be further investigated in future studies with different samples. Implications of the findings for IELTS and item writers are also discussed.
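    For reference, the Rasch model used in the entry above is the one-parameter member of the same logistic family, sketched here in conventional notation rather than the paper’s.
    \[
    P(X_{ij} = 1 \mid \theta_i, b_j) \;=\; \frac{e^{\theta_i - b_j}}{1 + e^{\theta_i - b_j}},
    \]
    so all items share a common discrimination and differ only in difficulty \(b_j\). Differential item functioning is assessed by checking whether item difficulty estimates remain invariant across test taker groups, and the Wright map places persons (\(\theta_i\)) and items (\(b_j\)) on the same logit scale, which is how construct under-representation was inspected.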
  • Publication
    Open Access
    Investigating the construct validity of the MELAB Listening Test through the Rasch analysis and correlated uniqueness modeling
    (University of Michigan, 2010)
    This article evaluates the construct validity of the Michigan English Language Assessment Battery (MELAB) listening test by investigating the underpinning structure of the test (or construct map), possible construct underrepresentation, and construct-irrelevant threats. Data for the study, from the administration of a form of the MELAB listening test to 916 international test takers, were provided by the English Language Institute of the University of Michigan. The researchers sought evidence of construct validity primarily through correlated uniqueness models (CUM) and the Rasch model. A five-factor CUM was fitted to the data but did not display acceptable measurement properties. The researchers then evaluated a three-trait confirmatory factor analysis (CFA) model that fitted the data sufficiently. This fitting model was further evaluated with parcel items, which supported the proposed CFA model. Accordingly, the underlying structure of the test was mapped out as three factors: ability to understand minimal-context stimuli, short interactions, and long stretches of discourse. The researchers propose this model as the tentative construct map of this form of the test. To investigate construct underrepresentation and construct-irrelevant threats, the Rasch model was used. This analysis showed that the test was relatively easy for the sample and that the listening ability of several higher-ability test takers was not sufficiently tested by the items. This is interpreted as a sign of test ceiling effects and minor construct underrepresentation, although the researchers argue that the test is intended to distinguish the students who have the minimum listening ability to enter a program from those who do not. The Rasch model provided support for the absence of construct-irrelevant threats by showing the adherence of the data to unidimensionality and local independence, and the good measurement properties of the items. The final assessment of the observed results showed that the generated evidence supported the construct validity of the test.
  • Publication
    Metadata only
    Developing and validating an academic listening questionnaire
    (Pabst Science, 2012)
    Lee, Ong Kim
    This article reports on the development and administration of the Academic Listening Self-rating Questionnaire (ALSA). The ALSA was developed on the basis of a proposed model of academic listening comprising six related components. The researchers operationalized the model, subjected items to iterative rounds of content analysis, and administered the finalized questionnaire to international ESL (English as a second language) students in Malaysian and Australian universities. Structural equation modeling and rating scale modeling of data provided content-related, substantive, and structural validity evidence for the instrument. The researchers explain the utility of the questionnaire for educational and assessment purposes.
  • Publication
    Open Access
    A systematic review of the validity of questionnaires in second language research
    (MDPI, 2022)
    Zhang, Yifan
    Questionnaires have been widely used in second language (L2) research. To examine the accuracy and trustworthiness of research that uses questionnaires, it is necessary to examine the validity of questionnaires before drawing conclusions or conducting further analysis based on the data collected. To determine the validity of questionnaires that have been investigated in previous L2 research, we adopted the argument-based validation framework to conduct a systematic review. Due to the extensive nature of the extant questionnaire-based research, only the most recent literature, that is, research in 2020, was included in this review. A total of 118 questionnaire-based L2 studies published in 2020 were identified, coded, and analyzed. The findings showed that the validity of the questionnaires in the studies was not satisfactory. In terms of the validity inferences for the questionnaires, we found that (1) the evaluation inference was not supported by psychometric evidence in 41.52% of the studies; (2) the generalization inference was not supported by statistical evidence in 44.07% of the studies; and (3) the explanation inference was not supported by any evidence in 65.25% of the studies, indicating the need for more rigorous validation procedures for questionnaire development and use in future research. We provide suggestions for the validation of questionnaires.
    WOS© Citations: 3   Scopus© Citations: 4