  • Publication
    Open Access
    A systematic review of digital storytelling in language learning in adolescents and adults
    (Springer, 2022)
    Lim, Nikki Zhi Li; Azrifah Zakaria
    Digital storytelling (DST) is a novel approach that uses modern computer technology to amplify language learning and teaching. The present study aims to review how the published DST research utilizes visuals and audio to influence the learning environment and engage adolescent and adult language learners. This was measured through their improvement in the four main language skills: reading, writing, listening, and speaking. A total of 71 journal papers were identified using the Scopus database. The papers studied both first and second language learning and were coded in full-text screening for the research topics and methods adopted, theories or frameworks adopted, outcomes across the language skills, and reliability investigation of the studies. The results showed a range of research methods used in the studies, with 39.7% of the total studies using mixed methods. The theories adopted in these studies were limited to components of DST, age group, and the type of study. Most studies neither tested nor mentioned the use of the three theoretical variables mentioned above. Notably, a majority of the studies reported positive outcomes when DST was used in the learning environment. However, not all claims were supported with evidence. Lastly, only a handful of the studies reviewed reported reliability, highlighting a lack of verification of the precision of the measurement instruments used. Implications of these findings and recommendations for designing DST and language learning research in the future are discussed.
    WOS© Citations: 4 | Scopus© Citations: 11
  • Publication
    Metadata only
    A meta-analysis of the reliability of second language reading comprehension assessment tools
    (Cambridge University Press, 2024)
    Zhao, Huijun
    The present study aims to meta-analyze the reliability of second language (L2) reading assessments and identify the potential moderators of reliability in L2 reading comprehension tests. We examined 3,247 individual studies for possible inclusion and assessed 353 studies as meeting the inclusion criteria. Of these, we extracted 150 Cronbach’s alpha estimates from 113 eligible studies (years 1998–2024) that reported Cronbach’s alpha coefficients properly and coded 27 potential predictors comprising the characteristics of the study, the test, and the test takers. We subsequently conducted a reliability generalization (RG) meta-analysis to compute the average reliability coefficient of L2 reading comprehension tests and identify potential moderators from the 27 coded predictor variables. The RG meta-analysis found an average reliability of 0.79 (95% CI [0.78, 0.81]). The number of test items, test piloting, test takers’ educational institution, study design, and testing mode were found to explain 16.76%, 5.92%, 4.91%, 2.58%, and 1.36% of the variance in reliability coefficients, respectively. The implications of this study and future directions are further discussed.
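    For orientation, the pooled estimate in a reliability generalization meta-analysis is typically a random-effects weighted mean of the (possibly transformed) alpha coefficients; a generic sketch of such an estimator, not necessarily the exact specification used in this study, is
    \[ \bar{\alpha} = \frac{\sum_{i=1}^{k} w_i \hat{\alpha}_i}{\sum_{i=1}^{k} w_i}, \qquad w_i = \frac{1}{v_i + \hat{\tau}^2}, \]
    where \(v_i\) is the sampling variance of study \(i\)'s coefficient and \(\hat{\tau}^2\) is the estimated between-study variance. Moderator analysis then asks how much of that between-study variance the coded predictors (e.g., number of test items, test piloting) can explain.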
  • Publication
    Metadata only
    A meta-analysis of the reliability of a metacognitive awareness instrument in second language listening
    (Springer, 2024)
    Zhai, Jiayu
    Metacognitive awareness is essential in regulating second language (L2) listening and has been predominantly assessed by a multidimensional instrument named the Metacognitive Awareness Listening Questionnaire (MALQ). Since previous studies have yielded inconclusive evidence concerning the generalization of the MALQ, it is important to examine the overall reliability of MALQ measures from a meta-analytical perspective. The purpose of the study was to examine variability in the reliability of MALQ measures in the field of L2 listening. A meta-analytic reliability generalization (RG) was conducted to synthesize Cronbach’s alpha coefficients derived from 45 studies that used the MALQ. The results showed that the aggregated reliability estimate was 0.80 for MALQ measures, with four out of the five subscales having an aggregate reliability coefficient larger than 0.7: 0.73 for mental translation, 0.74 for planning and evaluating, 0.71 for person knowledge, and 0.79 for problem-solving. On the other hand, the reliability of directed attention was 0.68, falling short of the minimum requirement of 0.70. In addition, as a high degree of heterogeneity was found across the included studies, a mixed-effects meta-regression was performed, identifying four moderators affecting the reliability of MALQ measures: publication year, educational setting, participants’ L1, and L2 proficiency level. We further found evidence for publication bias in the included publications. Suggestions for future research are provided.
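    The mixed-effects meta-regression mentioned above usually takes the following general form (shown for orientation; the study's exact specification may differ):
    \[ \hat{\alpha}_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + u_i + e_i, \qquad u_i \sim N(0, \tau^2), \quad e_i \sim N(0, v_i), \]
    where the \(x_{ij}\) are coded moderators such as publication year, educational setting, participants' L1, and L2 proficiency level, and \(\tau^2\) captures residual between-study heterogeneity.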

  • Publication
    Metadata only
    The Academic Listening Self-rating Questionnaire (ALSA) (Aryadoust, Goh, & Lee, 2012)
    The Academic Listening Self-rating Questionnaire (ALSA) is a 47-item self-appraisal tool that helps language learners evaluate their own academic listening skills (Aryadoust, Goh, & Lee, 2012). The six underlying dimensions of the ALSA consist of (a) linguistic components and prosody, (b) cognitive processing skills, (c) relating input to other materials, (d) notetaking, (e) memory and concentration, and (f) lecture structure. The psychometric quality of ALSA has been studied using the Rating Scale Rasch model, structural equation modeling, and correlation analyses. The ALSA can be used to raise tertiary-level students’ awareness of their academic listening ability and of the elements of academic discourse, such as lectures and seminars, that may affect their academic achievement. Further research is being undertaken to provide validity evidence for two versions of the instrument in Chinese and Turkish, respectively.
  • Publication
    Metadata only
    The predictive value of gaze behavior and mouse-clicking in testing listening proficiency: A sensor technology study
    (Elsevier, 2024)
    Qiu, Yue
    This study employed eye-tracking and mouse-click frequency analysis to investigate the predictive power of gaze behaviors, mouse-clicking, and their interactive effects with linguistic backgrounds on IELTS (International English Language Testing System) listening test scores. A total of 77 test takers (45 with English as their first language (E-L1) and 32 with English as their second language (E-L2)) participated in this study. Their eye movements and mouse-click frequencies were recorded as they took a computer-based IELTS listening test. The subsequent data analysis, utilizing linear mixed models, showed that gaze patterns, mouse actions, and language background significantly predicted listening test outcomes across the four listening test sections and between E-L1 and E-L2 candidates, accounting for 33.2% of the variance observed in test scores. These results point to potential sources of construct-irrelevant variance in test scores that are not accounted for in the available construct definitions of the test used in the study. Implications for the listening construct and test validity are discussed.
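    As an illustration only (synthetic data and hypothetical variable names, not the study's analysis code), a linear mixed model of this kind can be fitted in Python with statsmodels:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Synthetic long-format data: one row per test taker per listening section.
    rng = np.random.default_rng(0)
    n_sections = 4
    df = pd.DataFrame({
        "participant": np.repeat(np.arange(77), n_sections),
        "section": np.tile(np.arange(1, n_sections + 1), 77),
        "fixation_count": rng.poisson(120, 77 * n_sections),
        "click_count": rng.poisson(15, 77 * n_sections),
        "background": np.repeat(["E-L1"] * 45 + ["E-L2"] * 32, n_sections),
    })
    participant_effect = np.repeat(rng.normal(0, 0.5, 77), n_sections)
    df["score"] = (5 + 0.01 * df["fixation_count"] - 0.05 * df["click_count"]
                   + (df["background"] == "E-L1") * 0.5
                   + participant_effect + rng.normal(0, 1, len(df)))

    # Random intercept per test taker; fixed effects for gaze behaviour,
    # mouse clicks, language background, and test section.
    model = smf.mixedlm("score ~ fixation_count + click_count + background + C(section)",
                        data=df, groups=df["participant"])
    print(model.fit().summary())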
  • Publication
    Open Access
    Neurocognitive evidence for test equity in an academic listening assessment
    (Springer, 2023)
    Dominguez Lucio, Ester
    The present study explored the potential of a new neurocognitive approach to test equity which integrates evidence from eye-tracking and functional near-infrared spectroscopy with conventional test content analysis and psychometric analysis. The participants of the study (n = 29) were neurotypical university students who took two tests of English lecture comprehension. Test equity was examined at four levels: the linguistic level (content evidence) and the test score level, which are conventional in test equity research, and the gaze behavior and neurocognitive levels, which are novel to this study. It was found that the linguistic features of the two test forms being equated were similar and that there was no significant difference at the neurocognitive and behavioral levels. However, there was a significant difference in gaze behaviors, measured by fixation counts and visit counts, although fixation duration and visit duration did not vary across the two tests. Overall, test equity was supported, despite partial counterevidence from the gaze data. We discuss the implications of this approach for future equity research and for response processes in language assessment.
    Scopus© Citations: 3
  • Publication
    Metadata only
    The log-linear cognitive diagnosis modeling (LCDM) in second language listening assessment
    (Routledge, 2019)
    Toprak, Tugba Elif
    This chapter focuses on log-linear cognitive diagnosis modeling (LCDM), a general diagnostic classification model (DCM) family that allows researchers to model a large group of DCMs flexibly. Although the LCDM has important advantages over other core DCMs, it remains relatively under-researched in language assessment. This chapter first provides language testers with an introduction to the theoretical and statistical underpinnings of the LCDM. Next, it demonstrates how the LCDM could be applied to a high-stakes listening comprehension test. Finally, it presents guidelines on how to estimate and interpret the model, item, and examinee parameters with readily available software.
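    As background (a standard presentation of the model rather than material from the chapter itself), the LCDM expresses the log-odds of a correct response to item j as a linear combination of attribute main effects and interactions selected by the Q-matrix:
    \[ \operatorname{logit} P(X_{ej} = 1 \mid \boldsymbol{\alpha}_e) = \lambda_{j,0} + \boldsymbol{\lambda}_j^{\top} \mathbf{h}(\boldsymbol{\alpha}_e, \mathbf{q}_j), \]
    where \(\boldsymbol{\alpha}_e\) is examinee e's binary attribute profile, \(\mathbf{q}_j\) is item j's row of the Q-matrix, and \(\mathbf{h}(\cdot)\) returns the main-effect and interaction terms for the attributes the item measures. Constraining the \(\lambda\) parameters yields core DCMs such as the DINA model as special cases.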
  • Publication
    Metadata only
    Investigating the visual content of a commercialized academic listening test: Implications for validity
    (Elsevier, 2024)
    Hou, Zhuohan; Azrifah Zakaria
    As incorporating visual modes in listening tests is gradually gaining traction in second language (L2) assessment, the inclusion of such visuals raises questions about the role of visual modes in meaning-making during listening and about test validity. In this study, we investigated the visual features of the International English Language Testing System (IELTS) listening test through the application of the social semiotic multimodal framework. Our corpus comprised 300 visuals from 256 academic listening testlets published between 1996 and 2022. Unlike past social semiotic multimodal analyses that relied on qualitative methods, our study adopted a series of visualizations and quantitative statistical analyses of frequency and dispersion measures, using the general linear model to examine the visuals from a social semiotic multimodal perspective. The results revealed significant variation in the visual structures of the testlets. Through a post-hoc analysis, we further proposed recommendations for further research on multimodal materials in listening assessment and discussed the implications of the observed variation for the validity of the IELTS listening test. This study may be considered the first attempt to examine L2 listening assessment from a corpus-based social semiotic multimodal perspective, which may inspire more investigations into multimodal listening.
  • Publication
    Open Access
    Investigating the construct validity of the MELAB Listening Test through the Rasch analysis and correlated uniqueness modeling
    (University of Michigan, 2010)
    This article evaluates the construct validity of the Michigan English Language Assessment Battery (MELAB) listening test by investigating the underpinning structure of the test (or construct map), possible construct underrepresentation, and construct-irrelevant threats. Data for the study, from the administration of a form of the MELAB listening test to 916 international test takers, were provided by the English Language Institute of the University of Michigan. The researchers sought evidence of construct validity primarily through correlated uniqueness models (CUM) and the Rasch model. A five-factor CUM was fitted to the data but did not display acceptable measurement properties. The researchers then evaluated a three-trait confirmatory factor analysis (CFA) model that fitted the data sufficiently. This fitting model was further evaluated with parcel items, which supported the proposed CFA model. Accordingly, the underlying structure of the test was mapped out as three factors: ability to understand minimal context stimuli, short interactions, and long-stretch discourse. The researchers propose this model as the tentative construct map of this form of the test. To investigate construct underrepresentation and construct-irrelevant threats, the Rasch model was used. This analysis showed that the test was relatively easy for the sample and that the listening ability of several higher-ability test takers was not sufficiently tested by the items. This is interpreted as a sign of test ceiling effects and minor construct underrepresentation, although the researchers argue that the test is intended to distinguish students who have the minimum listening ability to enter a program from those who do not. The Rasch model provided support for the absence of construct-irrelevant threats by showing the adherence of the data to unidimensionality and local independence, and the good measurement properties of the items. The final assessment of the observed results showed that the generated evidence supported the construct validity of the test.
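    For context, the dichotomous Rasch model referred to here expresses the probability of a correct response as a function of the difference between a test taker's ability and an item's difficulty (the standard formulation, given as background rather than taken from the article):
    \[ P(X_{ni} = 1 \mid \theta_n, \delta_i) = \frac{\exp(\theta_n - \delta_i)}{1 + \exp(\theta_n - \delta_i)}, \]
    where \(\theta_n\) is test taker n's latent listening ability and \(\delta_i\) is the difficulty of item i; fit to this model underlies the unidimensionality and local independence checks reported above.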