Aryadoust, Vahid
- Publication (Metadata only): An investigation of differential item functioning in the MELAB listening test
Differential item functioning (DIF) analysis is a way of determining whether test items function differently across subgroups of test takers after controlling for ability level. DIF results are used to evaluate tests' validity arguments. This study uses Rasch measurement to examine the Michigan English Language Assessment Battery listening test for DIF across gender subgroups. After establishing the unidimensionality and local independence of the data, the authors used two methods to test for DIF: (a) a t-test uniform DIF analysis, which showed that two test items displayed substantive DIF, and favored different gender subgroups; and (b) nonuniform DIF analysis, which revealed several test items with significant DIF, many of which favored low-ability male test takers. A possible explanation for gender-ability DIF is that lower ability male test takers are more likely to attempt lucky guesses, particularly on multiple-choice items with unattractive distracters, and that having only two distracters makes this strategy likely to succeed.
WOS© Citations 37 Scopus© Citations 49 31
- Publication (Open Access): Neurocognitive evidence for test equity in an academic listening assessment
The present study explored the potential of a new neurocognitive approach to test equity which integrates evidence from eye-tracking and functional near-infrared spectroscopy with conventional test content analysis and psychometric analysis. The participants of the study (n = 29) were neurotypical university students who took two tests of English lecture comprehension. Test equity was examined in this study at four levels: the linguistic level (content evidence) and the test scores level which are conventional levels in test equity; and gaze behavior level and neurocognitive level which are novel to this study. It was found that the linguistic features of the two test forms being equated were similar and that there was no significant difference at neurocognitive and behavioral levels. However, there was a significant difference in gaze behaviors, measured by fixation counts and visit counts, although fixation duration and visit duration did not vary across the two tests. Overall, test equity was supported, despite partial counterevidence from the gaze data. We discuss the implication of this approach for future equity research and response process in language assessment.
Scopus© Citations 3 83 134
- Publication (Open Access): What can gaze behaviors, neuroimaging data, and test scores tell us about test method effects and cognitive load in listening assessments?
The aim of this study was to investigate how test methods affect listening test takers' performance and cognitive load. Test methods were defined and operationalized as while-listening performance (WLP) and post-listening performance (PLP) formats. To achieve the goal of the study, we examined test takers' (N = 80) brain activity patterns (measured by functional near-infrared spectroscopy (fNIRS)), gaze behaviors (measured by eye-tracking), and listening performance (measured by test scores) across the two test methods. We found that the test takers displayed lower activity levels across brain regions supporting comprehension during the WLP tests relative to the PLP tests. Additionally, the gaze behavioral patterns exhibited during the WLP tests suggested that the test takers adopted keyword matching and "shallow listening." Together, the neuroimaging and gaze behavioral data indicated that the WLP tests imposed a lower cognitive load on the test takers than the PLP tests. However, the test takers performed better with higher test scores for one of two WLP tests compared with the PLP tests. By incorporating eye-tracking and neuroimaging in this exploration, this study has advanced the current knowledge on cognitive load and the impact imposed by different listening test methods. To advance our knowledge of test validity, other researchers could adopt our research protocol and focus on extending the test method framework used in this study.
WOS© Citations 15 Scopus© Citations 34 165 178
- Publication (Open Access): Exploring the state of research on motivation in second language learning: A review and a reliability generalization meta-analysis
We present a thematic review and analysis of the variables affecting language learning motivation (LLM) (2008–2022). The second-language motivational self-system (L2MSS) model was found to be the most applied construct in measuring LLM. Complex systems theory was also another method gaining prominence in LLM research to explain the interactions between micro- and macro-structures surrounding the learner in influencing motivation. Other factors such as socioeconomic status, dialogism and anagnorisis were also identified as variables relating to LLM. For instance, research on dialogism and dialogue has indicated the role of conversation in shaping identity, motivation, and meaning for learners. However, our review found that much of the focus in LLM research has been on the L2MSS learning or teaching experience, while daily living has been largely neglected. We further conducted a reliability generalization meta-analysis. Our analysis found an average reliability of 0.84 (CI = 0.816–0.856), with 34% of reliability coefficients falling below the lower bound of CI. A meta-regression analysis revealed that 16% of the variance in the reliability coefficients was predicted by the number of items in the instruments. Questionnaires with an internal consistency below the lower bound of 0.816 had an average of 4.14 items, while the rest had an average of 5.71 items. We further found significant publication bias. Based on our findings, we suggest areas for future research in LLM.
Scopus© Citations 3 197 57
- Publication (Metadata only): The predictive value of gaze behavior and mouse-clicking in testing listening proficiency: A sensor technology study
This study employed eye-tracking and mouse click frequency analysis to investigate the predictive power of gaze behaviors, mouse-clicking, and their interactive effects with linguistic backgrounds on the IELTS (International English Language Testing System) listening test scores. A total of 77 test takers (45 with English as their first language (E-L1) and 32 with English as their second language (E-L2)) participated in this study. Their eye movements and mouse click frequencies were recorded as they took a computer-based IELTS listening test. The subsequent data analysis, utilizing linear mixed models, showed that gaze patterns, mouse actions, and language background significantly predicted listening test outcomes across four listening test sections and between E-L1 and E-L2 candidates, accounting for 33.2% of the variance observed in test scores. These results indicate the effect of potential sources of construct-irrelevant variance on test scores, which are not predicted in the available construct definitions of the test used in the study. Implications for the listening construct and test validity are discussed.
31
- Publication (Open Access): Bibliometrics and scientometrics in applied linguistics: Epilogue to the special issue
In this paper, I first discuss the field of bibliometrics, which is a quantitative approach to analyzing scholarly publications, and its subfield, scientometrics, which focuses exclusively on scientific literature. I argue that the use of bibliometric methods has been growing in applied linguistics in recent years, and explore the common features between bibliometrics and scientometrics. I will then review the papers published in the special issue on bibliometrics in applied linguistics, which features nine papers on various bibliometric topics. I conclude with suggestions for future research in the field, including the development of scales for measuring perceived prestige, investigation of indicators of influence and a predictive theory for impact of second language (L2) research, and further investigation into the imbalance in the representation of authors based in different parts of the world.
50 6
- Publication (Metadata only): A meta-analysis of the reliability of second language listening tests (1991–2022)
To investigate the reliability of L2 listening tests and explore potential factors affecting the reliability, a reliability generalization (RG) meta-analysis was conducted in the present study. A total number of 122 alpha coefficients of L2 listening tests from 92 published articles were collected and submitted to a linear mixed effects RG analysis. The papers were coded based on a coding scheme consisting of 16 variables classified into three categories: study features, test features, and statistical results. The results showed an average reliability of 0.818 (95% CI: 0.803 to 0.833), with 40% of reliability estimates falling below the lower bound of CI. The presence of publication bias and heterogeneity was found in the reliability of L2 listening tests, indicating that low reliability coefficients were likely omitted from some published studies. In addition, two factors predicting the reliability of L2 listening tests were the number of items and test type (standardized and researcher- or teacher-designed tests). The study also found that reliability is not a moderator of the relationship between L2 listening scores and theoretically relevant constructs. Reliability induction was identified in reporting the reliability of L2 listening tests, too. Implications for researchers and teachers are discussed.
30
- Publication (Metadata only): Evolutionary algorithm-based symbolic regression to determine the relationship of reading and lexicogrammatical knowledge
This chapter introduces evolutionary algorithm-based (EA-based) symbolic regression, which is an optimization model inspired by nature. EA-based symbolic regression is used to predict reading comprehension proficiency by using English learners' vocabulary and grammatical knowledge. EA-based symbolic regression draws on the fundamental concepts of Darwinian evolution, such as breeding and variety, and applies modeling to assess the accuracy and relevance of the prediction models. In this technique, multiple models are generated among which the one with the optimal fit is chosen as the “parent” and the basis for “breeding” further models, called offspring, for the following generations. The present study finds a significant nonlinear relationship between lexicogrammatical knowledge and reading comprehension proficiency (R² = .520). Details and computational requirements are discussed and implications for language assessment are explored.
12
- Publication (Metadata only): Investigating differential item functioning across interaction variables in listening comprehension assessment
Differential item functioning (DIF) analysis is essential to ensuring the equity of measurement for different subgroups at the item level and is an integral part of validity. However, existing DIF research often overlooks within-group heterogeneity, commonly assuming that test takers from different subgroups comprise a homogeneous population. This study investigated DIF across gender, academic background, and their interaction in listening comprehension assessment using Rasch measurement. It found that ignoring within-group heterogeneity would lead to the under-detection of DIF, likely due to the cancellation of DIF at broader group levels. In addition, the study is the first to investigate DIF in a linked test, a scenario more prevalent in practical testing. The findings of the study highlight the importance of accounting for within-group heterogeneity in test fairness investigations in language assessment research and point to the potential effect of test linking and equating on DIF analysis and interpretation.
Scopus© Citations 1 46
- Publication (Metadata only): A meta-analysis of the reliability of a metacognitive awareness instrument in second language listening
Metacognitive awareness is essential in regulating second language (L2) listening and has been predominantly assessed by a multidimensional instrument named the Metacognitive Awareness Listening Questionnaire (MALQ). Since previous studies have yielded inconclusive evidence concerning the generalization of MALQ, it is important to examine the overall reliability of the MALQ measures from a meta-analytical perspective. The purpose of the study was to examine variability in the reliability of MALQ measures in the field of L2 listening. A meta-analytic reliability generalization (RG) was conducted to synthesize Cronbach’s alpha coefficients derived from 45 studies that used MALQ. The results showed that the aggregated reliability estimate was 0.80 for MALQ measures, with four out of the five subscales having an aggregate reliability coefficient larger than 0.7, i.e., 0.73 for mental translation, 0.74 for planning and evaluating, 0.71 for person knowledge, and 0.79 for problem-solving. On the other hand, the reliability of directed attention was 0.68, falling short of meeting the minimum requirement of 0.70. In addition, as a high degree of heterogeneity was found in the studies included, a mixed effect meta-regression was performed, identifying four moderators affecting the reliability of MALQ measures: publication year, educational setting, participants’ L1, and L2 proficiency level. We further found evidence for publication bias in the included publications. Suggestions for future research are provided.
22