Master of Arts (Applied Linguistics)

Recent Submissions

Now showing 1 - 5 of 232
  • Publication
    A multidimensional analysis of a high-stakes English listening test
    Tao, Xuelian

    The Gaokao, China’s national college entrance examination, is a high-stakes exam taken by nearly all Chinese students. English has long been one of its three core subjects, and listening plays an important role in the Gaokao English test. However, relatively little research has been conducted on local English listening tests in China or elsewhere. More importantly, according to the Chinese Ministry of Education, the difficulty of the test papers used in each province or municipality has varied over the past two decades depending on local economic development and educational resources. Previous studies of English listening tests have adopted limited perspectives, and the difficulty-related linguistic features of English listening tests have rarely been examined. This study aims to fill these gaps and examine this assumption by analyzing the typical linguistic features and corresponding functional dimensions of the three text types in the listening tests, and by investigating whether the papers used in three regions of China differed in the co-occurrence patterns of lexicogrammatical features and the dimensions of the transcripts.

    Linguistic corpora have been widely employed to inform the assessment of writing and speaking skills; however, a significant gap persists in corpus-based approaches to listening assessment. A multidimensional analysis (MDA), combining register analysis, corpus linguistics, and quantitative methods, was conducted on the Chinese Gaokao English Listening Test Corpus to describe and compare the linguistic features and corresponding functional dimensions of the three text types in the listening tests. The corpus consists of 170 sets of test papers covering nearly all provinces and cities from 2000 to 2022.

    Six distinct dimensions were extracted using MDA: 1. oral versus literate discourse; 2. procedural discourse; 3. informational versus involved production; 4. elaborated discourse (relative clauses); 5. syntactic and clausal complexity; and 6. phrasal complexity. Four of these dimensions align with dimensions extracted in previous MDA studies, which provides evidence for the universality of some MDA dimensions and suggests teaching focuses for the future. The results also show that some dimension scores differed across test papers from different regions. Overall, it appears that the Ministry of Education, to level the playing field for students from less affluent regions who wish to enter the same universities as their peers, gives them listening tests with specific lexicogrammatical features that may facilitate listening in the Gaokao, such as relative clauses, phrases, procedural discourse, and complex clauses.
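
    The dimension-extraction step described above can be sketched in miniature. Biber-style MDA extracts dimensions through exploratory factor analysis with rotation over normalized lexicogrammatical feature rates; the sketch below substitutes plain PCA as a simplified, hypothetical stand-in (the function name and inputs are illustrative, not the thesis’s actual procedure):

```python
import numpy as np

def extract_dimensions(feature_counts, n_dims=2):
    """Simplified sketch of MDA dimension extraction.

    `feature_counts` is a texts x features matrix of normalized feature
    rates. Each feature is z-scored, then the top principal components
    stand in for the rotated factors of a full MDA.
    """
    X = np.asarray(feature_counts, dtype=float)
    X = (X - X.mean(axis=0)) / X.std(axis=0)       # z-score each feature
    cov = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_dims]     # top components
    loadings = eigvecs[:, order]                   # feature loadings
    scores = X @ loadings                          # per-text dimension scores
    return loadings, scores
```

    Each column of `scores` is a per-text dimension score of the kind that can then be compared across text types and regions.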

  • Publication
    A reliability generalization meta-analysis of foreign language anxiety measurements
    Qiao, Shuyi

    Reliability refers to the consistency of test scores across occasions, instruments, or raters. High reliability can increase statistical power and reduce the risk of Type I and Type II errors. Reliability is not a property of tests but a property of test scores, which can change with the characteristics of the test, the conditions of administration, and the group of examinees. However, research shows that many previous studies did not measure reliability for their own data but instead induced estimates from earlier administrations, casting doubt on the accuracy of the statistical analyses they conducted. Reliability generalization (RG) meta-analysis is a useful method for investigating the extent to which scores obtained from various measurements are reliable and which factors may cause variance in reliability. This study uses RG meta-analysis to investigate the reliability of instruments measuring foreign language anxiety, the most often investigated affective variable in L2 research.

    This study meta-analyzed 204 Cronbach’s alpha coefficients from 197 studies and 48 foreign language anxiety instruments to aggregate the overall reliability of foreign language anxiety scores, and investigated the variables potentially contributing to their variability. Pooling the effect sizes yielded an average reliability of 0.8717 (95% CI [0.8629, 0.8806]). A large amount of heterogeneity was detected, so a comprehensive moderator analysis of study, instrument, and population characteristics was carried out to explore possible sources of this variability. Features such as the standard deviation of test scores, the number of items, the number of negatively worded items, the number of factors in factor analysis, and administration methods were found to significantly affect the reliability of anxiety scores. A sensitivity analysis was also performed to test the robustness of the pooled reliability and to diagnose potential publication bias.
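
    The pooling step described above can be illustrated with a standard random-effects estimator. The sketch below applies the DerSimonian-Laird method to raw alpha coefficients; the thesis may have used a different estimator or a transformation of alpha, so treat this as a generic illustration:

```python
import numpy as np

def pool_alphas(alphas, variances):
    """DerSimonian-Laird random-effects pooling of reliability estimates.

    `alphas` are per-study coefficients; `variances` their sampling
    variances. Returns the pooled estimate, a 95% CI, and I^2 (%).
    """
    alphas = np.asarray(alphas, dtype=float)
    v = np.asarray(variances, dtype=float)
    w = 1.0 / v                                   # fixed-effect weights
    fixed = np.sum(w * alphas) / np.sum(w)
    q = np.sum(w * (alphas - fixed) ** 2)         # Cochran's Q
    df = len(alphas) - 1
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c)                 # between-study variance
    w_star = 1.0 / (v + tau2)                     # random-effects weights
    pooled = np.sum(w_star * alphas) / np.sum(w_star)
    se = np.sqrt(1.0 / np.sum(w_star))
    ci = (pooled - 1.96 * se, pooled + 1.96 * se)
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, ci, i2
```

    A large `i2`, like the substantial heterogeneity reported above, is what motivates a follow-up moderator analysis.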

    This study also extended the scope of conventional RG meta-analysis by investigating the impact of the reliability of language anxiety scores on their predictive validity (anxiety-language proficiency correlations). Linear mixed-effects modeling was adopted to examine the effect of moderators on the obtained anxiety-proficiency correlations. In the final model, the reliability of anxiety scores significantly moderated the anxiety-proficiency correlations: higher anxiety reliability coefficients were associated with larger anxiety-language proficiency effect sizes, indicating that more reliable anxiety scores can strengthen their extrapolation to proficiency test scores. Additionally, as an exploratory step, the p-values of the anxiety-proficiency correlations were coded and entered into a correlation analysis. Higher reliability was found to correlate significantly with lower p-values, indicating that stronger reliability can decrease the risk of Type I error. Implications of the findings are also discussed.

  • Publication
    A corpus-based frame semantic analysis of commercialized listening tests: implications for content validity
    Zhao, Yufan

    Commercialized listening tests can significantly impact test-takers’ lives, as they are often required for purposes such as immigration, employment opportunities, and university admissions. However, there is a noticeable research gap regarding the content validity of these tests. To address the gap, this study aims to examine the semantic features of the simulated mini-lectures in the listening sections of the Test of English as a Foreign Language (TOEFL) and the International English Language Testing System (IELTS) to explore the content validity of the two tests.

    This study utilized two study corpora: an IELTS corpus of 68 mini-lectures (46,823 words) and a TOEFL corpus of 285 mini-lectures (207,296 words). The reference corpus comprised 59 lectures from the Michigan Corpus of Academic Spoken English (MICASE), totaling 571,354 words. The theoretical framework employed is frame semantics, which holds that words should be understood within cognitive frames. The data were submitted to Wmatrix5 for automated semantic tagging, which generated 488 semantic frames. Three comparisons were conducted: IELTS vs. TOEFL, IELTS vs. MICASE lectures, and TOEFL vs. MICASE lectures.
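
    Corpus-comparison tools such as Wmatrix conventionally flag key semantic domains with Dunning’s log-likelihood statistic. The sketch below shows how one frame’s frequency in a test corpus could be compared against a reference corpus; the abstract does not detail the exact statistic used, so this is an illustration of the standard technique rather than the thesis’s own code:

```python
import math

def log_likelihood(freq_a, total_a, freq_b, total_b):
    """Dunning's log-likelihood (G2) for one item's frequency in
    corpus A versus corpus B, given each corpus's total token count."""
    expected_a = total_a * (freq_a + freq_b) / (total_a + total_b)
    expected_b = total_b * (freq_a + freq_b) / (total_a + total_b)
    g2 = 0.0
    if freq_a > 0:
        g2 += freq_a * math.log(freq_a / expected_a)
    if freq_b > 0:
        g2 += freq_b * math.log(freq_b / expected_b)
    return 2.0 * g2
```

    Values above 3.84 are conventionally treated as significant at p < .05 (1 degree of freedom), marking a semantic frame as over- or under-represented in the test corpus.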

    The results suggest that the mini-lectures in IELTS listening tests cover fewer academic discourse fields than TOEFL mini-lectures. It is therefore suggested that IELTS test developers prioritize materials resembling genuine academic lectures over non-specialist texts, while TOEFL test developers should extend the coverage of the test content and continue to mirror academic discourse.

    Furthermore, the IELTS and TOEFL mini-lectures showed similarity to the reference lectures in 78% and 64% of the examined semantic frames, respectively, underlining their relative authenticity. Likewise, a pervasive ‘objectivity’ was evident across all three corpora, with emotion-related categories being sparse. Nevertheless, certain topics, such as politics, war, and intimate and sexual relationships, were notably absent from the test corpora, even though they appeared in the academic lecture corpus.

    Finally, as the simulated mini-lectures in IELTS and TOEFL are significantly shorter than authentic lectures, the positive results supporting the authenticity of the simulated lectures are attenuated. It is necessary to confirm whether these mini-lectures in the listening tests can engage test takers in the same cognitive processes as authentic academic lectures.

  • Publication
    A meta-analysis of the reliability of L2 reading comprehension assessments
    Zhao, Huijun

    Score reliability is one of the major facets of modern validity frameworks in language assessment. Within argument-based validation, reliability serves as indispensable evidence linking the generalization inference to higher-level validity inferences. The present study aims to determine the average reliability of L2 reading tests, identify potential moderators of reliability in L2 reading comprehension tests, and explore the power of reliability in predicting the relationship between generalization and explanation inferences.

    A reliability generalization (RG) meta-analysis was conducted to compute the average reliability coefficient of L2 reading comprehension tests and to identify predictor variables that moderate reliability. I examined 1,883 individual studies from the Scopus, Web of Science, ERIC, and LLBA databases for possible inclusion and assessed 266 studies as meeting the inclusion criteria. From these, I extracted 85 Cronbach’s alpha estimates from 60 studies (2002-2023) that reported Cronbach’s alpha properly, and coded 28 potential predictors comprising characteristics of the study, the test, and the test-takers. A linear mixed-effects model (LMEM) analysis was subsequently conducted to test the predictive power of the reliability coefficient in the relationship between generalization and explanation inferences. I further examined the impact of the Cronbach’s alpha coefficient on the correlation between L2 reading comprehension tests and various language proficiency measures, drawing on reliability estimates of reading comprehension tests from 24 studies and 189 correlation data points between the reading comprehension tests and measures of language proficiency categorized into 11 groups.

    The RG meta-analysis found an average reliability of 0.78 (95% CI [0.76, 0.80]), with 40% of Cronbach’s alpha coefficients falling below the lower bound of the confidence interval. A heterogeneity test indicated significant heterogeneity across studies (I² = 97.58%), with variance partitioned into sampling error (2.42%), within-study (24.64%), and between-study (72.95%) differences. The number of test items and test-takers’ L1 were found to explain 19.65% and 13.70% of the variation in reliability coefficients across studies, respectively. The LMEM analysis showed that alpha coefficients do not predict the correlation between reading comprehension tests and other measures of language proficiency. The implications, limitations, and directions for future research are further discussed.
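
    The three-level variance partition reported above (sampling error, within-study, between-study) can be expressed as percentage shares of the total variance. A minimal, hypothetical helper, not the thesis’s actual code:

```python
def variance_shares(sampling_var, within_var, between_var):
    """Percentage of total variance at each level of a three-level
    meta-analysis, given the estimated variance components."""
    total = sampling_var + within_var + between_var
    return tuple(round(100.0 * v / total, 2)
                 for v in (sampling_var, within_var, between_var))
```

    Applied to the estimated components of a three-level model, this yields the kind of split reported above, where most of the variability sits between studies rather than in sampling error.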

  • Publication
    What makes articles highly cited? A bibliometric analysis of the top 1% most cited research in applied linguistics (2000-2022)
    Zhang, Sai

    Citation counts, although controversial, have long been used as a yardstick for research evaluation. The normative view regards citing as a means to credit scientific contributions, so the number of citations reflects not only scholarly attention but also research quality. However, social constructivist theory introduces a more nuanced perspective, asserting that a variety of factors unrelated to scientific merit can potentially influence citation counts. This dual nature of citation practices has been widely discussed across disciplines, yet it remains underexplored in applied linguistics. This bibliometric study, with a particular interest in highly cited papers, aimed to investigate the citation patterns of applied linguistics research over two decades, as well as the complex factors that underpin high citation counts.

    The dataset consists of 302 Quartile-1 journal papers that rank in the top 1% by citations in the applied linguistics literature (2000-2022), with detailed bibliometric information collected from Scopus (as of March 2023). Building on the literature, we considered eleven extrinsic factors that are independent of scientific quality but could potentially affect citation counts, covering journal-related, author-related, and article-related features. Descriptive analysis was applied to map the citation landscape of the targeted papers over time as characterized by each factor. After a preliminary look at the bivariate relationships between variables through correlation analysis, multiple linear regression models were adopted to examine simultaneously the extent to which the predictor variables are associated with citation outcomes.

    The results showed that in the best regression model, time-normalized citations were significantly predicted by six factors: journal prestige, accessibility, co-authorship, research performance, title, and subfield of applied linguistics. The remaining five factors (internationality, geographical origin, funding, references, and methodology) did not exhibit statistical significance. Certain underlying social mechanisms were further unraveled; among them, visibility explains the roles of the significant factors in a unified manner, accelerating the recognition and dissemination of research discoveries in the field. The explanatory strength of all predictors together was limited (R² = .208, p < .05), but this was expected, given that they are extrinsic properties unrelated to scientific merit. The major citation driver should undoubtedly be the intrinsic quality of research, and the remaining variance may also be explained by other extrinsic features yet to be explored.
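
    The regression setup described above can be sketched generically. The helper below fits an ordinary least squares model with an intercept and reports R²; the actual predictors (journal prestige, co-authorship, and so on) would be columns of `X`, and all names and data here are illustrative rather than the study’s own:

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept.

    Returns the coefficient vector (intercept first) and R^2, the share
    of variance in `y` explained by the predictors in `X`.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])   # prepend intercept
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ beta
    ss_res = float(np.sum((y - fitted) ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return beta, 1.0 - ss_res / ss_tot
```

    A modest R², like the .208 reported above, means most citation variance is left unexplained by the extrinsic predictors, which is the study’s point.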

    To the best of our knowledge, this is the first study to investigate a range of factors contributing to high citations in applied linguistics research. Implications are discussed for both applied linguistics researchers and policymakers. We further suggest a more comprehensive approach to evaluative bibliometrics, integrating qualitative and quantitative indicators so that the reward system recognizes only good research practices.
