Abstract
Recent studies have shown that the frequency effect, although long used as a guide to word difficulty, fails to explain all variance in learner word knowledge. As such, a “more than frequency” conclusion has been offered to explain how lexical sophistication accounts for word difficulty. This study presents a multiple regression model of word-learning difficulty built from a data set of monolingual learners whose first language (L1) is Japanese. Vocabulary Size Test (VST) scores of 2,999 L1 Japanese university students were converted to logit scores to determine the word-learning difficulty of 80 target words. Five lexical sophistication variables (frequency, cognate status, age of acquisition, prevalence, and polysemy) were found to correlate with word-learning difficulty above a practical significance threshold. These were subsequently entered into a regression model with the logit scores as the dependent variable. The model (R² = .55) indicates that three lexical sophistication variables significantly predicted VST scores: frequency (β = -.28, p = .029), cognateness (β = -.24, p = .005), and prevalence (β = .22, p = .040). Despite suggestions that complexity studies be interpreted in light of what is understood about the construct of linguistic complexity, researchers have rarely made explicit the differences between absolute and relative complexity variables. Because some variables can be shown to vary in complexity according to the L1 population, these must be considered in discussions of test generalizability. Although frequency will continue to be the primary criterion for selecting lexical items for teaching and testing, the cognate status of words can be used to predict the potential learning burden of a word more precisely for learners of different L1 backgrounds.
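The abstract does not spell out how test scores were converted to logits. As a rough illustration only, the sketch below assumes a simple log-odds transformation of the proportion of test takers answering each word correctly, which is a stand-in and not necessarily the authors' procedure. The function name `logit_difficulty`, the word labels, and the counts are all hypothetical.

```python
import math

def logit_difficulty(n_correct: int, n_total: int) -> float:
    """Log-odds of an incorrect answer; higher values mean a harder word."""
    # Add 0.5 to each cell so that words everyone (or no one) answered
    # correctly still yield a finite value.
    wrong = n_total - n_correct + 0.5
    right = n_correct + 0.5
    return math.log(wrong / right)

# Hypothetical counts of correct answers out of 2,999 test takers.
counts = {"word_a": 2700, "word_b": 1500, "word_c": 300}
difficulties = {w: logit_difficulty(c, 2999) for w, c in counts.items()}

# Print words from easiest to hardest.
for word, d in sorted(difficulties.items(), key=lambda kv: kv[1]):
    print(f"{word}: {d:+.2f} logits")
```

Under this assumption, a word answered correctly by about half of the test takers sits near 0 logits, easier words take negative values, and harder words take positive values. Difficulty estimates of this kind could then serve as the dependent variable in a regression on word-level predictors such as frequency or cognate status.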
License
Copyright (c) 2024 Derek Canning, Stuart McLean, Joseph Vitta
This work is licensed under a Creative Commons Attribution 4.0 International License.