Studies in Second Language Learning and Teaching
Is learning really just believing? A meta-analysis of self-efficacy and achievement in SLA

The positive psychology movement (Seligman, 1998) has contributed to the proclamation of a positive turn in second language acquisition (SLA) (MacIntyre et al., 2016). Within the study of individual differences, self-efficacy (Bandura, 1997), an individual's judgment of their capability to achieve goals, has attracted particular interest in language learning (e.g., Lake, 2013). The present study meta-analyzes research on the relationship between second language (L2) self-efficacy and L2 achievement by exploring (1) reporting practices in this domain, (2) the strength and direction of the relationship, and (3) the effects of moderator variables on the self-efficacy-achievement link. A comprehensive literature search identified 37 studies, which contributed a total of 40 independent samples (N = 23,050). The average observed effect in the sample was r = .46. A moderator analysis showed systematic variations in effect size for learners' first language, target language, proficiency level, self-efficacy type, and achievement type. We discuss our findings with respect to theoretical constructs and methodological practices and suggest implications for L2 pedagogy and future research on self-efficacy in SLA.


Introduction
For nearly three decades, self-efficacy has attracted steady interest in SLA. Since the positive turn in the field (MacIntyre et al., 2016; MacIntyre & Mercer, 2014), studies addressing positive affect in the foreign language (FL) classroom, including enjoyment (e.g., Dewaele & MacIntyre, 2014; Jin & Zhang, 2018; Zhang & Tsung, 2021), motivation (e.g., Dörnyei, 2020; Le-Thi et al., 2020), and second or foreign language (L2) grit (e.g., Alamer, 2021; Teimouri et al., 2020), have flourished and garnered increased attention with respect to language learning outcomes. As one of the most influential variables in positive psychology, self-efficacy has been explored across a range of contexts (e.g., target languages and proficiency levels), age groups (e.g., children, adolescents, and adults), language skills (e.g., reading, writing, listening, and speaking), and classroom participants (e.g., students and teachers). Encouragingly, findings have shown benefits of self-efficacy for learners' L2 achievement (e.g., Hetthong & Teo, 2013), learning strategies (e.g., Balci, 2017; Gahunga, 2009; Jee, 2015; Ma et al., 2018; Mizumoto, 2013; Wang et al., 2012), and attitudes towards L2 learning (e.g., Murad Sani & Zain, 2011). Bandura's (1997) and Pajares's (1997) seminal works on self-efficacy proved to be powerful driving forces for scholars in SLA and have sparked an abundance of research into the relationship between self-efficacy and achievement. Still, the only research synthesis to date (Wang & Sun, 2020) is limited to self-efficacy and English proficiency, leaving the broader body of research across other languages unexamined.
Given the theoretical implications of self-efficacy for individual tasks (Bandura, 1997), the variety of linguistic contexts and learner populations that have been studied, the plethora of existing self-efficacy and achievement instruments, and the often inconsistent reporting of instruments' psychometric properties in SLA (Larson-Hall & Plonsky, 2015), a comprehensive meta-analysis of the overall effect of self-efficacy on L2 achievement would advance our understanding of self-efficacy and inform theory and practice within the framework of positive psychology in SLA. The present study addresses this gap by means of a systematic meta-analysis following methodologies now commonly accepted in SLA (e.g., Norris & Ortega, 2006; Oswald & Plonsky, 2010; Plonsky & Oswald, 2012), including a moderator analysis of variables such as study context, learner characteristics, and instrumentation.

Positive psychology in SLA
Positive psychology investigates the traits and processes that allow people to grow and flourish (Seligman & Csikszentmihalyi, 2000) and has encouraged scholars to explore positive emotions and experiences, such as flow (Csikszentmihalyi, 2000) and resilience (Pan & Chan, 2007), which support people in leading full, happy, and emotionally stable lives. Many scholars in education have adopted methods from positive psychology to investigate a range of positive emotions with respect to learning processes and achievement (e.g., Pekrun et al., 2002), a trend that has produced a number of meta-analyses (e.g., Lei & Cui, 2016; Marques et al., 2017; Möller et al., 2009; Möller et al., 2020; Petscher, 2010). For example, in a series of path meta-analyses of 118 studies (N = 213,121), Möller and colleagues (2020) found strong effects for the relationship between self-concept (defined as "a self-description judgement that includes an evaluation of competence and the feelings of self-worth associated with the judgement in question" in a specific field; Pajares & Schunk, 2005, p. 105) and K-12 achievement in mathematics (β = .57) and first language (L1) (β = .46), respectively. In another study, Marques et al. (2017) explored the relationship between hope and academic achievement through a meta-analysis of 45 studies (N = 9,250) carried out with K-12, undergraduate, and graduate students. Their findings revealed a moderate, positive relationship (k = 24, mean ρ = .24, SD = .10, 95% CI [.20, .26]), with a stronger link for K-12 students (k = 8, mean ρ = .28, SD = .10, 95% CI [.24, .32]) than for undergraduate and graduate students (k = 16, mean ρ = .19, SD = .06, 95% CI [.14, .23]). These links between positive affect and academic achievement lay the foundation for exploring the association in subject-specific contexts, such as the world language classroom (i.e., including but not limited to English as a second or foreign language classrooms).
The positive turn in SLA (MacIntyre et al., 2016; MacIntyre & Mercer, 2014) has acknowledged the relevance of positive psychology for improving the FL learning experience by supporting learners' L2 motivation, perseverance, and resilience and by fostering meaningful communication and interaction between learners and teachers. As a result, this area of inquiry has seen an influx of research on enjoyment (e.g., Dewaele & Alfawzan, 2018), L2 grit (e.g., Alamer, 2021; Teimouri et al., 2020), and self-efficacy, as detailed in the following section. L2 motivation has probably garnered the most attention (e.g., Lake, 2013; Papi et al., 2019). For example, Al-Hoorie (2018) explored the relationship between L2 achievement and the three components of the L2 motivational self system (L2MSS; Dörnyei, 2005, 2009), namely the ideal L2 self, the ought-to L2 self, and the L2 learning experience, through a meta-analysis of 39 samples (N = 32,078). The findings revealed that all three components were significant predictors of intended effort (rs = .61, .38, and .41, respectively), the measure of subjective learning outcomes, whereas correlations with L2 achievement were much weaker (rs = .20, -.05, and .17). This study highlighted the need for meta-analyses that examine the effects of other positive affective variables on L2 achievement and how certain moderators might explain variability in findings.

Self-efficacy and language achievement
In perhaps the most cited definition of the term, Bandura (1986) describes self-efficacy as "people's judgements of their capabilities to organize and execute courses of action required to attain designated types of performances" (p. 391). Simply put, it refers to individuals' beliefs that they hold the skills necessary to complete a particular task. These ability-related beliefs regulate achievement by influencing the goals learners set out to reach and the amount of effort they dedicate to their performance (Bandura, 1986, 1997; Pajares, 1997). Beyond regulating achievement-related cognitive processes, self-efficacy has also been found to regulate affective states in SLA. For example, multiple studies have shown that a perceived lack of competence was related both to weak self-efficacy beliefs and to the presence of negative emotions, such as anxiety (e.g., Hiver, 2013; Song, 2016). In contrast, Cheng and associates (Cheng et al., 1999) found that perceived competence was related to strong self-efficacy beliefs and the presence of positive emotions, such as self-confidence. Worth noting is Wyatt's (2018) recent argument that self-confidence should not be treated as an emotional variable. Instead, Wyatt characterizes self-confidence as a "lay term" (p. 122) for self-efficacy beliefs, thereby calling into question the nature of self-confidence and its relation to self-efficacy. However, the majority of existing studies in SLA have adopted an emotional understanding of self-confidence and clearly differentiate it from cognitive self-efficacy beliefs. Thus, the present work addresses self-efficacy specifically and does not treat self-confidence as a synonymous term.
Despite this generally positive trend, some researchers have observed only weak relationships between L2 achievement and self-efficacy, whether general or skill-specific. For example, Sahril and Weda (2018) examined writing self-efficacy and achievement among 50 Indonesian EFL university students and found a weak positive correlation (r = .057, p < .001). Similarly, Liem and associates (2008) found a small positive correlation (r = .180, p < .001) when investigating general L2 self-efficacy and achievement in a group of 1,475 high school EFL learners in Singapore. In a study of general self-efficacy and speaking achievement, Oliver et al. (2005) also reported a very small positive correlation (r = .09, p > .05) for 275 Australian elementary school students studying a range of foreign languages.
While all of these studies provide important insights, the variation of effect sizes across studies is noteworthy and motivates the current study, which aims to contribute to a more comprehensive and nuanced understanding of the relationship between L2 self-efficacy and L2 achievement in SLA.

The current study
The current study is motivated by three goals. First, we perform a thorough and systematic literature search and analysis to examine existing trends in reporting practices in the L2 self-efficacy and achievement domain. Second, we report both the direction and the size of the relationship between learner self-efficacy and achievement. Lastly, we examine our sample for systematic variation in effects and investigate a number of moderator variables. To meet these goals, the present meta-analysis addresses the following research questions:
1) What reporting practices are used in studies exploring L2 self-efficacy and L2 achievement? (RQ1)
2) What is the direction and magnitude of the relationship between L2 self-efficacy and L2 achievement? (RQ2)
3) What is the moderating effect of L1/L2 proficiency level, learner age, institutional context (e.g., primary school, university), self-efficacy type (e.g., general, skill-specific), and achievement measure on the relationship between L2 self-efficacy and L2 achievement? (RQ3)

Study identification and inclusion criteria
To identify studies relevant to our research questions, we applied a set of inclusion criteria. To be eligible, a study had to (1) present a measure of both L2 self-efficacy and L2 achievement, (2) report quantitative results, either as a Pearson correlation or as a statistic that can be converted into an r index (e.g., t or F), and (3) be published in or after 1997, so as to include studies published since both Bandura's (1997) and Pajares's (1997) influential works and the emergence of positive psychology. Using these parameters and combinations of the following English keywords: (1) self-efficacy, beliefs, self-esteem, mindset, self-concept, talent; (2) performance, achievement, outcomes; and (3) language learning, second language learning, foreign language learning, heritage language learning, we conducted a comprehensive search in two library-housed databases (ERIC, LLBA) and one public database (Google Scholar). We also conducted an ancestry search by reviewing the bibliographies of relevant studies and the publication lists of prominent researchers in self-efficacy and positive psychology in SLA. Together, these searches identified a total of 640 publications.
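Criterion (2) admits studies that report t or F rather than r. The text does not say which conversion formulas were applied; the sketch below assumes the standard ones for a two-group t statistic, r = sqrt(t² / (t² + df)), and for an F statistic with one numerator degree of freedom (where F = t²). The function names are illustrative and not part of any tool used in the study.

```python
import math


def r_from_t(t: float, df: int) -> float:
    """Convert a two-group t statistic to a correlation r.

    Uses r = sqrt(t^2 / (t^2 + df)); the sign of t is carried over to r.
    """
    if df <= 0:
        raise ValueError("df must be positive")
    r = math.sqrt(t * t / (t * t + df))
    return math.copysign(r, t)


def r_from_f(f: float, df_error: int) -> float:
    """Convert an F statistic with 1 numerator df to |r|.

    Because F = t^2 in this case, r = sqrt(F / (F + df_error)). F carries no
    sign, so the caller must set the direction from the group means.
    """
    if f < 0 or df_error <= 0:
        raise ValueError("F must be non-negative and df_error positive")
    return math.sqrt(f / (f + df_error))


# Hypothetical example: t(91) = 3.0, equivalently F(1, 91) = 9.0
print(round(r_from_t(3.0, 91), 3))   # 0.3
print(round(r_from_f(9.0, 91), 3))   # 0.3
```

Both routes give the same magnitude because F(1, df) is simply t squared; only the t route preserves the direction of the effect.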
Some studies were later excluded due to: (1) an absence of our target variables, (2) research designs that included neither L2 self-efficacy nor achievement instruments, (3) a qualitative methods or case study design, (4) missing data, such as correlations, means, and standard deviations, and (5) a self-efficacy construct unrelated to language learning, such as computer self-efficacy (Ale et al., 2017) or technological self-efficacy (Abdallah & Mansour, 2015), which measure beliefs regarding one's capability to use technology or digital tools in order to achieve one's goal. After applying all eligibility criteria, 37 studies published between 1999 and 2019 in 15 different countries were included in the final analysis, with a total N of 23,050 and sample sizes ranging from 32 to 11,036 (M = 581.9; SD = 1770.8).

Coding procedures
Each study was coded for a number of features according to a systematic coding scheme adapted from that of a recent meta-analysis of L2 anxiety (Teimouri et al., 2019). Specifically, we recorded study features in five categories: (1) bibliographic details (e.g., authors), (2) study design (e.g., methods), (3) participant sample (e.g., target language), (4) instruments (e.g., self-efficacy scale), and (5) quantitative findings (e.g., effect sizes). To pilot the coding scheme, we each coded five studies from the sample and resolved any ambiguities in our results and coding scheme. The final coding scheme can be found in Appendix A. Next, each author coded half of the studies. Twenty percent of the sample was coded by both authors, yielding perfect inter-rater agreement (100%).
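The text reports 100% agreement but does not state how agreement was computed. A minimal sketch, assuming simple percent agreement over all double-coded values, is shown below; the coded values in the example are hypothetical.

```python
def percent_agreement(codes_a, codes_b):
    """Share of coded values (in %) on which two coders assigned the same code."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must code the same, non-empty set of values")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100.0 * matches / len(codes_a)


# Hypothetical codes for one study: target language, context, self-efficacy type
coder_1 = ["English", "university", "writing"]
coder_2 = ["English", "university", "writing"]
print(percent_agreement(coder_1, coder_2))   # 100.0
```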

Analysis
To answer RQ1, we first identified all self-efficacy and achievement instruments. Next, we conducted frequency counts and examined the reporting tendencies for reliability and validity of instruments for each variable. By examining the psychometric properties, we followed the foundational principle of the methodological reform movement (Marsden & Plonsky, 2018), which aims for more rigorous reporting practices in the field of SLA through robust description and evaluation procedures.
To calculate the magnitude and direction of the relationship between self-efficacy and achievement (RQ2), we employed a fixed-effect model. That is, we assumed a common effect size across all studies in our sample and calculated both the overall mean, weighted by each study's sample size, and the 95% confidence interval for each effect. The fixed-effect model was chosen a priori because of our small sample size. More specifically, after visually inspecting our coded studies, we assumed only a moderate amount of heterogeneity across our sample: while the studies were not identical with regard to design features, such as data collection instruments or participant recruitment procedures, we deemed them moderately homogeneous in terms of participant characteristics, such as target language and institutional context. This assumption was later confirmed (I² = 68.647, p < .001), and we investigated the systematic variation of effect sizes within our model (Borenstein et al., 2010) in our moderator analysis (RQ3).
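A minimal sketch of the fixed-effect pooling described above follows. It assumes the common approach of transforming each r to Fisher's z, weighting each z by the inverse of its sampling variance (n − 3, which is how sample size enters the weights), and back-transforming the mean and its 95% CI to r. It does not reproduce CMA's internal implementation, and the data in the example are hypothetical.

```python
import math

Z_CRIT_95 = 1.959964  # two-sided 95% critical value of the standard normal


def fixed_effect_mean(rs, ns):
    """Pool correlations under a fixed-effect model via Fisher's z.

    Returns the pooled r and its 95% confidence interval (lower, upper).
    """
    if len(rs) != len(ns) or not rs:
        raise ValueError("need one sample size per correlation")
    zs = [math.atanh(r) for r in rs]          # Fisher's z transform
    ws = [n - 3 for n in ns]                  # inverse variance: var(z) = 1/(n - 3)
    sum_w = sum(ws)
    z_bar = sum(w * z for w, z in zip(ws, zs)) / sum_w
    se = math.sqrt(1.0 / sum_w)               # standard error of the pooled z
    lower, upper = z_bar - Z_CRIT_95 * se, z_bar + Z_CRIT_95 * se
    # Back-transform the mean and interval limits to the r metric
    return math.tanh(z_bar), (math.tanh(lower), math.tanh(upper))


# Hypothetical samples (not from the meta-analysis)
r_bar, (ci_lo, ci_hi) = fixed_effect_mean([0.30, 0.50, 0.45], [120, 300, 85])
print(f"r = {r_bar:.3f}, 95% CI [{ci_lo:.3f}, {ci_hi:.3f}]")
```

Pooling on the z scale rather than on r directly is the usual choice because the sampling variance of z depends only on n, which makes inverse-variance weighting straightforward.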
To address our last research question, we used the categorical variables in our coding scheme to form and analyze subsamples. Before running the analyses, we examined our data set for heterogeneity by calculating the goodness-of-fit statistic (Q) and checking for systematic variation among the observed effects in the sample (I²). All analyses were performed using Comprehensive Meta-Analysis (CMA) Version 3 (Borenstein et al., 2013).
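For reference, Q is the weighted sum of squared deviations of each effect from the fixed-effect mean, and I² = (Q − df)/Q × 100 (floored at zero), with df = k − 1. The sketch below again assumes Fisher's z with weights n − 3. The `i_squared` function alone reproduces the I² values reported in the Results from the reported Q values (Q = 1303.27 with k = 40, and Q = 82.93 with k = 27). A p value for Q would additionally require a chi-square survival function (e.g., `scipy.stats.chi2.sf`), which is omitted here to keep the sketch dependency-free.

```python
import math


def q_statistic(rs, ns):
    """Cochran's Q for correlations under a fixed-effect model (Fisher's z)."""
    zs = [math.atanh(r) for r in rs]
    ws = [n - 3 for n in ns]
    z_bar = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    q = sum(w * (z - z_bar) ** 2 for w, z in zip(ws, zs))
    return q, len(rs) - 1                     # Q and its degrees of freedom


def i_squared(q, df):
    """Percentage of total variation attributed to true heterogeneity."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Reported values from the Results: all 40 effects, then 27 after outlier removal
print(round(i_squared(1303.27, 39), 2))   # 97.01
print(round(i_squared(82.93, 26), 2))     # 68.65
```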

Results
The results for our first research question, which explored reporting practices, are summarized in Tables 1 and 2. Self-efficacy was measured by a total of 33 unique instruments; two studies did not provide details on their instrument. The most commonly referenced questionnaires were the Motivated Strategies for Learning Questionnaire (MSLQ; Pintrich et al., 1991), the Questionnaire of English Self-Efficacy (QESE; Wang, 2004; Wang et al., 2013), and the Self-Efficacy Questionnaire (SEQ; Sedighi et al., 2004). However, only a minority of our sample (k = 14; 36%) adapted any of these three instruments. Fifteen studies uniquely adapted instruments from other sources, and eight studies designed new instruments. Of the 39 times that studies reported on a measure of L2 self-efficacy, reliability was reported 37 times (95%), in the form of Cronbach's alpha coefficients, with the exception of two studies that conducted a test-retest correlation and a split-half test of internal consistency, respectively.
With respect to language achievement measures, our analysis found 39 different measures representing three main methods for assessing L2 achievement: language tests (k = 22), course grades (k = 7), and grade point average (GPA) (k = 3). Seven samples implemented a range of other methods, including self-reported exam grades, teacher ratings of student achievement, and task-based assessment (Table 2). As shown in Table 2, 28 samples (72%) reported means, and 26 studies (66%) reported both means and standard deviations for their instrument. Of the 22 samples implementing a language test, reliability coefficients were reported only eight times (36%): five reported Cronbach's alpha, one reported KR-20, one reported inter-rater reliability, and one reported a test-retest correlation. The five samples reporting Cronbach's alpha yielded an average reliability of .93 (SD = .03).
Note. MSLQ = Motivated Strategies for Learning Questionnaire (Pintrich et al., 1991); QESE = Questionnaire of English Self-Efficacy (Wang, 2004; Wang et al., 2013); SEQ = Self-Efficacy Questionnaire (Sedighi et al., 2004).
The second research question targeted the magnitude and direction of the relationship between self-efficacy and achievement. We first computed the weighted mean of all 40 effects in our sample, r = .464 (95% CI [.454, .474]; p < .001), and then examined their distribution for outliers, defined as any result with a standardized residual larger than 3 in absolute value. This resulted in the exclusion of 13 values, which increased the mean slightly to r = .475 (95% CI [.464, .486]; p < .001). Additionally, the Q statistic decreased from 1303.27 to 82.93, and the I² statistic decreased from 97.01 to 68.65. This indicated that the excluded values accounted for much of the heterogeneity, while a sizeable amount of real, systematic variation remained in the data, thus justifying a moderator analysis. Table 3 provides a summary of our statistical model. An overview of all effects can be found in Appendix B.
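The outlier rule above (|standardized residual| > 3) can be sketched as follows. This assumes a simple fixed-effect residual on the Fisher z scale, (z_i − z̄)/√v_i with v_i = 1/(n_i − 3); CMA and other packages may standardize slightly differently (for instance, by also accounting for the variance of z̄), and the text does not say whether screening was repeated after removing flagged values. The data are hypothetical.

```python
import math


def flag_outliers(rs, ns, cutoff=3.0):
    """Return indices of effects whose standardized residual exceeds the cutoff.

    Residual_i = (z_i - z_bar) / sqrt(v_i), with z on the Fisher scale and
    v_i = 1 / (n_i - 3); z_bar is the fixed-effect (inverse-variance) mean.
    """
    zs = [math.atanh(r) for r in rs]
    ws = [n - 3 for n in ns]
    z_bar = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    flagged = []
    for i, (z, w) in enumerate(zip(zs, ws)):
        residual = (z - z_bar) * math.sqrt(w)   # same as dividing by sqrt(1/w)
        if abs(residual) > cutoff:
            flagged.append(i)
    return flagged


# Hypothetical: five similar samples and one extreme correlation
rs = [0.45, 0.45, 0.45, 0.45, 0.45, 0.90]
ns = [200] * 6
print(flag_outliers(rs, ns))   # [5]
```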
Additionally, a funnel plot of the relationship between effect size and standard error (Figure 1) was created to examine the presence of publication bias in our sample. The plot includes both actual (i.e., hollow) and imputed (i.e., solid black) data points and suggests a slight publication bias in favor of studies that report strong, positive correlations between L2 self-efficacy and achievement.
The moderator analysis (Table 4) was conducted with 27 independent samples and calculated subgroup effects for seven categories: age, learner L1, target language, institutional context, proficiency level, self-efficacy type, and achievement measure. The analysis for age showed weaker correlations for high-school-age teenagers (16-18) and learners over 20 than for early high-school (15) and early college-age students (19-20). However, the majority of primary studies did not report learner age, leading to small subsamples that warrant cautious interpretation.
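One common way to compute fixed-effect subgroup estimates of the kind reported in Table 4 is sketched below. Each subgroup's effects are pooled separately, and the between-groups statistic Q_between = Σ W_g (z̄_g − z̄)² tests whether subgroup means differ more than expected by chance (df = number of groups − 1). This is an assumed reconstruction, since the text reports subgroup correlations but not the test used; the group labels and data are hypothetical.

```python
import math
from collections import defaultdict


def subgroup_analysis(rs, ns, groups):
    """Fixed-effect subgroup means (Fisher's z) and the Q_between statistic.

    Returns ({group: (pooled_r, k)}, q_between, df_between).
    """
    zs_by_group = defaultdict(list)
    ws_by_group = defaultdict(list)
    for r, n, g in zip(rs, ns, groups):
        zs_by_group[g].append(math.atanh(r))
        ws_by_group[g].append(n - 3)

    summaries = {}
    group_stats = []                            # (total weight, mean z) per group
    for g in zs_by_group:
        zs, ws = zs_by_group[g], ws_by_group[g]
        w_g = sum(ws)
        z_g = sum(w * z for w, z in zip(ws, zs)) / w_g
        summaries[g] = (math.tanh(z_g), len(zs))
        group_stats.append((w_g, z_g))

    # Grand fixed-effect mean, then the weighted spread of group means around it
    total_w = sum(w for w, _ in group_stats)
    z_all = sum(w * z for w, z in group_stats) / total_w
    q_between = sum(w * (z - z_all) ** 2 for w, z in group_stats)
    return summaries, q_between, len(group_stats) - 1


# Hypothetical proficiency subgroups
rs = [0.40, 0.46, 0.58, 0.65]
ns = [150, 90, 120, 60]
levels = ["beginner", "beginner", "advanced", "advanced"]
summaries, q_b, df_b = subgroup_analysis(rs, ns, levels)
for level, (r_g, k) in summaries.items():
    print(f"{level}: r = {r_g:.2f} (k = {k})")
print(f"Q_between = {q_b:.2f}, df = {df_b}")
```

With small subgroups, as in the present moderator analysis, Q_between has little power, which is one reason the subgroup differences warrant cautious interpretation.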

Figure 1 Funnel plot of effect sizes and sampling errors
With respect to the learners' L1 and proficiency level, a similar pattern of underreporting was found. Regarding learners' L1, some languages, such as Thai (r = .68), showed stronger relationships between L2 self-efficacy and achievement than others, for example Japanese (r = .31). In terms of proficiency level, the analysis indicated a gradual increase in relationship strength when moving from beginner (r = .44) to advanced levels (r = .62).
The analysis of self-efficacy type revealed the strongest relationship for speaking self-efficacy (r = .56) and the weakest for vocabulary self-efficacy (r = .33). Other skill-specific types (i.e., reading and writing self-efficacy), as well as general L2 self-efficacy (i.e., no skill- or content-specific subtype), all showed effects similar to the overall mean effect of r = .475. Regarding achievement measure, learners' GPA yielded a stronger correlation (r = .62) than course grades or language tests, which both returned the same effect (r = .47).
The analyses for the other moderators did not show comparable ranges in effect sizes. For example, the target language analysis revealed only a slightly higher effect for English (r = .53) than for non-English languages (r = .47), and effects were similar across all institutional contexts (r = .47-.50).

Discussion
RQ1 explored reporting practices in L2 self-efficacy and achievement research. The majority of studies reported both basic descriptive statistics and reliability coefficients for their self-efficacy instruments, and the reported coefficients align with or exceed the average reliability estimates for instruments designed for SLA research (Plonsky & Derrick, 2016). Thus, reporting practices for self-efficacy measures in this domain appear rigorous. With respect to language achievement measures, we found a weaker reporting tradition. Only about one-third of the samples using a language test provided its reliability coefficient, and only two-thirds of studies reported both the mean and standard deviation for their sample. As noted in previous meta-analyses (e.g., Teimouri et al., 2019) and reviews of SLA research practices (e.g., Larson-Hall & Plonsky, 2015; Plonsky, 2013, 2017), this lack of reporting is a common limitation of SLA research. Furthermore, nearly half of the studies used instructor- or institution-developed achievement tools, which often lack the psychometric properties that are essential for robust quantitative analysis in SLA (Brown et al., 2018). Altogether, these results highlight the need for greater validation of instruments and more rigorous reporting practices for language achievement measures in SLA (Marsden & Plonsky, 2018).
RQ2 investigated the direction and magnitude of the relationship between learners' self-efficacy and their L2 achievement. The mean correlation between these variables was r = .475, meaning that self-efficacy and achievement shared approximately 22.6% of their variance (r² = .226), a medium effect by SLA benchmarks (Plonsky & Oswald, 2014).
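The shared-variance figure follows from squaring the mean correlation (the coefficient of determination):

```latex
r^2 = (0.475)^2 = 0.2256 \approx 22.6\%
```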
Within the context of the positive turn in SLA, it is insightful to compare our result with meta-analytic findings for other prominent psychological constructs, such as motivation, which has long been regarded as playing a positive role in student learning. Somewhat surprisingly, Al-Hoorie's (2018) recent meta-analysis of the L2 motivational self system and L2 achievement found smaller mean correlations between learners' L2 achievement and their ideal L2 self (r = .20), ought-to L2 self (r = -.05), and L2 learning experience (r = .17). Given the close relationship between learners' ideal selves and self-efficacy beliefs (Lake, 2013), one might have expected similar effect sizes for both variables. Instead, the larger positive effect for learner beliefs raises questions about the complexity of the relationship between learner beliefs and motivation in the context of L2 achievement. While studies in the positive psychology paradigm in SLA have predominantly examined emotional variables (Driver, 2021), our findings also highlight the potential of examining the role of cognitive variables, and their interaction, in student thriving and learning success in future studies.
It is also helpful to compare our result with meta-analytic findings in neighboring disciplines. In psychology, Multon and associates (1991) found a somewhat smaller effect of r = .38 (95% CI [.36, .41]) in their meta-analysis of self-efficacy beliefs and general academic outcomes, while Huang's (2016) findings in education regarding the effect of self-efficacy on general achievement goals align more closely with our result (r = .48; 95% CI [.38, .46]). The similarity of Huang's finding raises questions about the underlying theoretical assumptions of the self-efficacy and achievement relationship in SLA (Wyatt, 2018), given that Huang conceptualized achievement goals as beliefs about achievement objectives rather than as measurable performance outcomes. Put differently, Huang's study examined the relationship between two separate beliefs, rather than between a belief (e.g., self-efficacy) and an outcome (e.g., an achievement measure), arguably capturing Bandura's (1977) theorization of the relationship between self-efficacy and achievement more accurately (Figure 2). In contrast, most studies in our sample adopted Bandura's (1986, 1997) definitions of self-efficacy but used correlational designs to investigate its link to outcomes, thereby conflating multiple cognitive and behavioral constructs into one variable. Therefore, our estimate of the mean effect must be interpreted with caution, as neither the different types of beliefs nor the behaviors of participants were controlled for in the primary studies.

Figure 2
Representation of the self-efficacy and achievement relationship (Bandura, 1977) [diagram: Learner → Behavior → Outcome, with self-efficacy beliefs linking learner and behavior and outcome beliefs linking behavior and outcome]

RQ3 examined the effects of moderating variables. The results of this analysis showed a number of differences in effect sizes with respect to learner characteristics, language, and target skill (e.g., speaking vs. writing). However, given the small number of samples in this moderator analysis (k = 27) and the resulting limited number of studies in each analytical subcategory, these findings should be interpreted with caution.
Perhaps most notable were the findings for L2 proficiency, which showed the strongest effects for advanced students and the weakest for beginners. For self-efficacy to positively affect L2 achievement, learners must not only believe in their abilities; these beliefs must also shape how learners engage with a task, for example through increased attention and awareness, the use of task-appropriate learning strategies, or stronger L2 grit, all of which have been found to lead to better learning outcomes (e.g., Teimouri et al., 2020). Higher-proficiency learners likely have more experience with how positive beliefs can motivate learning processes that lead to greater L2 achievement, as well as the linguistic resources to bridge the gap between simply believing in their ability to complete a task and actually accomplishing it.
Results also revealed differences across learners' L1s, which may be related to L2 accessibility and the relative "prestige" and utility of the L1 in global settings. Self-efficacy effects were strongest for L1 Thai speakers and weakest for L1 Japanese speakers. Given the value of Japanese in the worldwide business market and the socioeconomic developments in Japan (Terasawa, 2017) that have increased access to the L2 (i.e., English), L1 Japanese speakers may be motivated by factors other than self-efficacy and feel less urgency to meet achievement objectives than other L1 speakers. In contrast, L1 Thai speakers encounter fewer opportunities and less encouragement to use L2 English in Thailand (Anyadubalu, 2010) but may, at the same time, be more dependent on an L2 to participate in global discussions, economics, and business partnerships. Thus, L1 speakers in regions with less access to L2 resources may rely more heavily on self-efficacy to drive their achievement, which aligns with the relative effects and L1 profiles in our findings (e.g., Arabic; Aljaffery, 2015).
Similarly, given the "prestige" of English as a global language, L1 English speakers may also receive less support for learning an L2 and rely more heavily on their own self-beliefs to realize L2 achievement goals. In addition, we found slightly weaker effects for L2 learners of English than for learners of other languages, which is consistent with this argument. L2 learners of English likely draw their primary motivations (e.g., the ought-to L2 self; Ushioda & Dörnyei, 2011) from external sources, given the global instrumentality of English. In contrast, learners of other languages likely perceive fewer societal motivations and obligations to learn the L2, resulting in a stronger connection between self-efficacy and achievement.
Interestingly, results also indicated weaker effects for learners' beliefs about their L2 vocabulary knowledge than for general or skill-specific self-efficacy (e.g., reading, writing, or speaking), suggesting that learners' beliefs in their ability to complete open-ended rather than narrowly focused tasks (i.e., vocabulary or grammar tasks) may have a greater impact on achievement. The strongest effect was found for speaking self-efficacy. Because speaking requires learners to produce language without the opportunity to review and revise before the final product (as is possible in reading and writing), self-efficacy in this area suggests that learners strongly believe in their ability to organize and communicate their thoughts spontaneously and accurately. Thus, speaking self-efficacy may encompass self-beliefs about a wider range of task-related abilities, leading to stronger effects on ultimate L2 achievement.
Finally, the positive effects of self-efficacy were uniform for language tests and course grades, which suggests that both are acceptable metrics of achievement in this domain. However, the much larger effect for GPA raises concerns about the possibly weak ecological validity of GPA as a measure of L2 achievement, although the subsample for this moderator was small and this finding should be treated with caution. Other moderators revealed either no differences (i.e., institutional context) or no clear pattern of variation (i.e., age). Although a smaller effect size was observed for the 16-year-old age group, the self-efficacy-achievement connection for 15-year-olds was similar to the average effect size for college-aged learners (i.e., ages 18-21), and, owing to underreporting of this learner characteristic, it is unclear whether studies explored other age groups in secondary or primary school contexts. Whether L2 self-efficacy is more beneficial for adults than for younger learners, or perhaps for students who are younger within a given educational context than for older students, who tend to be closer to graduation and may reap fewer benefits from believing in their abilities, remains an open question that will require research covering a wider range of age groups, as well as robust reporting practices.

Limitations, directions for further research, and pedagogical implications
Some important limitations need to be considered when interpreting the findings of our study. First, the small subsamples in our moderator analysis provide only scant evidence for the role that the selected variables play in the systematic variation of the effect size. Second, we did not include unpublished studies or studies published in languages other than English in our final sample, which may have introduced multiple biases into our analysis and possibly resulted in an overestimation of the strength of the observed relationship (Cooper, 2016). Lastly, we excluded two studies that used advanced statistical methods, such as β coefficients in structural equation modeling (SEM), to report on the relationship between self-efficacy and achievement. As a result, we cannot provide evidence for trends or effect size variation across different analytical approaches.
We encourage future studies to carefully examine the theoretical underpinnings of the relationship between self-efficacy and achievement. Bandura's (1986) frequently used definition distinguishes self-efficacy beliefs from beliefs about outcomes, which some scholars have defined as self-confidence (e.g., Karademas, 2006), thereby adding not only a behavioral but also an emotional component to the theoretical conceptualization of the relationship. Conceptual frameworks that allow for the holistic examination of self-efficacy in relation to other psychological dimensions, such as emotions, behaviors, and motivations, simultaneously within cultural, societal, and institutional contexts (e.g., Ushioda, 2014), would move future studies away from cross-sectional bivariate analyses toward systems modeling capable of capturing change in variables over time. The moderator analysis also revealed the need for research with K-12 learners and with populations learning second or foreign languages other than English. Additionally, our findings suggest that SLA would benefit from studies that focus specifically on skill-specific self-efficacy beliefs. Lastly, we recommend more rigorous reporting practices in line with field-specific standards and reaffirm calls to improve the reporting of the statistical and psychometric features of measurement tools, especially for L2 achievement.
In terms of FL pedagogy, the findings suggest that increased self-efficacy in the target language is beneficial for language learning, regardless of age or context. Given this positive relationship, educators might consider using methods that have been shown to promote self-efficacy, such as the flipped classroom model (Namaziandost & Çakmak, 2020) and project-based learning (Shin, 2018). Both methods have shown positive effects on learners' self-efficacy and motivation and may have a positive influence on their future achievement objectives. Educators should also consider preparing classroom activities, particularly speaking activities, that build students' self-efficacy and nurture their beliefs about their ability to complete tasks in the L2; how well such activities work will depend on learners' experiences with the language and their existing knowledge and skills. As our findings suggest, practitioners may find that materials designed to raise L2 self-efficacy are particularly beneficial for students at higher L2 proficiency levels. Importantly, most studies in our sample did not use methods intended to replicate the classroom learning experience, so these pedagogical implications should be weighed alongside other well-founded pedagogical approaches in SLA.

Conclusion
The goal of this meta-analysis was to systematically examine the relationship between L2 self-efficacy and achievement. The findings indicate that self-efficacy has a medium-sized, positive relationship with L2 achievement and suggest that learning success may indeed be a question of believing in one's abilities during the learning process. Furthermore, the findings raise important questions about the theoretical nature of self-efficacy and the role of learner characteristics and contexts, and they highlight the need for greater methodological rigor. All of these can serve as starting points for future empirical research into self-efficacy and achievement in SLA, particularly within the framework of positive psychology.