The L2 motivational self system: A meta-analysis

This article reports the first meta-analysis of the L2 motivational self system (Dörnyei, 2005, 2009). A total of 32 research reports, involving 39 unique samples and 32,078 language learners, were meta-analyzed. The results showed that the three components of the L2 motivational self system (the ideal L2 self, the ought-to L2 self, and the L2 learning experience) were significant predictors of subjective intended effort (rs = .61, .38, and .41, respectively), though weaker predictors of objective measures of achievement (rs = .20, -.05, and .17). Substantial heterogeneity was also observed in most of these correlations. The results also suggest that the strong correlation between the L2 learning experience and intended effort reported in the literature is, due to substantial wording overlap, partly an artifact of lack of discriminant validity between these two scales. Implications of these results and directions for future research are discussed.


Introduction
In 2005, Dörnyei introduced the L2 motivational self system (L2MSS) as an attempt to explain individual differences in language learning motivation. The L2MSS is influenced by a number of theories, most notably possible selves theory (Markus & Nurius, 1986), self-discrepancy theory (Higgins, 1987), and the socio-educational model (Gardner, 1979, 1985, 2010). A fundamental assumption in the L2MSS is that when the learner perceives a discrepancy between their current state and their future self-guide (i.e., ideal or ought), this discrepancy may function as a motivator to bridge the perceived gap and reach the desired end-state. In 2009, the first anthology testing this model appeared (Dörnyei & Ushioda, 2009b), reporting a number of empirical investigations that, according to Dörnyei (2009), "found solid confirmation for the proposed self system" (p. 31).
Subsequently, interest in this model increased exponentially in the language motivation field. Within just one decade, the L2MSS generated "an exceptional wave of interest with literally hundreds of studies appearing worldwide" (Dörnyei & Ryan, 2015, p. 91). In fact, in their comprehensive survey of over 400 recent publications, Boo, Dörnyei, and Ryan (2015) report that the L2MSS is currently the dominant theoretical framework in the field. Boo et al. (2015) attribute this dominance to the versatility of the model and its ability to accommodate a wide range of perspectives from different theoretical orientations.
The L2MSS consists of three main components (Dörnyei, 2005, 2009): the ideal L2 self, the ought-to L2 self, and the L2 learning experience. The ideal L2 self refers to the state one would ideally like to reach, thus representing one's own hopes and wishes. The ought-to L2 self, on the other hand, refers to the state that others would want one to reach, thus representing the expectations projected by significant others. On a different level, the L2 learning experience concerns one's experience in the immediate learning environment, involving aspects such as the teacher, the curriculum, and peers. The next section reviews the evidence each of these three components has generated.

The ideal L2 self
The ideal L2 self has received a significant amount of attention in recent literature. However, the results seem to have led to a range of conclusions in the field, some of which seem polarized. On the one hand, the predictive validity of the ideal L2 self has been described as "straightforward" (Dörnyei & Ushioda, 2011, p. 87), and as providing "solid confirmation" (Dörnyei, 2009, p. 31) in that "the emerging picture consistently supports [its] validity" (Dörnyei, 2014, p. 521). Similarly, Dörnyei and Ryan (2015) argue that "virtually all the validation studies reported in the literature found the L2 Motivation Self System providing a good fit for the data" (p. 91). Ghanizadeh and Rostami (2015) further state that the "results conclusively verified the model in virtually every context." These comments generally refer to the ideal L2 self specifically (see also Ghanizadeh, Eishabadi, & Rostami, 2016, p. 15; Henry & Cliffordson, 2015, p. 20; Islam, Lamb, & Chambers, 2013, p. 238; Teimouri, 2017, p. 683).
On the other hand, other researchers have expressed reservations. For example, in their investigation of Korean secondary school students, Kim and Kim (2011) report that the ideal L2 self could not predict school grades. The researchers note that "being motivated by developing a vivid ideal L2 self through a dominant visual preference seems to be irrelevant to the level of academic achievement" (p. 36). Similarly, Lamb (2012) administered a C-test to Indonesian learners and found, again, that the ideal self could not predict proficiency. He therefore argued that although his participants "would like to see themselves as future users of English (ideal L2 self), what makes them more likely to invest effort in learning is whether they feel positive about the process of learning" (p. 1014). In the Canadian context, MacIntyre and Serroul (2015) examined the relationship between the ideal L2 self and actual L2 performance in their idiodynamic paradigm, which measures individual motivational variability on a per-second timescale. The researchers found "no evidence" (p. 126) that the ideal L2 self is dynamic or adapting to the changing task demands. In the Iranian context, Papi and Abdollahzadeh (2012) also found that the ideal L2 self does not predict actual classroom behavior. The researchers explain that "the learners' ideal image of their future self does not have much impact on their motivated behavior in English language classrooms or vice versa; that is, regardless of how well-developed the students' ideal L2 self is, their actual motivated behavior in classroom activities will remain unaffected, and regardless of how motivated the students are in class, their ideal L2 selves will remain unchanged" (Papi & Abdollahzadeh, 2012, p. 588). In the Saudi context, Moskovsky, Assulaimani, Racheva, and Harkins (2016) found the ideal L2 self to be a negative predictor of language proficiency. The researchers argue that, overall, the results "at best indicate a tenuous link between the self guides and achievement" (p. 650).
Thus, the emerging literature points to a rather complex picture. This could plausibly be due to certain factors, such as the applicability of the model to different contexts or participants, or the use of different outcome measures. As explained in more detail below, a meta-analysis can help shed more light on such conflicting results.

The ought-to L2 self
In contrast to the controversy surrounding the ideal L2 self, there seems to be more agreement that the ought-to L2 self could benefit from some improvement. For example, Dörnyei and Chan (2013) acknowledge that "while [ought-to selves] do play a role in shaping the learners' motivational mindset, in many language contexts they lack the energising force to make a difference in actual motivated learner behaviours by themselves" (p. 454). They then go on to explain that "while the participants perceived the external pressures on them as being valid and did intend to adjust their behavior accordingly, this intended effort was not manifested in their actual grades" (p. 454, original emphasis).
In recognition of the wanting nature of the ought-to L2 self construct, a number of developments have been proposed. Most of these developments argue for the need to incorporate the distinction between own and other standpoints in both the ideal and ought-to L2 selves. From this perspective, the ideal L2 self should be separated into two constructs, one representing one's own hopes and one significant others' hopes. Similarly, the ought-to L2 self should be bifurcated into obligations one would like to perform and obligations others expect one to perform (see Papi, Bondarenko, Mansouri, Feng, & Jiang, in press; Taylor, 2013).
For example, Thompson and Vásquez (2015) conducted a narrative study on three language teachers and argued that their data indicate a distinction between an ought-to L2 self and an anti-ought-to L2 self, the latter referring to one's own desires that are at odds with what others expect from the individual. Lanvers (2016) conducted another qualitative study on language learners and argued that the ought-other standpoint should feature more prominently in educational contexts, as parents and teachers typically exert a lot of influence on students.
In one of the few quantitative studies testing the relevance of own-other standpoints to the language learning context, Teimouri (2017) developed questionnaire scales to measure each of the four proposed constructs: the ideal-own, ideal-other, ought-own, and ought-other. Interestingly, Teimouri found support for the distinction between own and other in the case of the ought-to L2 self, but not the ideal L2 self. Teimouri argued that ideals are highly internalized, and consequently they may not be separable into those that relate to one's own versus others' ideals.
However, in order to be able to evaluate the contribution of these developments and the extent to which they have advanced the original construct, it is important to have a frame of reference. That is, without quantifying the predictive validity of the original ought-to L2 self, it may not be immediately apparent how much of an improvement an alternative variation of this construct is. A meta-analysis can offer a baseline against which the effectiveness of reformation attempts can be evaluated.

The L2 learning experience
This construct has been variously labeled as 'the L2 learning experience' and as 'attitudes toward language learning.' Both terms refer to the same construct because of the considerable overlap in the scales used to measure them (cf. You, Dörnyei, & Csizér, 2016, pp. 96-97). The L2 learning experience operates on a different level from either the ideal L2 self or the ought-to L2 self. Unlike them, the L2 learning experience is concerned with attitudes and evaluations of the present learning environment rather than a future-oriented self-guide. However, due to the increasing interest in self-guides in recent years (cf. Boo et al., 2015), very little attention has been paid to this construct. For example, Dörnyei describes the L2 learning experience as the situated, executive motive (Dörnyei, 2009, p. 29) and as the causal dimension (Dörnyei, 2005, p. 106) of the model. Beyond that, very little work has been done to clarify the role of such executive motives or the mechanisms that underlie their causal effect, making it the least theorized construct in the L2MSS (Ushioda, 2011, p. 201). Despite that, the L2 learning experience has been described as the strongest predictor in the L2MSS (e.g., Lamb, 2012; Teimouri, 2017).
Interestingly, the vast majority of studies testing this construct in our field have been observational. The standard design involves administering a questionnaire scale to learners and then examining the relationship (e.g., using correlation, regression, or structural equation modeling) between scores from this scale and from other criterion measures. However, this approach is prone to confounds, thus risking spurious results that do not reflect a genuine causal relationship. Beleche and colleagues point out the need for caution in interpreting observational studies: "The positive association between grades and course evaluations may also reflect initial student ability and preferences, instructor grading leniency, or even a favorable meeting time, all of which may translate into higher grades and greater student satisfaction with the course, but not necessarily to greater learning" (Beleche, Fairris, & Marks, 2012, p. 709). Other potential factors shown to confound course evaluations include the teacher's age, ethnicity, gender, and even clothes and attractiveness (for reviews, see Ottoboni, Boring, & Stark, 2016; Stark & Freishtat, 2014). In fact, results by Ambady and Rosenthal (1993) show that students, simply after watching a very brief silent video (less than 30 seconds), form impressions about their teachers and that these first impressions then predict end-of-course evaluations. The presence of all of these biases has led some researchers to cast serious doubt on the value of course evaluation, with some considering any attempt to statistically adjust for the many biases involved to be practically "impossible" (Ottoboni et al., 2016, p. 10).
When it comes to experimental research, a number of educational studies conducted in different parts of the world, including Italy (Braga, Paccagnella, & Pellizzari, 2014), France (Boring, 2015), and the United States (Arbuckle & Williams, 2003; Carrell & West, 2010; MacNell, Driscoll, & Hunt, 2015), have demonstrated that student satisfaction with the course is biased (based on objective measures). The results of these studies also cast doubt on any clear (positive) causal relationship between satisfaction with the course and achievement. In fact, some of them found a negative relationship between satisfaction and success in subsequent, more advanced courses. For example, results by Braga et al. (2014) show that "teachers who are more effective in promoting future performance receive worse evaluations from their students" (p. 81).
In the present study, an attempt is made to meta-analyze the relationship between the L2 learning experience and language learning outcomes. The results are then used as a springboard to discuss the implications of results from observational studies and compare them to those from experimental studies.

Need for meta-analysis
A rigorous evaluation of a theory requires a systematic review of its accumulating literature. When sufficient quantitative reports become available, their results may be synthesized in a meta-analysis. A meta-analysis typically aims to estimate the magnitude (and confidence intervals) of the reported effect sizes, while moving away from a dichotomous significant versus non-significant outcome. A meta-analysis can also be helpful in shedding light on conflicting results. That is, it is plausible that conflicting results might to some extent be explainable by certain characteristics of different studies, such as type of participants, research design, or instruments used. For example, the literature on the ideal L2 self has drawn from different measures to date. Some researchers used subjective self-reports (i.e., intended effort), while others used more objective criteria (e.g., school grades and other achievement tests). It is plausible that different measures lead to different results. When used to test such hypotheses, a meta-analysis can potentially contribute to resolving debates in the literature.

The present study
Despite the growing number of studies drawing from the L2MSS, no systematic meta-analysis has been conducted on this literature to date. Instead, previous researchers have so far engaged in head-counting, such as tallying the number of published studies (e.g., Boo et al., 2015); or in vote-counting, such as describing the results of these studies as either supporting the theory or as 'mixed' (e.g., Dörnyei & Chan, 2013). Describing findings as mixed does not inform the reader about their average estimate, the width of its confidence interval, and whether any heterogeneity (i.e., variability of the estimate) found can or cannot be explained by moderators. Because a meta-analysis can address these questions, the present study aimed to meta-analyze studies drawing from the L2MSS. More specifically, the primary research question guiding this meta-analysis is as follows:
RQ. What is the correlation between each of the three components of the L2MSS (the ideal L2 self, the ought-to L2 self, and the L2 learning experience) and educational outcomes (subjective and objective measures)?
This research question indicates a total of six correlations to be investigated: three correlations with subjective measures and three with objective measures. Categorizing outcome measures into subjective and objective was a pragmatic decision motivated by the scarcity of studies utilizing objective measures in the field, as explained in more detail below. The vast majority of studies in recent literature have used intended effort as their primary criterion variable. However, objective measures of actual language learning and achievement (e.g., grades and other standardized tests) represent an indispensable part of the overall picture. For example, Roth et al. (2015) argue that "school grades are crucial for accessing further scholastic and occupational qualification, and therefore, have an enormous influence on an individual's life" (p. 118). Similarly, Moskovsky et al. (2016) claim that "therein lies the real test for the theory – in the capacity of the self guides to predict L2 achievement" (p. 643; see also Dörnyei & Chan, 2013, p. 454; Dörnyei & Ryan, 2015, p. 101). Indeed, language proficiency and achievement are an essential consideration for many stakeholders such as parents, teachers, and future employers. Still, arguing that objective measures are 'the real test' of a theory might imply downplaying subjective measures, when in fact subjective measures might plausibly capture a dimension not captured by objective criteria. For completeness, therefore, the correlation between the two outcome measures was investigated to find out the degree of correspondence between them.

Inclusion criteria
In order to be eligible for inclusion in this meta-analysis, the report must satisfy the following criteria:
1. It must involve a quantitative component. Qualitative and conceptual articles were excluded.
2. It must be about language learners. Reports about language teachers were excluded.
3. It must include at least one of the three components of the L2MSS.
4. It must include at least one outcome variable, such as school grades, objective tests, or subjective intended effort.
5. It must report the zero-order correlation between at least one component of the L2MSS and one outcome measure, or provide sufficient information to calculate it. Studies with only regression coefficients were excluded.
6. It must be published in English.
7. It must have been available by the start of June 2017.

Literature search
The literature search commenced with the article pool compiled by Boo et al. (2015), spanning the period from 2005 to 2014 (k = 283, excluding book chapters).
To complement this list and to find more recent reports, a search was conducted in databases relevant to our field (ERIC, LLBA, MLA, ProQuest, and PsycINFO) using the following keywords: ideal L2 self, ought-to L2 self, L2 learning experience, and L2 motivational self system. This resulted in a number of additional journal articles and unpublished theses (k = 51). The list was then complemented by a Google Scholar search and by an ancestry search to ensure saturation (k = 21). Furthermore, 19 edited volumes published since 2005 were inspected (k = 309 chapters). Finally, a call for papers was announced on various relevant mailing lists, including BAALmail, Linguist List, myTESOL Lounge, Korea TESOL, and IATEFL Research SIG, as well as on social media, resulting in further reports (k = 14).
This search procedure therefore resulted in a pool of 678 journal articles, book chapters, and unpublished manuscripts, ranging from conceptual to empirical, quantitative and qualitative, as well as duplicates (e.g., theses that were later turned into one or more publications). This pool of reports was subsequently examined against the inclusion criteria listed above. Eventually, 32 reports involving 39 unique samples and 32,078 language learners met all inclusion criteria. The lists of the included studies and of their characteristics are available in Appendices A and B.

Data analysis
Software. Comprehensive Meta Analysis 3.3 (Borenstein, Hedges, Higgins, & Rothstein, 2014) was used for all analyses. A random-effects model was implemented, since there was no reason to assume that all studies share one common effect size. Heterogeneity was examined using the I² statistic and its associated significance value. The presence of significant heterogeneity implies that the effect is highly variable and could potentially be explained by certain characteristics of different studies.
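To make the logic of a random-effects model and the I² statistic concrete, the following is a minimal sketch rather than a reproduction of the software's internals. It assumes that correlations are pooled on the Fisher z scale and that between-study variance is estimated with the DerSimonian-Laird method; the article does not state which estimator was used. The function name `random_effects` and the four example samples are hypothetical.

```python
import math

def random_effects(rs, ns):
    """Pool correlations under a DerSimonian-Laird random-effects model.

    rs: per-sample correlations; ns: per-sample sizes (each n > 3).
    Returns the pooled r, its 95% CI, Cochran's Q, tau^2, and I^2 (percent).
    """
    zs = [math.atanh(r) for r in rs]        # Fisher r-to-z transformation
    vs = [1.0 / (n - 3) for n in ns]        # sampling variance of each z
    w = [1.0 / v for v in vs]               # fixed-effect (inverse-variance) weights
    sw = sum(w)
    z_fixed = sum(wi * zi for wi, zi in zip(w, zs)) / sw
    q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, zs))  # Cochran's Q
    df = len(rs) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)           # between-study variance estimate
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in vs]   # random-effects weights
    z_re = sum(wi * zi for wi, zi in zip(w_re, zs)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    ci = (math.tanh(z_re - 1.96 * se), math.tanh(z_re + 1.96 * se))
    return {"r": math.tanh(z_re), "ci": ci, "q": q, "tau2": tau2, "i2": i2}

# Hypothetical samples, not taken from the included studies.
result = random_effects([0.70, 0.45, 0.62, 0.30], [200, 150, 400, 120])
print(result)
```

In this sketch, I² expresses the share of the observed variability in effect sizes that exceeds what sampling error alone would produce, which is why identical correlations across samples yield a value of zero.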
Publication bias. Publication bias refers to the situation where the outcome of a study has an effect on whether that study is eventually published. Studies reporting statistically significant results tend to be perceived as more interesting than those reporting non-significant results, and therefore the latter may not successfully complete the long and laborious publication process. The authors themselves can also become discouraged or lose interest, and consequently decide not to undergo the publication process. In some cases, the authors may believe that there must have been a mistake, especially when their results are not in line with mainstream views. This can lead to what is known as the file drawer problem (Rosenthal, 1979).
Publication bias may be inferred when small-scale studies, with statistically lower precision, report extreme values relative to larger-scale studies. Due to their lower power, some small studies are expected to find non-significant results simply by chance. However, when such small studies report significant results consistently, the likelihood that the literature is biased toward significant results increases. In the present meta-analysis, publication bias was examined using the Trim and Fill method (Duval & Tweedie, 2000a, 2000b). The Trim and Fill method is currently the most popular corrective technique to adjust for publication bias in contemporary meta-analytic literature (Simonsohn, Nelson, & Simmons, 2014).
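The small-study pattern described above can be illustrated with a short sketch. Note that this is not the Trim and Fill procedure used in the present study; it is a simpler, related diagnostic, Egger's regression, in which the standardized effect (effect divided by its standard error) is regressed on precision (the inverse of the standard error). An intercept far from zero suggests that less precise studies report systematically different effects. The function name and both data sets are hypothetical, and effects are assumed to be on the Fisher z scale.

```python
import math

def egger_intercept(effects, std_errors):
    """Ordinary least squares of effect/SE on 1/SE; returns (intercept, slope)."""
    y = [e / s for e, s in zip(effects, std_errors)]   # standardized effects
    x = [1.0 / s for s in std_errors]                  # precision
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("studies must differ in precision")
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope

# Hypothetical sample sizes; the SE of a Fisher z is 1 / sqrt(n - 3).
sizes = [30, 60, 120, 500, 2000]
ses = [1.0 / math.sqrt(n - 3) for n in sizes]

# Same effect regardless of study size: no small-study pattern.
symmetric = [0.30 for _ in ses]
# Effects that grow as precision falls: the pattern associated with bias.
asymmetric = [0.30 + 1.0 * s for s in ses]

print(egger_intercept(symmetric, ses))
print(egger_intercept(asymmetric, ses))
```

A full analysis would also test whether the intercept differs significantly from zero; the sketch stops at the point estimate to keep the mechanics visible.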
Inclusion criteria. Initially, a second coder analyzed 10% of the reports independently against the inclusion criteria described above (Cohen's κ = .76, p < .001). Subsequently, discrepancies were resolved by discussion until 100% agreement was reached. Very few studies reported longitudinal investigations (k = 1). In this case, the first time point was included. Also very few studies reported two measures for the L2 learning experience (k = 1) or intended effort (k = 1). In these cases, the two measures were averaged before inclusion in the analysis.
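For readers unfamiliar with Cohen's κ, the following sketch shows how chance-corrected agreement between two coders can be computed. The ten include/exclude decisions are invented for illustration and are not the coding data from this study; the function name is likewise hypothetical.

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Chance-corrected agreement between two coders rating the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must rate the same non-empty set of items")
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Agreement expected by chance, from each coder's marginal proportions.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical category throughout
    return (observed - expected) / (1.0 - expected)

# Hypothetical decisions on ten reports.
coder_1 = ["in", "in", "out", "out", "in", "out", "out", "out", "in", "out"]
coder_2 = ["in", "out", "out", "out", "in", "out", "out", "out", "in", "out"]
print(round(cohens_kappa(coder_1, coder_2), 2))
```

Here the coders agree on nine of ten reports (90%), but because some of that agreement would be expected by chance, κ is lower than the raw agreement rate.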
Most studies adopted the standard research design of administering questionnaire scales adapted with minor variations from Taguchi, Magid, and Papi (2009), typically translated into the participants' L1. Some reports were excluded because they did not report Pearson correlations, instead reporting regression coefficients (k = 13), path coefficients in structural equation models (k = 11), or other procedures (k = 2). However, over 90% of these reports used intended effort as their criterion measure. Due to the relatively large number of reports drawing from intended effort that are already eligible for inclusion in the present meta-analysis, the excluded reports would probably have had a minor impact had they been included. This issue is discussed further in the Limitations section below (see Appendix C for a list of studies excluded for incomplete reporting).
Published versus unpublished reports. Unpublished reports are typically included in meta-analyses (Norris & Ortega, 2006). Although unpublished studies raise quality concerns, they may also represent studies with null results or with results going against mainstream views, making them harder to publish. Other reports may have been completed as part of a degree program (e.g., MA or PhD) and publication was not subsequently pursued.
In the present meta-analysis, there were a number of unpublished reports (k = 6). As a quality control procedure, moderation analysis was conducted to compare the results obtained from published and unpublished reports. The results showed that all comparisons were non-significant at the .05 level, thus providing no evidence that this small sample of unpublished reports has biased the results.
Study quality. Study quality is a perennial problem in meta-analysis, since low quality studies could potentially bias the results. While some researchers advocate excluding low quality studies altogether, others recommend including them and then conducting sensitivity analysis (e.g., Norris & Ortega, 2006). This is partly because study quality is not a straightforward concept, and different researchers may evaluate quality differently. Sensitivity analysis, however, can show whether the overall results are robust or highly influenced by the presence of studies with debatable quality.
In the present meta-analysis, the target statistic was the Pearson correlation. Because this is a relatively straightforward procedure, it was expected that most reports would exhibit satisfactory quality. Following guidelines outlined by Dörnyei (2010), particular attention was also paid to psychometrics, such as using multi-item scales, providing suitable response options, and reporting reliability. All reports satisfying the inclusion criteria were analyzed by two coders independently (Cohen's κ = .87, p < .001). Discussion of the minor discrepancies obtained led to the conclusion that a small number of reports (k = 2) might potentially bias the results as the reliability of individual scales was missing. Sensitivity analysis was therefore conducted to examine the effect of excluding these two reports (see Results below).
Subjective versus objective outcomes. In the present sample, a large number of studies used intended effort as their criterion variable. In fact, even subjective self-ratings of proficiency can hardly be found in the literature. A smaller number of studies used more objective measures, including school grades and proficiency tests. Moderation analysis was conducted to compare the results obtained from school grades and from other objective measures. All tests were non-significant, thus justifying combining grades and objective measures into one category (called "achievement" henceforth).
Further moderators. Unfortunately, it was not possible to test the moderating effect of some important learner characteristics, including age, gender, and context. In terms of age, not a single study involving pre-secondary learners qualified for the final analysis, supporting Boo et al.'s (2015) observation that there is a "virtual absence" (p. 156) of research on younger learners in recent years. A few studies reported results for secondary and university learners separately, but the literature does not seem mature enough to meta-analyze the role of this variable since it was not always clear whether the target language was learned as part of a major or elective course or as an L2 or L3. In terms of gender, most studies reported the results combined for males and females, thus precluding any comparisons between the two genders. In terms of context, most investigations were conducted in a foreign language context, and only a small minority were in a second language context (k = 5, 3 of which were unpublished dissertations). Finally, a very small number of studies investigated a language other than English (k = 3), supporting Dörnyei and Al-Hoorie's (2017) argument that the language motivation field is currently English-biased. Implications of these trends are discussed later.

Results
Table 1 reports the correlations between each of the three components of the L2MSS and the two outcome measures, as well as those between the two outcome measures themselves. In all cases, a sizable number of learners were included, with the smallest total being over 1,300. It is further evident from Table 1 that considerably fewer studies included a measure of actual achievement, while most used intended effort as their primary outcome variable.
Note. Exp = experience; ns = non-significant. Sensitivity analysis excluded two reports (n = 171 total).
The three components of the L2MSS had positive correlations with intended effort, but these correlations dropped for achievement. There was also no overlap in the confidence intervals of each component's correlations with intended effort and with achievement, indicating that the coefficients are significantly different from each other. These findings might be used to explain some conflicting results in the literature: Researchers who used subjective measures found stronger support for the L2MSS than those who used objective measures. Furthermore, the correlation between intended effort and achievement was weak and non-significant, indicating that these two outcome measures cannot be used interchangeably. A stark illustration of this is found in the ought-to L2 self, whose correlation with intended effort was positive and moderate in magnitude (.38) but reversed its sign with achievement (-.05). These findings point to the need to diversify outcome measures in the L2 motivation field to obtain a more comprehensive picture, rather than relying exclusively on intended effort.
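The reasoning from non-overlapping confidence intervals can be sketched as follows. The two pooled correlations for the ideal L2 self (.61 with intended effort, .20 with achievement) come from the abstract, but the standard errors are hypothetical placeholders, since Table 1 is not reproduced here. The z statistic additionally assumes the two estimates are independent, which may not hold when the same samples contribute to both.

```python
import math

def ci_from_r(r, se_z, crit=1.96):
    """95% CI for a correlation, built on the Fisher z scale and back-transformed."""
    z = math.atanh(r)
    return math.tanh(z - crit * se_z), math.tanh(z + crit * se_z)

def intervals_overlap(a, b):
    """True if the closed intervals a = (low, high) and b = (low, high) share a point."""
    return a[0] <= b[1] and b[0] <= a[1]

def z_difference(r1, se1, r2, se2):
    """z statistic for the difference between two independent correlations."""
    return (math.atanh(r1) - math.atanh(r2)) / math.sqrt(se1 ** 2 + se2 ** 2)

# Correlations from the abstract; standard errors (Fisher z scale) are hypothetical.
effort_ci = ci_from_r(0.61, se_z=0.05)
achievement_ci = ci_from_r(0.20, se_z=0.08)
print(effort_ci, achievement_ci)
print(intervals_overlap(effort_ci, achievement_ci))
print(round(z_difference(0.61, 0.05, 0.20, 0.08), 2))
```

Non-overlap of two 95% intervals is a conservative criterion: it implies a significant difference at the .05 level, whereas overlapping intervals do not by themselves rule one out.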
The I² values in Table 1 indicated that there was wide and significant heterogeneity in most correlations. That is, with the exception of the one between the ought-to L2 self and achievement (which is non-significant), all other correlations exhibited heterogeneity in excess of 85%. Some confidence intervals were also somewhat large, especially for the correlations between achievement and each of the ideal L2 self and the L2 learning experience. Such heterogeneity is to be expected since these studies were conducted in different parts of the world by different researchers working independently rather than adhering strictly to certain research protocols. Potential moderators might help explain this heterogeneity in future meta-analytic research when a sufficient pool of studies becomes available.
When it comes to sensitivity analysis, the two reports that were excluded for not reporting scale reliability happened to involve correlations with achievement only. The results after excluding these two reports are found in the 'sensitivity' rows in Table 1. The three correlations with achievement exhibited a minor drop, with that of the L2 learning experience becoming no longer significant. When it comes to publication bias, adjusted values are reported in the 'corrected' rows in the table. Two correlations dropped to non-significance due to publication bias correction: the correlation between achievement and each of the ideal L2 self and the L2 learning experience. These two cases had relatively low sample sizes, suggesting a larger sample of studies utilizing objective measures is needed to obtain a more robust finding. It may also potentially suggest that there are further reports that show non-significant results but that could not be uncovered by the literature search of this study, despite the relatively generous inclusion criteria adopted (by including unpublished reports and book chapters) and a call for papers circulated widely in the field. Figure 1 presents a visual illustration of publication bias in the case of the ideal L2 self.
One surprising finding in Table 1 is the unusually high correlations of intended effort with the L2 learning experience. According to Dörnyei (2007), "if two tests correlate with each other in the order of 0.6, we can say that they measure more or less the same thing" (p. 223). While this may not be a hard-and-fast rule, the high correlations in Table 1 do raise discriminant validity concerns. This part of the analysis was therefore rerun to compare studies that examined the factorial structure of their scales (whether using classical test theory or item response theory) with those that did not.
The results in Table 2 indeed provide evidence that the high correlation between the L2 learning experience and intended effort may be a methodological artifact of not applying a factor-analytic procedure. The correlation between these two variables showed a significant drop from .68 to .41. A cursory look at the items used in studies that did not examine the factorial structure of their scales also showed considerable overlap. For example, one report used these two items: "Learning English is one of the most important aspects in my life" and "It is extremely important for me to learn English." Despite the close similarity of these two items, the former was used to measure attitudes toward learning English while the latter was used to measure intended effort. It is highly unlikely that these two items belong to two different latent variables. Unsurprisingly, that study reported a correlation of .91 between them for university majors, indicating that it may not be meaningful to distinguish between these two scales.

Discussion
The present meta-analysis has revealed a number of trends. One is that, perhaps for convenience, there is an abundance of research using intended effort as the primary criterion variable in recent language motivation research. On the other hand, there is a shortage of other outcome variables, resulting in an incomplete picture in the literature, especially since there was hardly any relationship between intended effort and other objective measures (r = .12). Future research should attempt to draw from more diverse criterion measures in the hope of shedding more light on the multifaceted nature of motivation.
Another trend in recent literature is the lack of sufficient attention to important learner characteristics. More specifically, the present meta-analysis could not examine the effect of age, gender, or context. As for age, although older learners tend to be more accessible to researchers, it is possible that the dynamics of motivation are different at different ages (Kormos & Csizér, 2008). What motivates a 7-year-old might not motivate a 17-year-old (Nikolov, 1999). As for gender, it is often taken for granted that females exhibit higher motivation than males (You et al., 2016). However, systematic research to test this assumption is lacking, let alone attempting to explain it. As for context, the vast majority of recent motivation research has been conducted in foreign language contexts. This is in stark contrast to the social-psychological era, during which research in second language contexts was dominant (Al-Hoorie, 2017b). Hence, little is currently known about the applicability of self-guides to second language contexts (see also Dörnyei & Ushioda, 2009a, pp. 352-353, for a similar argument).
A further trend is the dominance of English as the target language in recent research. English is indeed the global language and the most commonly taught nowadays. However, its global status may make the motivation to learn it distinct from the motivation to learn other languages (Dörnyei & Al-Hoorie, 2017). For example, a decision to learn a language like Danish or German typically needs to be accompanied by strong or personal reasons, especially when the aim is to achieve high proficiency. Learning English, in contrast, hardly needs a justification. This suggests a qualitative difference in the motivation to learn English versus the motivation to learn other languages. If this is the case, then the emphasis on English in recent literature risks deriving an incomplete theory of language learning motivation. This is an especially challenging task since the study of non-English languages is a rather complex subject. Non-English languages fall into different varieties such as minority, heritage, indigenous, and endangered languages, each with its unique set of contextual factors and conditions (Duff, 2017).
The following sections discuss the results of the present meta-analysis in relation to self-guides, the L2 learning experience, and intended effort. Limitations of this study are then highlighted.

Self-guides
In terms of the ideal L2 self, the results of the present meta-analysis showed that it correlated at .61 with intended effort and at .20 with achievement. In other words, the ideal L2 self accounts for around 37.2% of the variance in intended effort, but only about 4.1% in achievement. These results may help explain the conflicting findings in the literature: Studies relying on intended effort found strong support for the predictive validity of the ideal L2 self, while those drawing from other objective measures were less supportive.
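The conversion from a correlation to a percentage of shared variance is simply the squared coefficient. The following minimal sketch illustrates this arithmetic using the rounded correlations reported above; the function name is hypothetical and introduced only for illustration. Note that squaring the rounded value of .20 yields 4.0%, so the 4.1% figure presumably reflects an unrounded coefficient (this is an assumption, not something stated in the source).

```python
def variance_explained(r: float) -> float:
    """Return the percentage of variance two variables share, given Pearson's r."""
    return 100 * r ** 2


# Rounded correlations of the ideal L2 self with each outcome, as reported in the text.
correlations = {"intended effort": 0.61, "achievement": 0.20}

for outcome, r in correlations.items():
    print(f"{outcome}: r = {r:.2f}, shared variance = {variance_explained(r):.1f}%")
```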
Recently, Plonsky and Oswald (2014) have offered recommendations for field-specific benchmarks for interpreting the size of correlation coefficients: .25 small, .40 medium, and .60 large. If we follow these recommendations, the ideal L2 self is a strong predictor of intended effort, but its correlation with achievement only approaches the small threshold. The relationship between the ideal L2 self and achievement is also smaller than the expected correlation between attitudes and behavior in social psychology (r = .38, Kraus, 1995). It is also smaller than the correlations of aptitude (r = .49, Li, 2016) and intelligence (r = .54, Roth et al., 2015), two established individual difference variables, with academic achievement.
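As a rough illustration of how these benchmarks sort the coefficients mentioned in this section, the sketch below treats each benchmark as a lower bound for its label. That reading of the thresholds, the function name, and the "below small" label are assumptions made for illustration rather than part of Plonsky and Oswald's (2014) recommendations.

```python
def benchmark_label(r: float) -> str:
    """Label a correlation using .25 / .40 / .60 as assumed lower bounds."""
    size = abs(r)  # the sign indicates direction, not magnitude
    if size >= 0.60:
        return "large"
    if size >= 0.40:
        return "medium"
    if size >= 0.25:
        return "small"
    return "below small"


# Coefficients cited in the surrounding text.
examples = {
    "ideal L2 self and intended effort": 0.61,
    "ideal L2 self and achievement": 0.20,
    "attitudes and behavior (Kraus, 1995)": 0.38,
    "aptitude and achievement (Li, 2016)": 0.49,
    "intelligence and achievement (Roth et al., 2015)": 0.54,
}

for name, r in examples.items():
    print(f"{name}: r = {r:.2f} ({benchmark_label(r)})")
```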
Given this modest magnitude, readers may wonder about the extent to which motivation contributes to language learning relative to the two classical individual difference variables, intelligence and aptitude. Nevertheless, there seem to be a number of means to improve the predictive validity of the ideal L2 self when it comes to actual language achievement. First of all, the original conceptualization of the L2MSS comes with a set of conditions without which self-guides are not expected to exhibit their full power (Dörnyei, 2009). These conditions include the availability of the different self-guides, their vividness, plausibility, harmony, and activation, as well as having procedural strategies and being offset by a feared self. Although these conditions were proposed together with the inception of the theory itself, they have remained largely untested and hardly any attempts have been made to incorporate them into how self-guides are currently measured (Hessel, 2015).
Another potential direction is the incorporation of discrepancy. By definition, self-guides are not absolute constructs but relational to a future state. The hypothesized effect of the ideal L2 self, for example, resides in the discrepancy between a current state and a desired future state, not the future state per se. Unfortunately, this discrepancy is not currently featured in how self-guides are measured (Thorsen, Henry, & Cliffordson, 2017). The standard scale items used to measure the ideal L2 self are in the form of 'I can imagine myself…', which is admittedly ambiguous. As an illustration, if a learner cannot imagine herself mastering English someday, this could additionally mean that she does not believe she can do that (self-efficacy), that she does not want to do that (value of the activity), that she experiences a complete absence of motivation (amotivation), that she does not need to do that (e.g., she has already mastered English), or any other interpretations different learners might conjure up. Due to this ambiguity, it might be appropriate to relabel the standard ideal L2 self scale as the imagined self, and reserve the ideal L2 self label for an improved measure that can accommodate the current-future discrepancy that the L2MSS requires by definition.
A measure that can accommodate a current-future discrepancy does not have to be a close-ended questionnaire scale. In fact, self-discrepancy is not conceptualized as a conscious construct that the individual can readily self-report (Higgins, 1987). For this reason, Higgins (1987) criticized a study by Hoge and McCarthy (1983) for using experimenter-selected attributes and asking their participants about their discrepancies directly, describing this type of measure as nonideographic. An ideographic measure, in contrast, requires that the participant is the one who supplies attributes related to, say, their actual self and their ideal self separately. It is then the researcher's job to code these attributes in order to determine 'matches' and 'mismatches' between actual and ideal selves. The results may show that one participant has a large number of matches (i.e., little discrepancy), another has mostly mismatches (much discrepancy), and yet another has neither matches nor mismatches (no relevance of discrepancy). This approach has not been utilized in the language motivation field to date. Another approach that does not rely on close-ended questionnaires draws from reaction-time measures (e.g., Higgins, Shah, & Friedman, 1997; Shah & Higgins, 2001). The premise behind this approach is that higher accessibility leads to more efficient approach and avoidance tendencies unconsciously. Our field is yet to exploit the full potential of reaction-time measures to study unconscious aspects of motivation (Al-Hoorie, 2016a, 2016b, in press).
In terms of the ought-to L2 self, its predictive validity was markedly lower than that of the ideal L2 self in relation to both intended effort and achievement. As explained above, the wanting nature of the ought-to L2 self has already been pointed out by a number of scholars who recommended improvements. However, instead of leaving this construct behind in favor of newer constructs, it would be useful to attempt to understand why its theoretically anticipated effect has not been borne out.
One possible explanation is that the ought-to L2 self is, by definition (see Dörnyei, 2009, p. 29), concerned only with the less internalized forms of motives. It pertains to someone else's expectations, rather than one's own ideals, and primarily functions in a preventive fashion. That is, since ought self-guides represent "minimal goals" (Higgins, 1998, p. 5) that are "imposed" (Dörnyei, 2009, p. 32) by one's peers, parents and authoritative figures, then learners may simply aim to achieve the minimum required to satisfy another person's desires, rather than fulfilling them more thoroughly as one might do with one's own ideals. Such minimal goals are less likely to sustain engagement in learning and enthusiasm about it in the long run. A similar picture emerges from possible selves theory. Markus and Nurius (1986) actually downplay the role of others in the formation of one's own possible selves. In their words, "others' perceptions of an individual are unlikely to reflect or to take into account possible selves" (p. 964). Markus and Nurius then point out that, "when we perceive another person, or another perceives us, this aspect of perception, under most conditions, is simply not evident and typically there is little concern with it" (p. 964). A similar picture emerges, again, from self-determination theory (Deci & Ryan, 2002), in which the less internalized forms of extrinsic motivation seem to be associated negatively with L2 achievement, but the more internalized forms are associated positively with it (e.g., Wang, 2008). Indeed, Mackay (2014, p. 394) reports that some of her interviewees construed external pressures to learn the language as a demotivating factor. All of this points to the need to reconsider the original conceptualization of the ought-to L2 self construct as a motivational factor, an assumption held in the field for more than a decade. It might be more appropriately conceived of, at least in some contexts, as a demotivating variable instead.
Another possible explanation is that current measurement practice does not distinguish between own-other standpoints in self-guides (Lanvers, 2016; Teimouri, 2017; Thompson & Vásquez, 2015). However, before treading this path, a number of conceptual issues need to be addressed. First, introducing standpoints may make the different self-guides less clear-cut. That is, where do we draw the line between ideal-own and ought-own, and between ideal-other and ought-other (see Dörnyei, 2009, pp. 13-14, for a similar argument)? Second, as Dörnyei and Ushioda (2009a, p. 352) point out, degrees of internalization are inherent to self-determination theory. When degrees of internalization are used to justify the different self-guides (e.g., ideal-own versus ideal-other), theorists need to consider in what respects this new formulation is more than self-determination theory cast in self terminology. This is a crucial consideration since it is desirable to avoid a situation where different researchers within one field deal with more or less the same phenomena but independently due to different terminology (Dörnyei & Ryan, 2015).
A further consideration pertains to the proliferation of 'selves' witnessed in the field today. Some scholars (MacIntyre & Mackinnon, 2007; MacIntyre, Mackinnon, & Clément, 2009) argue that these selves are hardly more than mere metaphors, risking unnecessary redundancy and conceptual clutter. For example, MacIntyre and Mackinnon (2007) list over 60 self-related constructs in psychology, leading them to argue that "the multitude of overlapping concepts in the literature on the self is more confusing than integrativeness ever could be" (MacIntyre et al., 2009, p. 54). Just like psychology, the language motivation field is witnessing more and more selves being introduced, including anti-ought-to, rebellious, imposed, bilingual, multilingual, private, public, possible, and probable selves, but without sufficient attention to their construct validity or their overlap. In fact, it has become fashionable to introduce a new construct and suffix it with a 'self' even when equivalent constructs already seem to exist (e.g., anti-ought-to self versus reactance, and feared L2 self versus fear of failure). Adding a new dimension to an existing construct (e.g., L2 reactance) may be more appropriate than introducing yet another 'self'. As Albert Bandura cautions, "a theory cast in terms of multiple selves plunges one into deep philosophical waters. It requires a regress of selves to a presiding superordinate self that selects and manages the collection of possible selves to suit given purposes. Actually, there is only one self that can visualize different desired and undesired futures and select courses of action designed to attain cherished futures and escape feared ones" (Bandura, 1997, p. 26).

The L2 learning experience
As reviewed above, the L2 learning experience has been described as the strongest predictor of intended effort. However, the results of the present meta-analysis suggest that the high correlation between the L2 learning experience and intended effort may partly be an artifact of not implementing a factor-analytic procedure. A cursory look at the literature suggests that the importance of examining the factorial structure of scales is not appreciated. Researchers, reviewers, and editors seem satisfied with a quick Cronbach analysis showing a reliability of around .70. However, reliability assumes that the scale is already unidimensional, and when it is not, reliability can be artificially inflated (see Al-Hoorie & Vitta, in press; Green, Lissitz, & Mulaik, 1977; Sijtsma, 2009). Based on the present results, it is recommended that researchers routinely present the results of a factor-analytic procedure (whether from classical test theory or item response theory) to establish convergent and discriminant validity among their scales, along with their reliabilities.
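The point that an acceptable alpha does not establish unidimensionality can be illustrated with a small simulation. The sketch below is not drawn from any of the studies reviewed here: the sample size, the loadings of .8, the error weights of .6, and the eight-item, two-factor layout are all assumed values chosen for illustration. Four items load on one latent factor and four on a second, uncorrelated factor, and Cronbach's alpha is then computed for all eight items treated as a single scale.

```python
import numpy as np


def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix."""
    k = items.shape[1]
    sum_item_variances = items.var(axis=0, ddof=1).sum()
    total_score_variance = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - sum_item_variances / total_score_variance)


rng = np.random.default_rng(0)
n_respondents = 5000  # assumed sample size, large enough for stable estimates

# Two latent factors generated independently, so they are uncorrelated in expectation.
factor_a = rng.standard_normal(n_respondents)
factor_b = rng.standard_normal(n_respondents)

# Items 1-4 load on factor A and items 5-8 on factor B (assumed loading of .8),
# each with independent measurement error (assumed weight of .6).
latent = np.column_stack([factor_a] * 4 + [factor_b] * 4)
error = rng.standard_normal((n_respondents, 8))
items = 0.8 * latent + 0.6 * error

alpha = cronbach_alpha(items)
subscale_r = np.corrcoef(items[:, :4].sum(axis=1), items[:, 4:].sum(axis=1))[0, 1]

print(f"alpha for the pooled 8-item 'scale': {alpha:.2f}")
print(f"correlation between the two 4-item subscales: {subscale_r:.2f}")
```

Under these assumed values, the expected alpha is about .75 even though the two halves of the pooled scale are essentially unrelated, which is why a factor-analytic check is needed alongside the reliability coefficient.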
In contrast to its correlation with intended effort, the L2 learning experience had a modest correlation with achievement (r = .17). This suggests that, to date, the small number of studies that have examined the correlation between this variable and achievement do not support a strong association. Furthermore, little theoretical analysis is available to explain why this association should be causal in the first place (Ushioda, 2011), especially since virtually all studies included in the present meta-analysis were observational. Neither is this modest association totally inconsistent with experimental studies (on non-L2 learning) that do not support a causal relationship between student evaluation of the course and educational outcomes (Arbuckle & Williams, 2003; Boring, 2015; Braga et al., 2014; Carrell & West, 2010; MacNell et al., 2015). Having a positive attitude toward the course and its teacher may not necessarily imply better learning, even if the learner believes so. Indeed, it is not an unusual experience for a learner to get the 'impression' that they have mastered the subject, but to subsequently realize from a test that there were significant gaps in their knowledge. This misleading impression of mastery may be attributed to different reasons, including a teacher with a charismatic personality or simply an entertaining approach (see Al-Hoorie, 2017a, for a more detailed review).
Evidence of this misleading impression has been demonstrated graphically in a classic experiment titled 'The Doctor Fox Lecture: A paradigm of educational seduction' (Naftulin, Ware, & Donnelly, 1973). These researchers recruited a professional actor to give a lecture about game theory (a subject he knew nothing about). The actor was given a fake name, Dr. Myron L. Fox, and was introduced to the unsuspecting audience as an expert in the application of mathematics to human behavior. Drawing from his acting skills, the actor peppered his lecture with some humor as well as meaningless, conflicting, and irrelevant information. At the same time, he sounded authoritative and exhibited a charismatic personality. Despite the empty content of the lecture, the audience reported having enjoyed the lecture and even learned from it (in fact, one person even reported that s/he had read some of the speaker's publications!). We can confidently argue that, despite this favorable impression, no learning or any knowledge transmission about game theory occurred during that lecture. 'Dr. Fox' simply did not know the material. The feeling of having learned from the lecture is little more than a misattribution. The audience simply enjoyed the charismatic and authoritative personality of the lecturer, but then misattributed this enjoyment to the informativeness of the lecture. Naftulin et al. (1973) conclude that "student satisfaction with learning may represent little more than the illusion of having learned" (p. 630). This is now known as the Dr. Fox effect.
These results point to the urgent need for experimental studies in the language motivation field for testing causal assumptions. One reason for the paucity of experiments has to do with the numerous practical and logistic considerations involved (see, e.g., Csizér & Magid, 2014, Part 4). Still, language motivation researchers could take their cues from other SLA areas where experiments are very common. When it comes to instructional effects, for example, Plonsky (2013) reports that experimental studies are about twice as common in classroom research as observational studies. Experimental studies are also needed to examine pedagogical implications derived from observational studies (e.g., Dörnyei & Kubanyiova, 2014; Hadfield & Dörnyei, 2013). It is not unimaginable that some of these recommendations might turn out to be premature. If so, this could ultimately undermine the field's credibility.

Intended effort
In the L2MSS tradition, self-reported intended effort has been frequently called the criterion measure (sometimes with capital C and M). Although any outcome can be described as a criterion measure (since it simply refers to the dependent variable), the convention of calling intended effort the criterion measure is nowadays seen everywhere in research reports, from scale descriptions, through results tables, to structural equation models. Another euphemism is 'motivated behavior'. In reality, however, this scale typically refers to intended effort rather than observation of actual behavior.
Still, a subjective measure is not necessarily less valid. The use of a subjective measure could provide unique insights that more objective measures might not capture. Nevertheless, there are at least two important considerations to take into account with regard to this scale. First, the items in this scale tend to be generic, while generic intentions are less likely to translate into behavior (Fishbein & Ajzen, 2010). This is especially because conscious thought suffers from substantial blind spots when it comes to predicting how one will actually behave (see Al-Hoorie, 2015, 2016a, 2016b, 2017b, for a more detailed discussion). Following Wolters and Taylor (2012), intended effort could be made more specific by recognizing the different 'dimensions' of motivated behavior. In one dimension, some activities reflect behavioral engagement while others reflect academic engagement. Behavioral engagement includes class participation and other overt behavioral effort. Academic engagement, on the other hand, refers to time spent on learning tasks and number of assignments completed. Although both constitute 'effort', the latter reflects more quality engagement. A second dimension of effort is whether it is universal or optional. Universal engagement refers to the activities that all students are expected to engage in, such as attending class and doing homework. In contrast, optional engagement refers to going beyond what is expected of the typical student, by showing initiative and volunteering for relevant extracurricular activities. A third dimension is the need to consider engagement in adaptive versus maladaptive forms of behavior. A learner may engage in adaptive learning behaviors, but might at the same time also engage in other maladaptive behaviors (e.g., procrastination, defensive pessimism, and other forms of self-handicapping). Focusing on adaptive behaviors only might miss important pieces from the overall picture. A final dimension is the need to consider agentic versus non-agentic behavior. Effort expended by the learner that is planful and purposeful should count as more than the effort that merely reflects norm following. "Students coerced to finish worksheets using specific tactics rigidly dictated by a teacher may appear cognitively and behaviorally engaged" (Wolters & Taylor, 2012, p. 645) but not necessarily actually motivated. Adopting such a level of specificity would likely enrich our perspective on learning motivation and open up interesting directions for future research.
Second, the use of intended effort leads to a conceptual difficulty. Theoretical clarity requires observing "the motivation → behavior → outcome chain" (Dörnyei, 2005, p. 73) because "If we want to draw more meaningful inferences about the impact of various motives, it is more appropriate to use some sort of a behavioural measure as the criterion/dependent variable" (Dörnyei & Ushioda, 2011, p. 200, original emphasis). Intended effort does not seem to qualify as representative of the 'behavior' piece of the chain until it is actually performed. This is why, in his review of the L2MSS, Gardner (2010) maintains that "they relate one measure based on verbal report to another measure based on verbal report" (p. 73). A theoretical justification for the use of intended effort as an outcome measure is needed to clarify what we can learn from this construct and in which contexts.

Limitations and conclusion
Because this is the first meta-analysis of the L2MSS, the present study inevitably has a number of limitations. The number of studies using a criterion measure other than intended effort is relatively small. This small number resulted in relatively wide confidence intervals, and further showed evidence of publication bias. This small sample also led to a pragmatic decision to combine all objective measures into one group. However, it is not implausible that different outcome measures would exhibit different results (e.g., end-of-year grades versus researcher-administered tests).
Therefore, the present meta-analysis must be considered a meta-analysis-in-progress and be updated once a sufficient number of studies using different outcome measures become available. Although a number of studies were excluded for not reporting correlation results, most of these studies followed the general trend of relying on intended effort rather than other outcome measures. The same applies to potential moderators, including age, gender, context, and target language (see also Ellis, 2006).
The results also draw attention to the urgent need for experimental studies in the language motivation field. For historical reasons, our field has relied heavily on observational questionnaire-based research designs. At the same time, many arguments in the field have causal implications, and even pedagogical recommendations for classroom teachers. In fact, making a list of pedagogical implications has become a default expectation from researchers (and graduate students), even when their research is observational. Without experimental research to support such pedagogical recommendations, this practice may be at best misleading, and at worst damaging to the field. However, overcoming the various logistics involved in conducting experimental research, whether inside or outside the classroom, would eventually lead to a science that is more instructive to classroom practice and to language learning in general.

Figure 1
Funnel plot showing the relationship between the ideal L2 self and achievement

Table 1
Correlations between the three L2MSS components and the two outcome variables

Table 2
Correlations between the three L2MSS components and intended effort for studies that applied a factor-analytic procedure and studies that did not