Studies in Second Language Learning and Teaching
A state-of-the-art review of distribution-of-practice effects on L2 learning

The purpose of this state-of-the-art review is to provide a general overview of recent research on time distribution and second language (L2) learning with special implications for classroom settings. Several studies have been performed to examine how to best distribute the hours of L2 practice to maximize learning by comparing conditions that promote intensive exposure versus others in which L2 input or instruction is more widely spaced. Findings from these studies are relevant not only for practical purposes but also for theory development. This review provides a summary of recent studies as well as suggestions for pedagogical practice. Additionally, it identifies areas for future research concerning the effect of time distribution on L2 learning.


Introduction
According to DeKeyser (2017), one of the key issues that needs to be addressed in instructed second language acquisition (ISLA) is how to distribute the available instructional time to promote high levels of L2 proficiency. This applies to the school curriculum as well as L2 learning programs for adults in higher education. The topic of time distribution, or input spacing, has attracted the attention of cognitive psychologists for many years (see Ebbinghaus, 1885/1913) and research on this topic has important pedagogical implications, apart from having an unquestionable theoretical value. In SLA, publications examining the effect of time distribution have drastically increased in the last few years, especially among researchers interested in the role of L2 practice in ISLA (e.g., DeKeyser, 2017; Suzuki et al., 2019).
The spacing effect, according to which learning is optimized when repetitions of target material are spaced rather than massed, is one of the most robust findings in cognitive psychology. The evidence shows that including time or other intervening items between repetitions of target items (e.g., target-distractor-distractor-distractor-target, etc.) facilitates learning more than immediately consecutive repetitions (e.g., target-target-target). The positive effect of spaced as opposed to massed schedules has been found on a variety of tasks and for different population types, even though a lot of research has focused on verbal learning in the case of university students. A related phenomenon is that of the lag effect, which suggests that longer intervals between repetitions are more beneficial than shorter ones. The results of previous studies in cognitive psychology indicate that, while the spacing effect is ubiquitous, the lag effect is less consistent (Toppino & Gerbier, 2014).
A recent meta-analysis that focuses on the effects of spacing on L2 learning by Kim and Webb (2022) examines quantitatively the effects of spacing as reported in 37 experimental studies and further confirms the positive role of spacing in general, but points to different effects for different types of L2 areas, learners, or practice activities. Although some of the findings are inconclusive due to the low number of studies and participants in some of the analyses, the meta-analysis confirms the important role of spacing in L2 practice, as well as the need to conduct more research in the area. The present paper complements Kim and Webb's (2022) quantitative meta-analysis in presenting a qualitative narrative review of studies dealing with distribution-of-practice effects and considers not only experimental studies on the learning of a specific target feature, but also studies that have a broader aim and examine the role of spacing at the program level. The current review will provide details on 47 studies on the topic as well as a comprehensive picture of how the distribution of instructional hours has been shown to affect L2 learning.
This review is organized in the following way. Section 2 includes experimental studies and it is further subdivided into four sections: the spacing effect, the lag effect, blocked versus interleaved practice, and individual differences. The next section reviews research at the program level. Section 4 provides a summary of research findings as well as pedagogical implications. The paper concludes with ideas for further research. The Appendix contains details about each of the studies under review (marked with * in the reference list), which will be useful for the reader, as, due to space limitations, full details cannot be provided for all the studies in the main text.

Spacing effect
This section reviews the studies that have examined massed versus spaced interstimulus spacing when learning occurs in one session, as well as studies that have analyzed learning outcomes of training/teaching in one (massed) versus several sessions (spaced). The first part includes studies on L2 vocabulary, while the second one concerns grammar learning.

Vocabulary
Most research on the spacing effect comes from the cognitive psychology literature, in which, typically, psychology students acquire new vocabulary through paired-associate learning. There are also some studies aiming to contribute to the SLA literature and typically targeting L2 learners, which also use the same methodology. Many of these studies were performed with Japanese English as a foreign language (EFL) university students by Nakata and colleagues and their findings also confirm the spacing effect. Nakata (2015) compared different types of inter-stimulus spacing for the learning of English-Japanese word pairs repeated five times. The results showed that immediate repetitions (massed) promoted less learning than spaced repetitions. Nakata and Suzuki (2019a) provided further support for the spacing effect with English-Japanese translation pairs, which were included in sets of semantically related (e.g., baboon, badger, otter, etc.) and semantically unrelated words (e.g., alcove, pail, pigment, etc.). The results showed that, although massed repetitions of sets facilitated performance during training, the massed distribution led to significantly fewer vocabulary gains than the spaced one, both on the immediate and delayed posttests.
Further evidence for the spacing effect was provided by Nakata and Elgort (2021) regarding contextual word learning of pseudo-words inserted in English sentences. When the repetition of target items appeared in immediate succession, the participants' performance on the vocabulary tests was worse than when the repetitions were spaced. Interestingly, the authors did not find any differences between conditions in a semantic priming task, supposedly assessing tacit vocabulary knowledge. Koval (2019) used eye-tracking to examine English speakers' processing of Finnish words appearing in English sentences that were repeated consecutively (massed) or with 25 intervening sentences in between (spaced). The results showed significantly better vocabulary learning results for the spaced condition, for which the decrease in attention as shown by participants' eye movements was not so drastic. In a later study involving Finnish-English paired associates, Koval (2022) found additional evidence for the spacing effect. Her results also showed that massed practice was not significantly different from the no-practice control condition for long-term learning.
Finally, findings from classroom-based studies on L2 English vocabulary learning by L1 Farsi students in primary school suggest that vocabulary practice over two sessions is more conducive to vocabulary learning than one single "massed" session (Lotfolahi & Salehi, 2017).
Apart from the above-mentioned studies focusing on single words, there is research including multi-word units which also provides evidence for the spacing effect. Yamagata et al. (2022) found that spaced repetitions of verb-noun collocations led to more learning not only of the practiced collocations, but also of other collocations with the same target nodes. Similarly, Macis et al. (2021) found a significant advantage for spaced over massed practice for adjective-noun collocations when the training involved deliberate learning. However, the authors found a significant advantage for massed practice in the case of incidental learning.

Grammar
Although there is very little research comparing massed versus spaced grammar instruction, the existing evidence suggests that it is better to use distributed rather than massed practice for long-term learning. Miles (2014) compared massed and widely spaced (average spacing of 2.5 weeks) practice of challenging English grammar structures for Korean university students. The treatment included different classroom activities and the testing consisted of a grammaticality judgment test (GJT) and an L1-L2 translation task. The results showed no differences between conditions on an immediate posttest. On a delayed posttest, however, the spaced group outperformed the massed group on the GJT.

Lag effects
This section includes the studies that have compared short versus long inter-stimulus spacing in one-session experiments, as well as those analyzing inter-session spacing where learning is distributed over two or more sessions. As Rogers (2017) claims, research on inter-session lags is more relevant for SLA, as learning L2 features typically requires more than one session. The first sub-section focuses on vocabulary, while the next two review the findings concerning grammar learning and speech production.

Vocabulary
While the findings from studies examining the spacing effect in the case of vocabulary learning are quite consistent, the evidence regarding the lag effect is not so uniform. Nakata (2015) found no differences between short (5 items), medium (10 items) and long (30 items) inter-stimulus spacing, while Nakata and Webb (2016) reported that longer inter-repetition lags of 19 items were more beneficial for long-term learning of vocabulary than shorter lags of 3 items. Koval (2022) found that long spacing (71-119 trials within a block) was more beneficial than short spacing (17-38) for long-term learning of Finnish-English paired associates. As can be observed, "short" and "long" spacing were differently operationalized, which might partly explain the inconsistent results.
On the other hand, research on inter-session spacing in classroom settings has generally not provided support for the lag effect. Küpper-Tetzel et al. (2014a) compared the learning of German-English word pairs by German grade 6 learners under a massed schedule and two spaced schedules (1-day and 10-day lags). The results of a 7-day delayed test showed that the 1-day lag was more beneficial than the other two. Five weeks later, the two spaced conditions proved more advantageous than the massed condition, with no significant differences between the short and long lags. These results support previous claims that the optimal inter-session interval depends on the retention interval (Cepeda et al., 2006). Rogers and Cheung (2020, 2021) examined lag effects for vocabulary learning in a primary school in Hong Kong. In the first study, the target words that were learned over a short 1-day lag were better remembered 28 days later than those learned over a longer 8-day lag. However, the second study, which was a replication of the first, found no differences between lags.
In contrast to the previous studies examining the learning of L2 words in isolation, the studies by Serrano and Huang (2018, 2021) focused on contextual word learning through repeated reading in the case of secondary-school students in Taiwan. The results similarly failed to provide support for the lag effect. In both studies, the intensive condition (1-day inter-session interval) led to higher vocabulary gains on the immediate posttest than the long-spaced condition (7 days). Performance on the delayed posttest differed depending on whether learning was incidental, with no differences between conditions (Serrano & Huang, 2018), or intentional, in which case higher gains were reported for the intensive condition (Serrano & Huang, 2021).
Finally, there are some studies that have analyzed whether changing the intervals between lags during the treatment is more or less beneficial than equal or uniform spacing. Nakata (2015) found an advantage of expanding (gradually increasing inter-repetition intervals) over equal spacing in learning performed in one session. Studies examining learning over multiple sessions have reported conflicting results. For example, Küpper-Tetzel et al. (2014b) found no significant differences between contracting (from 5-day to 1-day lags), equal (3-day lag) and expanding (from 1- to 5-day lags) schedules on an immediate posttest. On a test performed 7 days later, the contracting schedule was better than the equal and the expanding ones, while the opposite was found 35 days after training, with the equal and expanding schedules outperforming the contracting schedule. These results contrast with the findings from Schuetze and Weimer-Stuckmann (2011), which showed no differences between equal and expanding schedules for short-term learning but better retention in the uniform condition. In another study comparing uniform and expanding schedules for the learning of English-German word pairs, Schuetze (2015) did not find any significant differences between the two. Similarly, Snoder (2017) did not find any significant differences in the learning of verb-noun collocations between an expanding schedule (day 1, 7, and 16) and an intensive schedule (day 1, 2, 4).
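To make the three schedule types easier to compare, the following minimal Python sketch converts a list of inter-session lags into session days. The function name `session_days` and the three-session design are illustrative assumptions, not details taken from the studies above; only the lag values (5-day and 1-day, 3-day, 1-day and 5-day) echo those reported for Küpper-Tetzel et al. (2014b).

```python
from itertools import accumulate


def session_days(lags, first_day=0):
    """Return the day on which each session falls, given the lags
    (in days) between consecutive sessions."""
    return list(accumulate(lags, initial=first_day))


# Hypothetical three-session schedules covering the same six-day span.
schedules = {
    "contracting": [5, 1],  # long lag first, then short
    "equal": [3, 3],        # uniform lags
    "expanding": [1, 5],    # short lag first, then long
}

for name, lags in schedules.items():
    print(name, session_days(lags))
```

Under these assumptions, all three schedules begin and end on the same days and differ only in where the middle session falls, which is the contrast at stake in this line of research.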

Grammar
The first studies examining the effect of inter-session spacing in SLA concerned grammar learning in classroom settings and provided support for the lag effect. Bird (2010) focused on the acquisition of the simple past, present perfect and past perfect by adult EFL learners in Malaysia, over five different class sessions, spaced either over a 3-day or a 14-day interval. The results of a 7-day delayed GJT showed no differences between groups. However, the longer lag proved more helpful for long-term retention after 60 days. Rogers (2015) provided further evidence for the benefit of spacing grammar instruction over longer lags (2.25 vs. 7 days) for the incidental acquisition of challenging English grammar structures by a group of university students in the Middle East.
In contrast, the study by Kasprowicz et al. (2019), which examined the acquisition of French morphology by L1 English learners of French in grades 4-6, did not find any differences between short (3.5 days) and long (7 days) lags either on an immediate or delayed posttest.
Research by Suzuki and colleagues on productive grammar skills also failed to support the lag effect. What is more, their findings suggested that short lags might be more beneficial than longer lags. Suzuki and DeKeyser (2017a) compared 1- versus 7-day inter-session intervals for short- and long-term learning of Japanese morphology by adult English speakers. Learning was assessed through accuracy and speed of performance in a rule application and a sentence completion test. The results showed no differences between lags for accuracy; however, the short-lag condition led to significantly faster performance 28 days after the instruction. In a conceptual replication and extension of that study, Suzuki (2017) provided more evidence in favor of short lags (3.3 vs. 7 days), but concerning accuracy and not speed in the production of morphology in a novel miniature language (Supurango) by L1 Japanese university students. In a follow-up study (Suzuki, 2018), it was found that the short-lag condition was also more conducive to automatization, as evidenced by participants' scores in the CV (coefficient of variation) (Segalowitz & Segalowitz, 1993).
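For readers unfamiliar with this measure, the CV is conventionally computed as the standard deviation of a learner's response times divided by their mean response time. The notation below is an illustrative shorthand and is not taken from the studies under review:

```latex
\mathrm{CV} = \frac{SD_{RT}}{M_{RT}}
```

A lower CV accompanying faster mean response times is usually interpreted as a sign of automatization rather than a simple speed-up of performance.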

L2 speech production
Two different aspects of students' L2 speech production have been examined regarding lag effects, one being pronunciation and the other oral fluency. Li and DeKeyser (2019) examined the acquisition of tonal word production in Chinese. The training involved the presentation of target words as well as practice that was meant to promote different types of knowledge: declarative (knowing "what," such as knowing about different tones in Mandarin) and procedural (knowing "how;" e.g., how to use the right tone in oral speech production). The authors reported that declarative knowledge decreased significantly when tested 28 days later. In addition, when the lags between training sessions were longer (7 days), this declarative knowledge was better retained than when there was only a 1-day lag. However, it was observed that for the production of new words, involving procedural knowledge, short spacing was more beneficial.
In the case of oral fluency, Bui et al. (2019) examined the effect of task repetition under different schedules for the development of L2 oral complexity, accuracy, and fluency. The same task was repeated twice either immediately (massed) or 1, 3, 7, or 15 days after the first performance. Whereas no differences were found in terms of complexity and accuracy, immediate task repetition led to significantly higher fluency than its spaced counterpart, while no other differences were found between other lags.
In a more thorough investigation of oral fluency, Suzuki and Hanzawa (2022) examined the effect of spacing six repetitions of the same task, and compared massed (immediate), short (45 minutes), and long (7 days) spacing. The authors found massed repetitions to be a "double-edged sword," because they were helpful in significantly reducing students' pauses but also led to slower articulation rate and more verbatim repetitions.

Blocking versus interleaving
Also related to time distribution, some other studies in the SLA literature have focused on whether it is more effective to learn similar forms in blocks, in which repetitions of target items or examples of target rules appear consecutively (i.e., massed), or whether interleaved practice (alternating between repetition types, i.e., spaced) is more beneficial for L2 learning. Nakata and Suzuki (2019b) examined the learning of three categories: English simple past, present perfect, and the conditionals by Japanese university students. Under the blocked condition, the activity included structures from each category consecutively. The interleaved condition alternated sentences from different categories, while in the increasing condition five sentences from each category were practiced first in blocks and the other five were interleaved. The results of an immediate GJT showed no differences between conditions; however, the results of a delayed posttest 7 days later were significantly higher under interleaved than under blocked practice.
Suzuki and Sunada (2020) also compared these three types of schedules but, in contrast to the previous study, found the hybrid schedule (first blocked and then interleaved) to be more beneficial for the acquisition of relative pronouns. In another study also examining the learning of English relative clauses by Japanese learners but only under two schedules (blocked vs. interleaved), Suzuki et al. (2022b) showed that interleaving was more helpful for fast and accurate oral production of relative clauses on an immediate posttest, while no differences between conditions were found on a 7-day delayed posttest.
In the case of oral fluency, Suzuki (2021) found that repeating the same task three times in blocks (AAA BBB CCC) led to more fluent speech than interleaving different tasks (ABC ABC ABC). Additionally, the learners doing blocked practice were more likely to reuse the same constructions (Suzuki et al., 2022a).
Carpenter and Mueller (2013) also compared blocked and interleaved practice for the learning of eight French-pronunciation rules by L1 English speakers. The authors found that blocking (presenting example words for each rule consecutively, e.g., bateau, carreau, fardeau, etc.) was more helpful for learning pronunciation than interleaved practice, in which the presentation sequence alternated words following different rules (bateau, genou, tandis, etc.).
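The two orderings contrasted in this section can be made concrete with a short Python sketch. The function names `blocked` and `interleaved` and the task labels A, B, and C are illustrative assumptions used only to show how the same set of repetitions is sequenced under each condition.

```python
def blocked(tasks, repetitions):
    """Repeat each task consecutively before moving on to the next one."""
    return [task for task in tasks for _ in range(repetitions)]


def interleaved(tasks, repetitions):
    """Cycle through all tasks once per round, for the given number of rounds."""
    return [task for _ in range(repetitions) for task in tasks]


# Three hypothetical tasks, each practiced three times.
print("".join(blocked("ABC", 3)))      # AAABBBCCC
print("".join(interleaved("ABC", 3)))  # ABCABCABC
```

Both sequences contain the same nine practice trials; only the order differs, so any difference in outcomes can be attributed to scheduling rather than to the amount of practice.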

Lag effects and individual differences
Several studies have investigated whether certain cognitive capacities differentially affect learning under more or less concentrated schedules. Most of this research has been done by Suzuki and colleagues within the aptitude-treatment interaction framework (Robinson, 2002). Suzuki (2018) examined the role of procedural learning ability, related to the acquisition of fast and automatized knowledge, and found that it plays a clearer role when learning L2 grammar under short (3.3 days) rather than long (7 days) inter-session lags. Several studies have focused on the role of working memory (WM), which refers to a limited-capacity complex cognitive system that allows for the storage and processing of information while performing cognitive tasks (Baddeley, 2003). Different instruments have been used to measure WM; for instance, Suzuki and DeKeyser (2017b), and Suzuki (2019) used an operation span task; Suzuki (2021a) a trail-making task, while Suzuki et al. (2022a) measured WM through a listening span task. The role of WM in learning under different schedules is still unclear, although most of the evidence suggests that WM plays a more notable role in learning under concentrated schedules. Suzuki and DeKeyser (2017b) found that WM predicted learning of Japanese morphosyntax when inter-session spacing included short (1 day) but not long (7 days) lags. In a similar vein, learners' WM has been shown to affect their oral fluency development (Suzuki, 2021a), as well as their learning of relative clauses (Suzuki et al., 2022b) under blocked but not under interleaved practice schedules. In contrast, Suzuki (2019) found no effect of WM for the learning of Supurango under short (3.3 days) versus long (7 days) lags, even though the study also included an operation span task as in Suzuki and DeKeyser (2017b).
As for language-analytic or grammar-inferencing abilities, Suzuki and DeKeyser (2017b) and Suzuki (2019) reported that these skills had a clearer role when participants were learning L2 grammar in long-spaced sessions (7 days). Under this type of spaced schedule, the participants who were better able to infer grammar rules in an unknown language or memorize new form-meaning mappings (as measured by LLAMA-F and LLAMA-B, Meara, 2005) were more successful in learning the target L2 grammar.
Using the desirable difficulties framework (Bjork, 1994; Suzuki et al., 2019), Serfaty and Serrano (2022) examined how learners' individual characteristics regarding language proficiency, age, and time on task during training predicted grammar learning through digital flashcards. The authors found no overall lag effects (1-day vs. 7-day lags) when the data from all the students were analyzed together, but crucially, their analyses showed that the longer lag was more beneficial for learners of higher proficiency and shorter times on task during the learning phase, while the opposite was true for learners experiencing more difficulty during training. In other words, the longer lag was a desirable difficulty only when no additional difficulties existed on the part of the learner.

Spacing and program evaluation
The final set of studies in this review includes those focusing on the effect of time distribution at the program level, comparing programs in which the hours of instruction were differently distributed. Research in this area is scarce, with many of these studies being performed in primary schools in Canada, where a change was implemented in the 1980s to promote intensive English instruction in Quebec.
In order to extend the findings from an earlier large-scale study involving thousands of students in Quebec by Spada and Lightbown (1989) showing significant advantages in favor of learners receiving intensive English instruction, White and Turner (2005) performed an exhaustive analysis of students' oral production. This study compared the oral performance of learners receiving intensive (400 hrs in one year) versus regular (±60 hrs) instruction on a variety of oral tasks. Their results showed that the oral communicative abilities of the learners in the intensive program were significantly more advanced than those of their peers receiving regular instruction. More recently, French et al. (2020) examined the long-term effects of intensive instruction on speech production in terms of perceived fluency, comprehensibility and accentedness. The authors found that four years after the end of their respective programs the students that had been enrolled in intensive English were perceived to be more fluent and comprehensible in this language than those who had only received regular instruction. No differences were found in accentedness, according to the raters' perceptions. The authors controlled for students' academic and language skills and, although there might be other intervening variables that were not controlled for, the results of this study provide evidence for the positive effect of intensive instruction.
Other studies were also performed in Canada in which the amount of exposure was held constant, focusing on different implementations of intensive English, referred to as massed (300-400 hrs over five months) and distributed (same hours over ten months). Collins et al. (1999) compared the learning outcomes of a group of students (N = 700) enrolled in these two programs as well as in a massed plus program, which promoted out-of-class L2 use in the school. The students performed different tests that tapped different L2 skills at the end of their respective program, which showed that the learners in the massed programs significantly outperformed those in the distributed program in most measures. However, the authors caution about attributing the difference exclusively to the distribution of instructional hours, as the students in the massed programs also ended up receiving a few more hours of instruction. Collins and White (2011) replicated these results. The authors performed a longitudinal study and assessed learners' L2 skills at four different 100-hour intervals. Although the authors suggest that the differences were not large and some of them might be due to instructional practice, several statistically significant differences were found, especially at time 3 and 4, in favor of the concentrated program.
In the Spanish context, studies by Serrano and colleagues (Serrano, 2011; Serrano & Muñoz, 2007; Serrano et al., 2015) analyzed L2 development in English courses that offered the same number of hours of instruction but distributed differently (110 hrs in 1 month vs. 3-4 vs. 7 months) in the case of adult EFL learners in a university setting. Apart from performing a general proficiency test, the participants did an oral narrative and a written essay before and after their respective course. The results showed some advantages to the more intensive program, but only at the beginner or intermediate level and for a few measures, mostly related to grammar and lexical richness and use of formulaic language in oral production.
Alcaraz-Mármol (2015) examined vocabulary learning after a 2-month intensive (6 hrs/week) and 6-month extensive (2 hrs/week) course, also in the case of adult Spanish EFL learners. The intensive program promoted more significant vocabulary gains and, although the learners also experienced more losses on a delayed posttest 10 weeks later, their performance was still significantly superior to their peers' in the extensive program.
These results contrast with the findings from Xu et al. (2012) for a group of high school learners of Mandarin in the US. In this study, although most comparisons showed no difference between a summer intensive program and a semester-long program offering the same number of hours of instruction, the learners in the latter program became more fluent.

Summary of findings and implications for L2 teaching
As can be seen from the overview presented above, the results of the studies conducted so far present some conflicting evidence for the role of spacing in different areas of L2 learning in experimental studies. The findings from these studies also contrast with those analyzing the role of intensity at the program level.
Concerning experimental studies, there is one robust finding: when learning vocabulary items from lists in one session (either including L2-L1 pairs or in sentences), it is better to space repetitions than to study them in massed sequences (e.g., Koval, 2019, 2022; Nakata, 2015; Nakata & Suzuki, 2019a). Considering this finding, L2 learners should not engage in repetitive blocked/massed practice of each individual item when they are revising/learning new vocabulary from lists, but instead go through the whole list before doing repeated practice of individual words.
The results comparing vocabulary learning in one session versus several sessions show better learning outcomes under the latter schedule. This evidence suggests that teachers should encourage their learners to revise their vocabulary periodically on different days and not just one day before a test. As Nakata et al. (2021) suggest, cumulative testing might be a good way to promote vocabulary learning over different sessions, at the same time as it increases the amount of learning opportunities. However, it is not clear yet how long inter-session lags should be in spaced vocabulary practice, as some studies have found an advantage to shorter lags (Rogers & Cheung, 2020; Serrano & Huang, 2021) and others have found little difference (Rogers & Cheung, 2021). There is some indication, however, that longer lags might be more favorable when knowledge is assessed after a long period, suggesting, again, that spacing repeated exposures to novel words in the classroom is positive if long-term knowledge is the goal (Küpper-Tetzel et al., 2014a).
If we now turn to grammar learning, the results comparing massed and spaced schedules go in the same direction as for vocabulary. Interleaved or hybrid grammar practice, in which exemplars of target rules do not appear consecutively but are interspersed, promotes better long-term results than blocked practice when learning takes place in one session (Nakata & Suzuki, 2019b; Suzuki & Sunada, 2020; Suzuki et al., 2022b). One pedagogical recommendation following these findings would be for teachers to focus on contrasting different structures in one session (for instance, simple past and present perfect), rather than devoting the whole session to one single structure. Similarly, following Miles (2014), it is advisable to devote more than one session to the teaching of L2 grammar forms, which probably represents typical classroom practice in most contexts.
Concerning the lag effect in grammar learning over multiple sessions, there is conflicting evidence. On the one hand, some classroom-based studies support the lag effect for long-term learning, mostly for receptive grammar knowledge assessed through GJTs (Bird, 2010; Rogers, 2015), while, on the other hand, experimental studies examining productive skills report either no differences between lags (Serfaty & Serrano, 2022) or an advantage to shorter lags (Suzuki, 2017; Suzuki & DeKeyser, 2017a). One teaching implication would be that, if receptive declarative knowledge is the goal, it might be better to include longer lags between practice sessions, while for the proceduralization of grammar rules, shorter lags might be more beneficial.
The conflicting results obtained for lag effects for grammar might be due to the type of training and testing (receptive vs. productive skills) used in the different studies, or to learners' individual differences that were not controlled for. As some studies have shown (see section 2.4), certain types of learner profiles might benefit more from shorter or longer lags. According to Suzuki et al. (2019) and as shown in research by Serfaty and Serrano (2022), longer lags are a source of difficulty that might not be desirable when there are additional sources of difficulty on the part of the learner (e.g., low proficiency or challenges during the learning phase). It might be advisable for teachers to consider the characteristics of their learners when deciding how to space grammar practice, including longer lags in advanced groups and shorter ones when the group's proficiency is low. However, adapting to individual learners within a group might be challenging in classes where learners' characteristics are very diverse.
Research on L2 speech fluency suggests that massed or blocked practice could be more beneficial for the proceduralization of oral production skills (Bui et al., 2019; Suzuki, 2021b). However, when there are too many repetitions, massed practice might no longer be optimal (Suzuki & Hanzawa, 2022). Regarding pronunciation rules, the evidence suggests that more concentrated practice (blocked, if done in one session, or under short lags, if done over several sessions) might be more helpful (Carpenter & Mueller, 2013; Li & DeKeyser, 2019). According to these findings, L2 classes should offer students the possibility of repeating oral fluency tasks or doing repeated productive or receptive practice of pronunciation rules under schedules with short spacing.
At the program level, the findings from the Canadian studies on intensive English in primary education provide clear support for intensive instruction, especially when it involves more contact hours (e.g., White & Turner, 2005) or when it is concentrated in shorter time periods. Although the differences between concentrated and distributed intensive programs are not large, when they exist, they favor the more concentrated schedule (Collins et al., 1999; Collins & White, 2011). The results of the comparison between intensive and regular programs for adult learners are not conclusive, but, along the same lines, there appear to be more advantages for intensive L2 learning (e.g., Serrano, 2011). As Lightbown (2014) claims, drip-feed L2 instruction including very few hours per week (often 1-2), which is the typical schedule in most educational contexts, does not lead to advanced L2 skills or high communicative competence (Stern, 1985). Instead, L2 programs should offer full-flow (or intensive) exposure to the target language (see also Muñoz, 2012). Lightbown and Spada (2020) claim that it is beneficial to concentrate L2 instructional hours at the curricular level, even when there is no increase in total time. The authors suggest that increasing and concentrating the amount of L2 instruction when the students are more cognitively mature results in better L2 learning outcomes than an earlier start. The provision of intensive English in schools in Quebec required some restructuring of schedules for other subjects in the school curriculum, which might be challenging in many contexts. However, the promising results obtained in Canada could encourage the implementation of equivalent programs elsewhere.
It must be emphasized that the way "intensity" is conceptualized at the program level differs from its use in the experimental studies, as it refers to the intensity of the total time devoted to L2 learning and not to the (repetitive) practice of a specific target form (see Serrano, 2012).

Conclusion and further research
The results of the studies included in this review show that the findings on the spacing effect from cognitive psychology apply to the SLA literature for learning that takes place under similar conditions, typically rote learning of L2 vocabulary from lists. For the development of declarative knowledge (e.g., knowledge about rules), it is more beneficial to learn or practice in more than one session. It is not clear, however, whether adding more space between learning sessions is always more beneficial for L2 learning, and, in some cases, there is evidence to the contrary, as in the case of fluent (or proceduralized) L2 production, or when learning difficulties exist on the part of the learner. Further research is needed to determine which L2 areas might benefit from longer spacing and for what type of learners. Recent studies on individual differences are shedding new light on the spacing literature; however, this research is still scarce. Considering the program-evaluation literature, long spacing of small "L2 doses" is probably not advisable. It seems reasonable to assume that L2 learning, like L1 learning, requires high doses of the target language or a full-flow approach (Lightbown, 2014; Stern, 1985). However, we need more studies that investigate how time distribution affects learning at the program level, or the development of general L2 skills, in both children and adults, especially considering long-term retention, which has been under-analyzed in previous research. Moreover, future research at the program level should control more closely for actual teaching practice, although this might be challenging considering the number of hours of instruction that are usually involved in this type of research.
One point that needs to be mentioned is that, with the exception of the Canadian programs, most studies in this review have analyzed data from small samples, in some cases with only around 15 learners in certain conditions. These small sample sizes might be responsible for the conflicting results that are sometimes reported. Although gathering data from large samples is always a challenge in SLA research, future studies should try to obtain data from larger groups.
There are currently some replication studies (e.g., Rogers & Cheung, 2021; Serrano & Huang, 2021; Suzuki, 2017); however, more replication or close replication studies would be desirable to check whether previous findings are generalizable to other participants under equivalent methodological conditions or to different age groups. While there are some experimental studies with primary or secondary school students, most studies target adults. Additionally, areas other than vocabulary and grammar should be given priority in future studies, as most of the evidence we now have comes from research examining these two areas. More research investigating different types of knowledge in the same study is also necessary to confirm previous claims that they might be differentially affected by spacing (e.g., declarative vs. procedural; intentional vs. incidental). Finally, although it is important to have information about learning outcomes under different practice schedules, more research should also examine learning processes, for instance by using eye-tracking (as in Koval, 2019) or analyzing learners' performance during the learning phase (e.g., Nakata & Suzuki, 2019a), as this research sheds more light on how spacing affects L2 learning and also contributes to theoretical explanations of the spacing/lag effects.
In summary, the findings reported in this state-of-the-art review point to the need for more research on the effect of time distribution. The field needs more conclusive evidence in order to offer both practitioners and policy makers concrete and scientifically supported advice about how to organize the often limited time available for L2 learning.