Studies in Second Language Learning and Teaching

Research into the potential of collaborative writing is relatively new. Similarly, task repetition (TR), which has been claimed to be a valuable tool for language learning, has rarely been explored in the context of writing. Therefore, little is known about the potential of combining TR and collaborative writing, and even less if we focus on young learners (YLs), who constitute a generally under-researched population. With these research gaps in mind, the present study examines the compositions of 10 pairs of learners of English as a foreign language (EFL) (aged 12) who write the same text in response to the same picture prompt three times over a three-week period. Our analysis includes the language-related episodes (LREs) that learners generate while writing collaboratively and a thorough analysis of the three drafts that students produce, including quantitative (complexity, accuracy and fluency (CAF)) and holistic measures. Results show that learners’ compositions improve with repetition when measured by holistic ratings, although CAF measures fail to capture this improvement. As for the LREs, a large number were found; most focused on form, most were successfully resolved, and their number declined with TR. In light of these results, we argue in favor of the inclusion of holistic measures when analyzing students’ productions and discuss the positive effects of collaborative writing in the context of TR with YLs.


Introduction
The use of pair and group work in language classrooms is anchored in firm pedagogical and theoretical bases (Storch, 2011) and has been frequently investigated in the context of oral language (Mackey & Gass, 2006), whereas the study of collaborative writing is still relatively new (Abrams & Byrd, 2017; Storch, 2011). Likewise, task repetition (TR), which has been claimed to offer students great learning opportunities by allowing them to shift their focus from content to form (Bygate & Samuda, 2005), has rarely been explored in the context of writing (Amiryousefi, 2016), where findings from research on oral data can hardly be applied due to the important differences between the oral and written modes (Gilabert, Manchón, & Vasylets, 2016; Manchón, 2014; Tavakoli, 2014). As TR and pair work have been explored independently, little is known about the potential of combining them, and even less about their potential in the case of writing tasks. Finally, most of the existing literature has focused on adult learners, disregarding a population that is increasing all over the world: young language learners (Collins & Muñoz, 2016; Copland, Garton, & Burns, 2014; Enever, 2018; Pinter, 2017).
In order to shed some light on this research gap, this study examines the compositions of 10 pairs of 12-year-old learners of English who had to write the same narrative in response to a picture prompt three times over a three-week period in a classroom context. Our analysis includes the widely used measurement of the main components of linguistic performance (complexity, accuracy and fluency (CAF)) (Housen & Kuiken, 2009; Housen, Kuiken, & Vedder, 2012; Michel, 2017), a holistic assessment of their writings (Storch, 2005), and the analysis of the students' deliberations during the writing process operationalized as language-related episodes (LREs) (Swain & Lapkin, 1998). Our findings will help to better understand the potential of task repetition in the context of writing with young learners (YLs).

Collaborative writing
Collaborative writing has been defined as "the production of a text by two or more writers" (Storch, 2016, p. 387). While writing together, the authors are expected to interact, combine their ideas, and co-author and co-own the text, as well as their responsibilities as writers (Ede & Lunsford, 1990; Storch, 2013, 2018). Ideally, in a text written collaboratively, the parts created by each of its authors cannot be identified. Collaborative writing combines the benefits of oral interaction and writing tasks. During interaction, learners engage in meaningful use of the target language (TL), have opportunities to negotiate for meaning, and produce modified output. In addition, they receive peer support as well as immediate feedback, and are able to co-construct new meaning (Loewen & Sato, 2018; Long, 1983; Storch, 2013; Swain, 2005). The benefits of writing are many. One of the advantages is the extra time learners have to pay attention to meaning and form, which is not as available during oral-only tasks (Manchón, 2014; Storch, 2016). Given the lack of spontaneity and immediacy of writing, as well as the access writers have to their production, anxiety might also be lower than in oral communication (Tavakoli, 2014). Moreover, writing has been claimed to encourage the use of language structures that are not normally employed orally (Williams, 2012). Finally, the written modality demands higher levels of accuracy, as errors tend to be less tolerated (Schoonen, Snellings, Stevenson, & van Gelderen, 2009).
Studies that have compared writing tasks carried out in pairs with tasks completed individually have reported gains in accuracy regarding target words and structures when learners collaborate (Nassaji & Tian, 2010; Storch, 2007; Teng, 2020). In addition, learners writing collaboratively have been reported to produce shorter but better texts in terms of grammatical accuracy, complexity and task fulfilment (Storch, 2005). Learners writing collaboratively have also been found to initiate and solve more LREs than they do when performing oral-only tasks (Adams & Ross-Feldman, 2008; García Mayo & Azkarai, 2016). Also, when asked, learners have expressed positive views towards writing in collaboration with their peers (Storch, 2005).
However, to date, these findings are based on research into adult collaborative writing. Little is known about children writing collaboratively and whether the claims summarized above hold for this specific learner group (Coyle & Roca de Larios, 2014).

Task repetition
The repetition of communicative situations occurs in everyday life. We often need to perform the same tasks and chores more than once in our life. We have to go to the shops, to the bank, or just interact with our neighbors in the lift. TR constitutes, therefore, a common human activity. Bygate (2018) recently defined the construct of TR as "the repetition of a given configuration of purposes, and a set of content information" (p. 2). This definition underlines the idea that nothing can be exactly repeated and that, consequently, changes may happen from one performance to the next. These constitute, in fact, the key elements of TR: how learners' performances vary from one iteration to another, and how these changes relate to language acquisition (Bygate, 2018). TR influences the way learners perform a task, and the language they use to deal with it. TR has been found to help learners to produce improved output (Bygate, 1996, 2001; Lambert, Kormos, & Minn, 2017; Sample & Michel, 2014). By repeating a task, learners' attention is diverted from conceptualizing the meaning they want to convey during the first iteration, to the formulation of their message in subsequent encounters with the task (Bui, Ahmadian, & Hunter, 2018; Bygate, 1996). Most research has addressed the effect of TR on adult learners' oral performance, and scarce attention has been paid to the potential of TR for YLs (Pinter, 2006, 2007, 2011). In any case, research to date both on adult and child populations concurs that gains have always been found with TR although there are differences regarding the aspects that show greater improvements.
In general, fluency gains have been reported, whereas the evidence regarding complexity and accuracy is more variable (Ahmadian & Tavakoli, 2011; Bagheri, Rahimi, & Riasati, 2012; Bret Blasco, 2014; Bygate, 2001; Bygate & Samuda, 2005; García Mayo, Imaz Agirre, & Azkarai, 2017; Hidalgo, 2018; Hu, 2018; Lynch & Maclean, 2000; Pinter, 2006, 2007, 2011; Sample & Michel, 2014). The fact that findings regarding some aspects are inconclusive (Bui et al., 2018; Bygate, 2018) may be partly due to the great diversity of variables analyzed (context, age, level, tasks, and time span between repetitions) (Lázaro-Ibarrola & Hidalgo, 2017).
It is also important to make a distinction between same TR, the most widely explored type, in which learners repeat the exact same task, and task-type repetition (procedural repetition), in which students repeat the same task type but with different content (Kim, 2013; Kim & Tracy-Ventura, 2013; Payant & Reagan, 2018). With oral data from junior high school Korean students, Kim (2013), and Kim and Tracy-Ventura (2013) compared these two types of repetition and found that learners' interest and focus on form (measured by their use of LREs) decreased when repeating the same task in comparison to learners who repeated different versions of the same task type. However, they do not recommend either method over the other, since no significant differences were found between the groups. Payant and Reagan's (2018) study also showed that LREs decreased with exact TR and that learners focused mainly on the meaning of the message they wanted to convey, producing more meaning-focused LREs. On the other hand, these authors suggest that exact TR had greater benefits as regards the production of LREs. Finally, they reported that most LREs were correctly solved.
Despite the body of work addressing TR in relation to different aspects of language performance, only a few studies have analyzed the effect of TR on writing (Amiryousefi, 2016; Manchón, 2014; Nitta & Baba, 2014). One of the few studies addressing TR and written performance is Amiryousefi's (2016). This author analyzed the effects of exact TR and procedural TR on low-proficiency EFL learners' (mean age 23.56) computer-mediated individual written production. His results provided positive evidence of the benefits of both TR types for writing, although some differences were found. The compositions by the exact TR group improved significantly in terms of fluency (measured as numbers of words, clauses and T-units) and in one of his accuracy measures (the percentage of error-free clauses), whereas the procedural TR group only improved in two of the fluency measures (namely number of words and clauses per text). Nitta and Baba (2014) explored the effect of these two types of TR on writing over time. In their longitudinal study, they found that procedural TR had a marked effect on lexical and grammatical aspects, whereas the influence of exact TR was limited. Nevertheless, they suggest that the benefits of TR may be more noticeable in the long term.
In a very similar context to that of our study, Hidalgo and García Mayo (2019) examined the effect of TR on the production of LREs by YLs while performing a collaborative writing task. Contrary to most research to date, their participants initiated more form-focused than meaning-focused LREs. On the other hand, they also reported that most LREs were correctly solved and that LREs decreased significantly with exact TR.

Research questions
The present study analyzes the effects of exact TR on the collaborative writing of 10 pairs of EFL learners. Our first aim is to find out if learners are able to generate better texts (measured quantitatively and holistically) with TR. Also, we want to understand how TR affects the LREs that learners generate while writing, that is, whether it affects the amount, the type or their ability to successfully resolve them. Therefore, our research questions are the following:
1. How do learners' drafts change (quantitatively and qualitatively) with TR?
2. How does TR affect the number, nature and resolution of learners' LREs?
On the basis of the literature review, our learners' drafts are expected to improve with repetition, although it is not clear which specific components might improve more. LREs, in turn, are expected to decrease with the repetitions, and will probably be mainly form-focused and correctly solved (Hidalgo & García Mayo, 2019).

Participants and setting
The participants in the present study were 20 EFL learners (mean age 11.39) who attended a Content and Language Integrated Learning (CLIL) program at a state school in the north of Spain. At the moment of data collection their command of the TL was described as an A2 level of the Common European Framework of Reference for Languages (CEFR), as attested by the Cambridge Key English Test (KET) and by school-internal tests.
In the school, the learners followed a CLIL program and their exposure to the TL was approximately 14 hours per week. English language as such was allotted five sessions per week, and the remaining hours of exposure included other subjects taught through English, such as math, science, art and physical education. This CLIL program was mandatory for all pupils. This eliminates the risk that only the most motivated learners, or those with a higher-than-average command of the TL, would participate in the study.

Procedure
The participants had to work in pairs to write a narrative in response to a picture prompt three times over a three-week period in a classroom context. The pairs were established by the researchers and learners' own teacher, taking into account their personal relationship (to avoid conflict) and, at the same time, trying to make pairs of very similar levels of proficiency. The prompt consisted of a six-picture comic strip (Cambridge English, 2014, p. 3). The dyads sat together and were given two minutes to look at the pictures and speak about them. After the two minutes, they were asked to collaborate to compose the story in writing, with a pen, on a piece of paper. Each dyad had to produce a single composition at each data collection time. The time limit set for students to perform the task was fifteen minutes. The dyads remained the same throughout the experiment.
The participants' deliberations were video and audio recorded and their oral production (30 transcripts, 8 hours approximately) was transcribed into the CHAT (Codes for the Human Analysis of Transcripts) format. Their attention to form, operationalized as LREs (Swain & Lapkin, 1998), was coded using the CLAN (Computerized Language Analysis) tools (MacWhinney, 2000).

Coding and analysis
Our analysis of the learners' written compositions consisted of both quantitative and holistic measures. In both cases we compared the production at Time 1 (henceforth T1) versus the production at Time 2 (T2) and the production at T1 and T2 vs. the production at Time 3 (T3).
L2 performance has been defined as multicomponential in nature and its principal components have been successfully captured in the notions of complexity, fluency and accuracy (CAF) (Housen & Kuiken, 2009;Housen et al., 2012;Michel, 2017). Although there is some controversy regarding how these constructs are operationalized (particularly with fluency and complexity), the three components still are the most reliable tool to measure proficiency (Housen et al., 2012).
Our choice of the specific CAF measures was based on the main measurements used in some previous studies that seemed to be applicable to our data. Thus, complexity was measured in terms of the proportion of dependent clauses and clauses to T-units (Foster, Tonkyn, & Wigglesworth, 2000). T-units are defined as "one main clause plus whatever subordinate clauses happen to be attached to or embedded within it" (Hunt, 1966, p. 735). Also, our measurement of complexity included lexical diversity, which was measured in terms of the type/token ratio (TTR), that is, the number of different words in a text divided by the total number of words (Malvern, Richards, Chipere, & Durán, 2004). For the analysis of accuracy, the least controversial of the three constructs, the percentage of the error-free clauses over the total number of clauses and the number of errors per total number of words were considered (Storch, 2005; Storch & Wigglesworth, 2007; Wigglesworth & Storch, 2009). Finally, fluency was measured in terms of the number of words, clauses and T-units per text (Wolfe-Quintero, Inagaki, & Kim, 1998).
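The measures described above can be sketched as simple ratios over hand-coded counts. The following Python sketch is illustrative only and is not the procedure actually used in the study: the class and function names are hypothetical, the sample sentence and its counts are invented, and the proportion of dependent clauses is assumed here to be computed over the total number of clauses.

```python
from dataclasses import dataclass


@dataclass
class DraftCounts:
    """Hand-coded counts for one composition (hypothetical structure)."""
    words: list[str]         # tokens of the text, in order
    clauses: int             # total number of clauses
    dependent_clauses: int   # subordinate clauses
    t_units: int             # main clause plus attached subordinate clauses
    error_free_clauses: int  # clauses containing no errors
    errors: int              # total number of errors in the text


def caf_measures(d: DraftCounts) -> dict[str, float]:
    """Compute the complexity, accuracy and fluency ratios named in the text."""
    tokens = [w.lower() for w in d.words]
    n_words = len(tokens)
    return {
        # Complexity
        "clauses_per_t_unit": d.clauses / d.t_units,
        # Assumption: dependent clauses over total clauses.
        "dependent_clause_ratio": d.dependent_clauses / d.clauses,
        "type_token_ratio": len(set(tokens)) / n_words,
        # Accuracy
        "error_free_clause_pct": 100 * d.error_free_clauses / d.clauses,
        "errors_per_word": d.errors / n_words,
        # Fluency (raw counts per text)
        "fluency_words": n_words,
        "fluency_clauses": d.clauses,
        "fluency_t_units": d.t_units,
    }


# Invented example: one dependent clause, two T-units, one spelling error.
example = DraftCounts(
    words="when the boy went to the park he saw a dog and he was hapy".split(),
    clauses=3,
    dependent_clauses=1,
    t_units=2,
    error_free_clauses=2,
    errors=1,
)
m = caf_measures(example)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Because TTR is sensitive to text length, comparisons of this ratio across drafts of different lengths should be read with caution.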
In addition to this, we also took into account the functional dimension of our students' production by carrying out a holistic assessment of their writings (De Jong, Steinel, Florijn, Schoonen, & Hulstijn, 2012; Kuiken, Vedder, & Gilabert, 2010; Pallotti, 2009). While there is no agreement to date as to how functional adequacy is to be defined or assessed (Iwashita, Brown, McNamara, & O'Hagan, 2008), its inclusion is vital in order to obtain a more comprehensive assessment of students' production. In this paper, functional adequacy is measured using Storch's (2005) 5-scale global evaluation scheme, which we adapted to the content of the task we employed. This evaluation considered the content and structure of the text, as well as the degree of task fulfillment (see the appendix).
Finally, our study also analyzed the LREs generated in the students' oral interactions during the process of writing their texts. Following previous research in EFL settings (García Mayo & Azkarai, 2016; Hidalgo & García Mayo, 2019; López-Serrano, Roca de Larios, & Manchón, 2019), the LREs were classified according to their linguistic focus, whether they were meaning-focused or form-focused (deliberation over morpho-syntactic aspects, spelling and pronunciation), and to their outcome (resolved or not resolved). Resolved LREs were further classified as target-like or non-target-like. The codification of LREs is illustrated with examples (Examples 1, 2, 3 and 4) from our own dataset.
(1) Form-focused and target-like resolution. In Example 1, the learners focus on the tense of the verb they want to use. Student 1 starts narrating the story and Student 2 interrupts her to suggest that they should use a different tense, namely, the past tense. Student 1 agrees with her partner and they settle on the past tense. Thus, this LRE has been coded as form-focused with a target-like resolution.
(2) Meaning-focused, form-focused, and target-like resolution. In Example 2, Student 2 is not satisfied with the term employed by Student 1 and proposes a more target-like word (happy). Additionally, she focuses on the spelling of the word (with double p). The LREs in this example have been coded as one meaning-focused and one form-focused LRE, both target-like resolved.
(3) Form-focused and non-target-like resolution. Example 3 represents an instance in which the learners were not able to successfully solve a form-focused LRE. Student 2 seems to think the verb to put has a regular past form, so he adds the -ed ending. He provides evidence for his decision by going back to the beginning of their composition and emphasizing the past tenses they had used. His partner agrees, and they use a wrong form (puted*) in their text. This LRE has been coded as form-focused with a non-target-like resolution.
(4) Meaning-focused and not resolved. Example 4 represents an occasion in which the participants do not solve a meaning-focused LRE. Apparently, neither of the learners is able to provide the term they want to use (unintentionally), and they decide to write something different.
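The coding scheme illustrated by the four examples can be represented as a small data structure, which makes the tallies by focus and outcome straightforward. This is a minimal sketch under our own assumptions: the type names, the `summarize` helper and the five records are hypothetical, and Example 4 is coded as meaning-focused, following its description as a failed search for a lexical item.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Focus(Enum):
    MEANING = "meaning-focused"
    FORM = "form-focused"  # morpho-syntax, spelling and pronunciation


class Outcome(Enum):
    TARGET_LIKE = "resolved, target-like"
    NON_TARGET_LIKE = "resolved, non-target-like"
    NOT_RESOLVED = "not resolved"


@dataclass(frozen=True)
class LRE:
    dyad: int   # hypothetical pair identifier
    time: int   # data collection time: 1, 2 or 3
    focus: Focus
    outcome: Outcome


def summarize(lres, time):
    """Return (total, counts by focus, % target-like) for one collection time."""
    at_time = [e for e in lres if e.time == time]
    total = len(at_time)
    focus_counts = Counter(e.focus for e in at_time)
    target_like = sum(e.outcome is Outcome.TARGET_LIKE for e in at_time)
    pct_target_like = 100 * target_like / total if total else 0.0
    return total, focus_counts, pct_target_like


# Invented records mirroring Examples 1 to 4 (Example 2 yields two LREs).
coded = [
    LRE(1, 1, Focus.FORM, Outcome.TARGET_LIKE),      # Example 1: verb tense
    LRE(2, 1, Focus.MEANING, Outcome.TARGET_LIKE),   # Example 2: word choice
    LRE(2, 1, Focus.FORM, Outcome.TARGET_LIKE),      # Example 2: spelling
    LRE(3, 1, Focus.FORM, Outcome.NON_TARGET_LIKE),  # Example 3: "puted"
    LRE(4, 1, Focus.MEANING, Outcome.NOT_RESOLVED),  # Example 4: missing term
]
total, focus_counts, pct_target_like = summarize(coded, time=1)
print(total, focus_counts[Focus.FORM], focus_counts[Focus.MEANING], pct_target_like)
```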

Inter-rater reliability
The participants' written production was coded by one of the authors of this paper. An independent research assistant also analyzed the production of 5 pairs at the three testing times (50% of the data). Both raters held several meetings prior to data coding to agree on their understanding of the measures of analysis and also after their coding in order to solve the few discrepancies on a case-by-case basis. Inter-rater reliability was checked for all measures and the differences between the two raters were very small. Total agreement was reached by the two researchers for the codification of the LREs. Regarding CAF, total agreement was found for complexity and fluency while the greatest number of discrepancies was found in the case of accuracy (93.5% agreement). The holistic ratings for the three compositions reached a global agreement of 92%.
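The text does not specify how the agreement figures were computed. Assuming simple percentage agreement (the share of items to which both raters assigned the same code), the calculation can be sketched as follows; the function name and the sample codes are hypothetical.

```python
def percent_agreement(rater_a, rater_b):
    """Percentage of items that two raters coded identically."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of items")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100 * matches / len(rater_a)


# Invented codes for four clauses: error-free ("ok") or containing an error ("err").
author_codes = ["ok", "err", "ok", "ok"]
assistant_codes = ["ok", "ok", "ok", "ok"]
agreement = percent_agreement(author_codes, assistant_codes)
print(f"{agreement:.1f}% agreement")
```

Simple percentage agreement does not correct for chance agreement; a chance-corrected coefficient would be a possible alternative, but nothing in the text indicates that one was used.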

Statistical analysis
As for the statistical analysis, dependent samples t-tests were used for data that presented a normal distribution and Wilcoxon Signed-Rank Tests were used for the data that were not normally distributed. The significance level was set at α = .05.
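For readers unfamiliar with the Wilcoxon signed-rank test, the sketch below shows one common way of obtaining a Z statistic and a two-sided p value from paired scores, using the normal approximation with a correction for tied ranks. It is a plain-Python illustration under stated assumptions, not a reproduction of the software or settings used in this study; the function name and the ratings are hypothetical.

```python
import math


def wilcoxon_signed_rank_z(first, second):
    """Return (z, two-sided p) for paired samples via the normal approximation."""
    # Zero differences are dropped, as in the classical procedure.
    diffs = [b - a for a, b in zip(first, second) if b != a]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank the absolute differences; tied values share their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        group_size = j - i + 1
        tie_term += group_size ** 3 - group_size
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (min(w_plus, w_minus) - mean) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p under the standard normal
    return z, p


# Invented holistic ratings for ten dyads at two collection times.
t1 = [2.0, 2.5, 3.0, 2.5, 3.0, 2.0, 3.5, 3.0, 2.5, 3.0]
t3 = [3.0, 3.5, 3.5, 3.0, 4.0, 3.0, 4.5, 3.5, 3.5, 4.0]
z, p = wilcoxon_signed_rank_z(t1, t3)
print(f"Z = {z:.2f}, p = {p:.3f}")
```

With only ten pairs the normal approximation is rough, and an exact test may give a somewhat different p value.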

Results
The results obtained from the analyses of CAF reveal that TR does not seem to have a great influence on any of these three dimensions in the compositions written by the young participants in the present study. Table 1 shows the results for our complexity measures. As illustrated in Table 1, both the proportion of clauses to T-units and the percentage of dependent clauses appear to follow a decreasing tendency. Lexical diversity, on the other hand, seems to increase in the second repetition and decrease again in the third one. Nevertheless, the differences across tasks did not reach statistical significance for any of the different complexity measures. Table 2 features the results from the analysis of the accuracy measures. As Table 2 shows, there seems to be a slight increase in the percentage of error-free clauses from the first to the second and third compositions, which might hint at an improvement in terms of accuracy. However, as in the case of complexity, there are no statistically significant changes. As for fluency, Table 3 shows the results. The mean number of words, clauses and T-units per composition shows a trend to increase with TR. Nevertheless, only the difference in the proportion of T-units between T1 and T2 is statistically significant. None of the other measures showed statistically significant differences.
The holistic analysis, on the other hand, revealed more encouraging results. As can be seen in Table 4, the mean rating obtained for the three drafts improves with TR. The scores of all participants ranged from 2 to 4.5 and all dyads' last composition was the best rated. A statistical analysis shows that the improvement from task to task of the global evaluation of the texts was statistically significant (T1 vs. T2: Z = -2.56, p = .010; T2 vs. T3: Z = -2.07, p = .038; T1 vs. T3: Z = -2.97, p = .003).
Finally, the analysis of the LREs identified in the pair dialogues shows that interaction related to language was recurrent in all dyads' oral production while writing their texts. Table 5 shows the number of LREs produced by the ten pairs in each composition. We can see that the discussions of language aspects, operationalized as LREs, seem to decrease with each TR. In fact, a statistical analysis shows that this difference is statistically significant when comparing the first composition to the last one (T1 vs. T3: Z = -2.60, p = .009).
Table 5. LREs produced by the ten pairs
Next, LREs were classified as either meaning- or form-focused. Table 6 summarizes the distribution of the LREs in terms of the total turns for each LRE type.
Table 6. LRE types
Table 6 clearly shows that form-focused LREs made up the greatest proportion of the total LREs at the three data collection times although there was also a large number of meaning-focused LREs. The difference between the frequency of these two types was statistically significant in the three tasks (T1: Z = -2.20, p = .028; T2: Z = -2.24, p = .025; T3: Z = -2.49, p = .012). As for the effect of TR on the nature of the LREs, the percentage of meaning-focused LREs decreases significantly from the first task performance to the last.
Finally, we addressed the impact of TR on the outcome of the LREs. The results are presented in Table 7.
The most relevant finding is that most LREs were resolved in a target-like manner. On the other hand, the percentage of correctly solved LREs appears to follow a decreasing trend, although this decrease did not reach statistical significance (T1 vs. T2: Z = -1.12, p = .26; T2 vs. T3: Z = -0.35, p = .72; T1 vs. T3: Z = -1.36, p = .17).

Discussion
The present study has examined the effect of TR on the collaboratively written texts of ten pairs of young EFL learners. More specifically, the two students in each pair worked together while writing the exact same composition three times over a three-week period. Our analysis included the quantitative and holistic analysis of these three compositions as well as the analysis of the quantity, type and resolution of the LREs generated by the learners while writing.
Our first research question addressed the effect of TR on these YLs' written compositions in terms of CAF and holistic ratings. Regarding CAF measures, our results reveal mainly non-significant differences, with only an increase in lexical diversity and in the proportion of T-units at T2. However, the raw numbers seem to suggest a tendency towards a greater number of error-free units, greater lexical diversity and greater fluency in either the second or the third composition. As Storch (2005) suggests, the lack of statistical significance may have to do with the small sample size analyzed in the present study (10 dyads, 20 learners), and the relatively short texts these YLs wrote (113.4 words on average). On the other hand, the holistic ratings help us to complete these results. Each time the learners performed the task, the mean score improved significantly. This positive finding is in line with the trends hinted at in the analyses of the CAF measures, which did not reach statistical significance when examined separately but seem to be strong enough, taken together, to give a better global impression.
Thus, our findings support previous research from the oral domain addressing YLs that reports better overall performance across TR (Pinter, 2007; Sample & Michel, 2014) and also suggest that CAF measures are not always able to fully capture the improvements that students make in their writing.
Our second research question focused on the impact of TR on the quantity, nature and outcome of the LREs YLs initiate while composing a text collaboratively. In our students' production, the overall number of LREs decreased significantly over time. These learners worked three times with the exact same content and task procedure, and by the last task performance they were so familiar with both that they may not have needed to resort to metalinguistic discussions so much. Also, by the third TR, YLs may have already solved most of their doubts and language problems from the first iterations, may be able to carry this knowledge over to the next performance (Hidalgo & García Mayo, 2019; Payant & Reagan, 2018) and, in line with Sample and Michel's (2014) study with oral data, might also be better able to focus their attention on all three CAF dimensions simultaneously and, therefore, to improve their drafts.
In addition to the above, most of the LREs identified in our data were correctly solved at the three data collection times, also mirroring previous findings (García Mayo & Azkarai, 2016; Hidalgo & García Mayo, 2019; Payant & Reagan, 2018). This evidence underlines the benefits of collaborative writing tasks, which offer learners the opportunity to pool their knowledge and solve language problems correctly. However, as opposed to most previous research, yet concurring with Hidalgo and García Mayo (2019), the majority of the LREs produced by the YLs in the present study were categorized as form-focused. This seemingly contradictory finding may be related to different factors. First, most previous research studies have addressed adult learners whereas, like Hidalgo and García Mayo (2019), we have worked with primary school learners. Besides, most studies employed oral tasks, whereas we have examined learners' oral interactions while producing a written text. Finally, different categorizations have been employed, which, for instance, consider pronunciation- and spelling-related LREs as lexical-based (Payant & Reagan, 2018). In the current study, on the other hand, we have followed García Mayo and Azkarai (2016), who include the discussion of these aspects in the form-focused category. Our findings regarding the nature of the LREs are more in line with the evidence reported by these authors, who also found that their participants initiated significantly more form-focused LREs when carrying out a written task.

Conclusion
This study has provided some evidence in favor of the use of collaborative writing and TR with YLs. The repetition of the same composition three times has helped learners to generate better compositions and to discuss and successfully resolve a great number of LREs, mainly regarding formal aspects, but also with an important number of episodes focused on meaning. Our results also highlight the importance of including functional adequacy alongside the analytic measures of CAF (Housen et al., 2012). As we reported, CAF measures seemed to show trends of improvement but these did not reach statistical significance. In contrast, the global assessment revealed that the compositions did, in fact, improve significantly in terms of content, structure and task fulfilment. Therefore, we advocate the combination of quantitative and holistic measures to obtain a more thorough analysis of students' productions.
From a pedagogical perspective, our study also has important implications for teachers of young language learners. Even though the value of pair and group work is well recognized in second language acquisition research, and widespread in education (Storch, 2011), its use in writing lessons is still quite limited (Storch, 2005). With our study we have shed more light on the benefits of peer collaboration during the writing process. Thanks to the LREs the participants initiate, and correctly solve, they are able to successfully complete the tasks. As for the value of TR, our results illustrate how subsequent performances of the same task lead to improved versions of the original manuscripts.
Certainly, there are some limitations to our research that need to be acknowledged and that in turn open up lines for further research. Studies with a larger sample size and that require the production of longer compositions would be necessary. In addition, research that includes more detailed analysis of the pair dialogues examining other processes learners engage in while carrying out the tasks would also help us to better understand the nature of peer-peer collaboration (López-Serrano et al., 2019). Following Wigglesworth and Storch (2009), these processes would include planning, composition, and revision, as well as the focus of these processes (e.g., task management, generation of ideas, text structure). Another interesting line of research would be the comparison of collaboratively written texts and oral narratives also produced collaboratively.
Despite the limitations to this study, we can conclude that collaborative writing and same TR seem to be beneficial for YLs. Writing together has provided them with opportunities to use the TL in a meaningful context and to share their knowledge on language use and this, combined with the repetition of the same draft, has enabled learners to produce a better final text. Finally, we would like to highlight once more the importance of the inclusion of holistic analyses of students' productions since, as we have seen, they reveal information that otherwise might remain unnoticed.