Dynamic engagement in second language computer-mediated collaborative writing tasks: Does communication mode matter?

This study takes a dynamic approach to investigating engagement, examining fluctuations in cognitive-affective variables at regular time intervals during online collaborative second language (L2) writing tasks. Using online conference software and online editing software, 16 university students who use English as an L2 completed two collaborative problem-solution L2 writing tasks in two communication modes: video-chat and text-chat. After each task, learners viewed videos of their performances in 12 three-minute segments and were asked to rate their engagement on two scales (interest, focus). They were then interviewed about their attributions for fluctuations in their ratings. Group-level analysis revealed that learners experienced significantly higher focus and interest during tasks performed in video-chat mode than in text-chat mode. This was contrasted with an analysis from a dynamic perspective, which produced a more nuanced picture of individual engagement trajectories during the tasks. Dynamic patterns of engagement fell into one of four categories: moderately steady, increasing, decreasing, or rollercoaster. A content analysis of 32 interviews revealed four factors that accounted for changes in engagement during tasks: task design (e.g., task familiarity), task process (e.g., instances of collaboration), task condition (e.g., communication mode), and learner factors (e.g., perceptions of proficiency).


Introduction
Collaborative writing (CW) tasks engage two or more writers with the common goal of producing a single written text (Storch, 2019). Computer-mediated CW tasks utilize online platforms (e.g., Google Docs, Wikis) that host a range of collaborative tools, potentially enhancing interaction, composition reflection, and learning in ways that are time/space independent (Li, 2018). In an effort to understand how to optimize learner involvement, researchers have been encouraged to explore the implementation of more diverse communication modes during computer-mediated CW tasks (Lee, 2010; Yim & Warschauer, 2017). So far, a handful of studies have compared the impact of oral (e.g., audio-chat) and written modes (e.g., text-chat) on the interaction/writing process (Cho, 2017; Kessler et al., 2020; Liao, 2018). However, with mixed research findings, the relative benefit of these modes, in terms of learners' interaction/observable task behavior, is unclear. An alternative approach to investigating this issue might be to examine learners' cognitive-affective engagement. Despite a surge in research into how task implementation and design can be manipulated to promote greater mental and emotional involvement (e.g., Aubrey, 2017a, 2017b; Aubrey et al., 2020; Lambert et al., 2017; Phung et al., 2020; Qiu & Lo, 2017), little is known about learners' cognitive-affective responses to CW tasks, let alone how this aspect of engagement evolves over time.
To fill this research gap, the present study explored differences in learners' cognitive-affective engagement at regular intervals during computer-mediated CW tasks performed in video-chat (synchronous oral interaction via a live audio/video feed) and text-chat (synchronous written interaction via a chat function) mode. Group-level analysis compared overall engagement between the conditions, which was then contrasted with an analysis from a dynamic, individual perspective. Finally, the factors that accounted for learners' engagement dynamics were also investigated. This study responds to calls for research to foreground the ways in which engagement is dynamic and emergent at different timescales . Furthermore, the application of this approach to computer-mediated CW tasks is both novel and important as it may provide insights into how learners can be supported by technology at different stages during these tasks.

Mode of communication in computer-mediated CW tasks
Computer-mediated CW studies have predominantly employed online platforms in which learners interact with each other to plan within the task, draft, learners who performed either FTF or text-chat discussions in dyads before completing an individual, timed L2 writing task. The study addressed the methodological flaws in Liao's study, namely, the lack of clear coding examples and reliability measures. Kessler et al. (2020) also found that FTF planning resulted in more language production, while text-chat resulted in more equal interaction despite some learners finding it to be "much slower, awkward, and arduous" (p. 15). In sum, these studies indicate that oral and text-chat modes seem to prepare learners for writing in different ways, but such studies have been scarce, and findings have been mixed.
Unlike text-chat, video-chat (synchronous audio/video communication) involves actual multimodal communication as it represents both a non-verbal (visual) and verbal (audio) sensory input. This additional channel of communication could be argued to contribute to "social presence," or a psychological feeling of nearness (Yamada & Akahori, 2009). Social presence has three components: immediacy (i.e., the feeling of being physically close), intimacy (i.e., the feeling of being understood), and sociability (i.e., the feeling of connection) (Shearer & Park, 2019). These dimensions may be facilitated to different degrees via eye contact, smiling, laughing, or head nodding, which may serve to reduce the psychological distance between interlocutors (Chamberlin Quinlisk, 2008) and establish deeper emotional connections (Develotte et al., 2010). However, according to the cognitive-affective theory of learning with media (Moreno, 2005), multimodal channels, such as video-chat, carry a higher risk of creating excessive extraneous cognitive load for the learner, which can disrupt processing and induce anxiety. When multiple sources of input are received separately (i.e., spatially or temporally) in multimodal tasks, learners can waste attention integrating information from each stimulus, leading to what is called the split-attention effect (Mayer & Moreno, 1998). In L2 learning, the split-attention effect has been found to occur during audio-visual tasks where audio commentary and visual scenes are not directly related (Guichon & McLornan, 2008), and in L2 reading tasks, where connected information is presented in different locations of a text (Al-Shehri & Gitsaki, 2010). Drawing on these ideas, we might expect video-chat and text-chat to strain learners' attention in different ways. 
Video-chat requires learners to simultaneously integrate information received from multiple stimuli, which may place an especially high burden on cognitive resources; on the other hand, the split-attention effect may occur during text-chat if written messages overlap or are delayed (i.e., non-contingent turn-taking), which would require learners to expend extra attention to consolidate messages into meaningful discourse.

Language learner engagement
Rooted in educational psychology, learner engagement refers to heightened attention, active participation, and meaningful involvement in a learning task (Mercer & Dörnyei, 2020). Furthermore, engagement is seen as a prerequisite for learning and particularly sensitive to classroom interventions (Zhou et al., 2021). Although the exact nature of engagement is debated, there seems to be a consensus that it is a multifaceted construct comprising four components: behavioral engagement (e.g., time on task), affective engagement (e.g., enjoyment), cognitive engagement (e.g., mental effort) and social engagement (e.g., interaction) (Philp & Duchesne, 2016).
Studies on computer-mediated CW tasks have provided some insight into the observable aspects of behavioral and social engagement. Research has shown that during computer-mediated CW, learners engage in collaborative behavior (Elola & Oskoz, 2010; Lee, 2010), using their collective knowledge to resolve language, content, and organizational issues through interaction (Hsu, 2019; Li, 2013). This common line of inquiry reflects Svalberg's (2009) notion that engaged learners are those "who are actively constructing their knowledge not only by mental processes but also equally by being socially active and taking initiative" (p. 246). However, learners' mental effort and emotional responses to writing tasks (i.e., the cognitive-affective dimension) are important invisible engagement factors that are often overlooked (Mystkowska-Wiertelak, 2020), particularly in the context of learners performing CW tasks.
There is no consensus on what constitutes reliable indicators of affective engagement, though research has commonly used enthusiasm, interest, and enjoyment as positive markers (e.g., Phung et al., 2020; Skinner et al., 2009). Of these, interest has been referred to as an outcome of engagement characterized by positive emotions related to an activity (Fredricks et al., 2004; Krapp, 2003). Chen's (2001) theory of interestingness suggests that momentary feelings of interest in a task derive from novelty, challenge, attention demand, exploration intention, and instant enjoyment. Cognitive engagement, on the other hand, is associated with sustained attention and mental effort (Fredricks et al., 2004), and is thus strongly associated with the notion of focus. Having intense focus or concentration during an activity is a characteristic of flow (Csikszentmihalyi, 1990), a state that has been described as the "ultimate in engagement" (Philp & Duchesne, 2016). Though focus can relate to consciously attending to, or noticing, important linguistic features or patterns (Schmidt, 1990), focus as an indicator of task engagement relates more to fluency and automaticity (Egbert, 2003). Interest and focus also have a close relationship. Hidi (1990) argues that interest elicits spontaneous, automatic allocation of attention. Similarly, Csikszentmihalyi et al. (2005) describe interest as being a by-product of focused involvement in an activity. According to Aubrey's (2017b) model of flow in the task-based language classroom, key characteristics of heightened engagement in a task are interest and focus, which are mediated by certain pre-conditions, such as an appropriate balance of proficiency and task difficulty level, personal relevance of the task topic, and an environment with few distractions. In the present research, interest and focus are measured as indicators of cognitive-affective engagement.
An important characteristic of engagement is that it is dynamic (Reschly & Christenson, 2012).  argue that in future studies on engagement, "there will be clear value in . . . measures that allow the dynamics of engagement (e.g., how it is sustained and how it deteriorates) to be investigated" (p. 21). Engagement dynamics can be investigated at larger timescales (e.g., between tasks) or smaller timescales (e.g., during tasks). Aubrey et al.'s (2020) classroom-based study provides a rare example of the former. They required 37 Japanese EFL learners to provide engagement ratings (focus, desire to speak, anxiety, confidence), as well as written descriptions to account for their ratings, after their participation in weekly oral tasks over a 10-week period. Findings revealed that engagement was highly variable throughout the period, with changes shaped by learner-level factors (e.g., cognitive/physical state), lesson-level factors (e.g., understanding of the lesson before the task), task-level factors (e.g., task design), and post-task-level factors (e.g., satisfaction with performance).
Research on the dynamics of engagement at the within-task level has adopted a dynamic systems perspective in which data are collected at regular intervals over time and analyses are carried out on individual learners (Larsen-Freeman & Cameron, 2008). These studies have primarily focused on the affective dimension of engagement. For example, using the idiodynamic method (MacIntyre, 2012) to measure moment-by-moment fluctuations in emotions, Boudreau et al. (2018) had learners perform a speaking task before asking them to view their performances and rate their enjoyment and anxiety (i.e., markers of affective engagement) on a per-second timescale. They found that enjoyment and anxiety were highly dynamic, with patterns of correlation ranging from negative to positive, suggesting that elevated enjoyment occurred during varying levels of anxiety. In a recent classroom-based study, Dao and Sato (2021) measured dynamic affective engagement, operationalized as enjoyment and interest, via a questionnaire at three five-minute intervals during a 15-minute speaking task. They found these measures to increase significantly from the first to second interval, suggesting that, despite using less nuanced measurement methods, affective engagement is still susceptible to considerable change within a task. Employing similar dynamic approaches, other studies have measured learner emotions during speaking tasks together with constructs such as task motivation and willingness to communicate (e.g., Guo et al., 2020;MacIntyre & Gregersen, 2021;MacIntyre & Serroul, 2015;Pawlak et al., 2016), each providing support for the dynamism of engagement during short speaking tasks. However, this approach has not yet been applied to CW tasks. The current study thus builds on previous research to examine the trajectories of learners' engagement (operationalized as focus and interest) during computer-mediated paired CW tasks in different modes. 
Specifically, the following research questions will be addressed:
1. What are the differences in engagement during computer-mediated CW tasks when learners communicate synchronously in text-chat and video-chat mode?
2. What factors do learners perceive to influence engagement during computer-mediated CW tasks when learners communicate synchronously in text-chat and video-chat mode?
3. What are common patterns of fluctuation in engagement during computer-mediated CW tasks when learners communicate synchronously in text-chat and video-chat mode?

Participants
Participants included 16 learners of English (12 female, 4 male) who were attending a university in Hong Kong. Due to the COVID-19 pandemic, the university had conducted all courses online in the previous semester. Thus, all participants had some recent experience using online synchronous communication tools. Based on information from a background questionnaire, all participants spoke Cantonese as their first language (L1) and had scored "4" on the English language subject level of their Hong Kong Diploma of Secondary School Exam (HKDSE), which is benchmarked to the IELTS score range of 6.31-6.51 (Hong Kong Examinations and Assessment Authority [HKEAA], 2015) and equivalent to the Common European Framework of Reference (CEFR) B2/C1 level. Participants were born and raised in Hong Kong and reported having no experience living in an overseas English-speaking country. Although there are opportunities to use English on campus, all participants used Cantonese almost exclusively on a daily basis and English sometimes in the classroom. Eight participants were initially recruited by the researcher. To ensure a high degree of familiarity between interlocutors, the recruited participants were asked to find a task partner whom they knew personally and who met the HKDSE "4" requirement. Informed consent was obtained from all participants before data collection began. Participants were aged between 19 and 22 and had a variety of university majors. A summary of participant information is provided in Table 1.

Tasks
Participants completed two computer-mediated, problem-solution CW tasks in pairs. Each task had a time limit of 36 minutes, which was decided based on piloting the two tasks before the study. Task 1 presented the participants with a problem related to secondary school education (see Appendix A), while Task 2 presented participants with a problem related to university education (see Appendix B). The instructions required learners to read the problem prompt, discuss possible solutions, agree on the most effective solution, and then jointly write a paragraph that summarizes the problem, their solution, and reasons. The problem-solution task was chosen because previous research suggests it is of high conceptual difficulty, which may compel learners to interact (Németh & Kormos, 2001).

Procedures
Each pair completed Task 1 followed by Task 2 in separate sessions, with a one-week break between sessions. To control for the effect of task topic, communication mode (video-chat, text-chat) was counterbalanced between the tasks by randomly dividing pairs into two groups (see Figure 1).

            n = 8                      n = 8
Session 1   Task 1, video-chat mode    Task 1, text-chat mode
               ↓                          ↓
Session 2   Task 2, text-chat mode     Task 2, video-chat mode

Figure 1 Counterbalancing the tasks

The tasks were performed online using Zoom (online video/audio conferencing software) and Google Docs (online editing software). For each task, the researcher and each participant in the pair logged into the same Zoom meeting and shared the same Google Document from separate locations, which ensured any interaction was done online. Before the first task, the researcher conducted a training activity which involved demonstrating the simultaneous editing features of Google Docs as well as use of the chat function. The researcher then confirmed that all participants had some previous experience with the online tools. Next, participants wrote a short introduction of themselves to demonstrate their understanding of the software. Immediately prior to each task, the researcher stated the instructions, answered any questions, and explained the notion of CW to encourage interaction throughout the task. In video-chat mode, participants enabled their Zoom video and audio so that they could see and speak to each other while completing their joint composition (see Figure 2). In the text-chat condition, participants disabled their video and audio so they would only interact using the text-chat function in Google Docs (see Figure 3). After 36 minutes, the researcher signaled the end of the task. All task performances were audio and video (screen) recorded.

Figure 3 Screenshot of Task 2 writing (text-chat mode)
Within 24 hours of completing each task, learners participated in a rating and stimulated-recall interview session. Each rating/interview session was done individually and took approximately 60 minutes. The procedure involved learners viewing a video of their performance in 12 three-minute segments. After each segment, the video was paused, and learners were asked to rate their focus and interest on a scale from -5 (very low) to +5 (very high). The rating procedure included a brief explanation of each variable, examples, and the following questions: How focused were you during these three minutes? How interested were you in doing the task during this time? After the rating procedure, a line graph showing changes in each self-rated dimension was created in Microsoft Excel and shown to the participant. The researcher and the participant then looked at the graph together and discussed the trends for each variable across the task period. Examples of questions that were asked by the researcher include: Why did you rate focus/interest low at this stage of the task? Can you explain why your ratings remained stable but then increased over this time interval? Why did your ratings for focus/interest in the final minutes of the task suddenly decrease? The rating/interview procedure was an adapted version of the idiodynamic method, which was originally created to understand short-term fluctuations in cognitive-affective responses to oral tasks (MacIntyre, 2012). While previous studies using this method had learners rate each variable on a per-second timescale with the aid of computer software, the current study adopted a three-minute timescale due to the longer duration of writing tasks.

Data analysis
The data for the study consisted of 32 engagement ratings (focus, interest) and 32 transcribed interviews. Rating data were entered into SPSS version 26. Averaged ratings for focus and interest across the 12 intervals were calculated and checked for normality. As data were not normally distributed, Wilcoxon signed rank tests were performed on mean focus and interest scores to determine if there were significant differences in engagement between the video-chat and text-chat conditions. Effect sizes (r) for differences in focus and interest were estimated by dividing the z value by the square root of the sample size. A content analysis of interview transcripts was then conducted to determine the reasons for trends in engagement during the tasks. This involved an initial review of the data, coding of data and categorization of codes into themes (Cohen et al., 2007). In total, 357 separate reasons were identified and coded for positive and negative influences on engagement. Due to the intertwined nature of interest and focus (Hidi, 1990), participants were often unable to distinguish between their reasons for the two measures (e.g., P9: "interest and focus are different but kind of the same . . . I can talk about them together"). Thus, comments related to interest and focus were aggregated into one engagement category. The researcher and a research assistant used a coding scheme to independently code 20% of the data and obtained a simple intercoder agreement of 91%. Coding that resulted in disagreement was subject to further discussion until full agreement was reached. As seen in Table 2, the analysis resulted in four categories: learner factors, task design factors, task process factors, and task condition factors. Finally, to identify the patterns of fluctuations of focus and interest throughout the tasks, trajectories of each variable were plotted in line graphs showing changes across the 12 three-minute segments of the task. 
Following MacIntyre and Serroul (2015), "dips" and "spikes" were identified in learners' ratings for each task performance. A "dip" was defined as a decline of three or more rating points during the task and a "spike" by an increase in three or more points during the task. Participants' engagement patterns fell into four categories: a moderately steady pattern (no dips or spikes), an increasing pattern (only spikes present), a decreasing pattern (only dips present), or a rollercoaster pattern (both dips and spikes present).
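The classification rule described above can be illustrated with a minimal sketch. It assumes that a "dip" or "spike" is any change of three or more points from an earlier rating to a later rating within the same task; the text does not specify whether changes were measured between adjacent segments or across longer stretches, so this operationalization is an assumption. The function names and the example trajectories are hypothetical and are not participant data.

```python
from typing import Sequence

THRESHOLD = 3  # "three or more rating points", as stated in the text


def largest_rise(ratings: Sequence[int]) -> int:
    """Largest increase from any earlier rating to any later rating."""
    if not ratings:
        raise ValueError("ratings must contain at least one value")
    best = 0
    lowest_so_far = ratings[0]
    for value in ratings[1:]:
        best = max(best, value - lowest_so_far)
        lowest_so_far = min(lowest_so_far, value)
    return best


def largest_fall(ratings: Sequence[int]) -> int:
    """Largest decrease from any earlier rating to any later rating."""
    return largest_rise([-value for value in ratings])


def classify_pattern(ratings: Sequence[int], threshold: int = THRESHOLD) -> str:
    """Assign one of the four pattern categories to a rating trajectory."""
    has_spike = largest_rise(ratings) >= threshold
    has_dip = largest_fall(ratings) >= threshold
    if has_spike and has_dip:
        return "rollercoaster"
    if has_spike:
        return "increasing"
    if has_dip:
        return "decreasing"
    return "moderately steady"


# Hypothetical 12-segment trajectories on the -5 to +5 scale.
examples = {
    "A": [3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4],
    "B": [4, 4, 4, 3, 3, 3, 3, 2, 2, 2, -3, -3],
    "C": [0, 1, 0, -1, 0, 1, 0, -1, 0, 3, 3, 3],
    "D": [4, 3, 2, 1, 0, 0, -1, -1, -1, 4, 4, 5],
}
for label, trajectory in examples.items():
    print(label, classify_pattern(trajectory))
```

Under this reading, a trajectory that drifts by one or two points stays in the moderately steady category, while a gradual decline that accumulates to three or more points counts as a dip even if no single segment-to-segment change reaches the threshold.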
Though not the focus of this research, it was considered worthwhile to establish the relationship between focus and interest before presenting the results. To do this, Pearson correlations were calculated between interest and focus scores for the 12 three-minute intervals. With the alpha set at .05, results indicated significant correlations for 10 out of 12 intervals in video-chat (.51 < r < .79) and 9 out of 12 intervals in text-chat (.51 < r < .79). Non-significant relationships occurred during the 24-36-minute period for both video-chat (.33 < r < .47) and text-chat (.31 < r < .43), suggesting a close relationship between focus and interest (Csikszentmihalyi et al., 2005; Hidi, 1990), which weakened slightly towards the end of the task period. Table 3 provides a summary of the descriptive statistics for focus and interest ratings averaged across the 16 participants. This is represented visually in Figure 4. Wilcoxon signed rank tests revealed that there were significantly higher focus scores reported in the video-chat condition, with a large effect size (Z = 2.87, Nvideo-chat = 16, Ntext-chat = 16, p < .01; r = 0.72), and significantly higher interest scores in the video-chat condition, with a large effect size (Z = 2.30, Nvideo-chat = 16, Ntext-chat = 16, p = .02; r = 0.58). The only time interval in which focus and interest were not rated higher for the video-chat condition was the last three minutes of the task, when learners in the text-chat condition seemed to experience an increase in both measures.
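As a check on the effect sizes, the calculation described in the data analysis (the z value divided by the square root of the sample size) can be sketched as follows. The function name is hypothetical, and taking the sample size as the 16 participants rather than the 32 observations is an assumption, albeit one that reproduces the reported values. Because the published Z values are themselves rounded, the results are expected to agree with the reported r values only approximately (about 0.72 for focus and about 0.575 for interest, against the reported 0.72 and 0.58).

```python
import math


def effect_size_r(z: float, n: int) -> float:
    """Effect size r = z / sqrt(n) for a Wilcoxon signed rank test."""
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return z / math.sqrt(n)


# Z values as reported in the text; n = 16 participants (assumed).
focus_r = effect_size_r(2.87, 16)
interest_r = effect_size_r(2.30, 16)

print(f"focus:    r = {focus_r:.3f}")
print(f"interest: r = {interest_r:.3f}")
```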

[Figure 4: focus and interest ratings by condition. Legend: Focus (video-chat), Focus (text-chat), Interest (video-chat), Interest (text-chat)]

Table 4 shows the results of the content analysis on stimulated recall interviews, which indicate that engagement was influenced by learner factors, task design factors, task process factors, and task condition factors. For engagement during text-chat tasks, task condition factors and task process factors were the largest contributors. Reasons for increasing or decreasing engagement related to communication mode were most frequent, with positive comments referring to a reduction in communication anxiety (e.g., P3: "chatting in the chat box is less stressful"), use of familiar abbreviations to communicate (e.g., P1: "We use texting… always use short versions of English, so it is quite easy for me"), and the permanent record of text-based interaction (e.g., P8: "I lost memory, so I double check by looking back at the chat box"). Overall, however, the text-chat communication mode had a more negative impact on engagement than the video-chat communication mode. Reasons for decreases in engagement related to the perceived inefficiency of text-chat interaction (e.g., P10: "Text is okay, but sometimes it-we will mix up with the ideas. Uh-that means waste more time"), with several references to not noticing messages or feeling confusion due to overlapping, non-contingent messages (e.g., P6: "I am not finished point one, but he moves to point two. And then we miss two points together"). Similar to text-chat, task condition factors and task process factors were the largest contributors to video-chat engagement.
However, the video-chat communication mode seemed to facilitate greater collaboration and idea generation than text-chat (e.g., P2: "Why my interest at the beginning is so high because I can discuss with [my partner] a lot so it is fun"), with several positive comments related to the immediacy of communication (e.g., P7: "it's quicker for us to talk on uh-in front of the camera, like it's talking real life") and the intimacy of communication due to the presence of visual cues (e.g., P11: "I can guess her emotion or her uh-through her face"). A notable factor that suppressed engagement in video-chat was the transition between task stages, which, for some participants, occurred when learners reduced their speaking production to focus on writing (e.g., P13: "Changing to writing now… it is difficult"). Learner factors and task design factors were less influential. A lack of perceived English proficiency (e.g., P15: "It's difficult because I'm not a good speaker") accounted for some decline in video-chat engagement, while task interest and familiarity accounted for high initial engagement in the task (e.g., P10: "I start high interest because I know the topic") for both modes.
Examining the dips and spikes in interest and focus throughout the task, individual participants displaying each of the four engagement patterns are shown in Table 5. Figure 5 visually shows examples of each pattern type. Most engagement trajectories fell into the moderately steady pattern category. However, this pattern was seen more in video-chat performances (69% of focus patterns; 63% of interest patterns) than in text-chat performances (38% of focus patterns; 50% of interest patterns). For example, P16 increased her focus slightly during the 0-9-minute stage of the video-chat task from 3 to 4, which she attributed to an initial period of understanding the task and generating ideas, then maintained a high focus score of 4 for the rest of the task ("once you understand and generate ideas, you became more focused and-and concentrated. It lasted me") (see Figure 5). Within this pattern, high steady engagement was most common, which often increased (or started high) during the initial stages of the tasks.
The second most populated category was a decreasing pattern of engagement, with approximately equal proportions of this pattern found in video-chat task performances (25% of focus patterns; 25% of interest patterns) and text-chat performances (19% of focus patterns; 25% of interest patterns). Trajectories showed either a slow or a sudden decline. P15 is an example of the latter (see Figure 5). P15 starts the video-chat task with a focus score of 4, which decreases to 2 at the 30-minute mark, before dropping to -3 in the last 6 minutes of the task. In her case, the initial decline was related to a perceived lack of proficiency ("I can't speak well . . . it's always a problem for me"), followed by a steep decline related to finishing the task early ("I think I finished, and I drop uh-lost focus and interest very significantly"). A high initial interest in the task, triggering focused attention, and a lack of perceived proficiency, causing decreases in engagement, were common reasons for this pattern category.
Thirdly, increasing patterns of engagement were mostly confined to text-chat experiences (31% of focus patterns; 6% of interest patterns), with only one pattern found in video-chat task performances (6% of focus patterns; 0% of interest patterns). Most increasing patterns were for focus, such as for P7, who maintained a score of between 1 and -1 during the 0-27-minute portion of the task, but then increased to 3 in the final 9 minutes (see Figure 5). P7 attributed this increase to a lack of initial understanding ("I'm not sure how should I do it"), successful collaboration ("as I discussed with her and I . . . we're starting to find out a plan") and a rush to finish due to time constraints ("time is out soon so I concentrate"). Overall, this pattern seemed to be associated with a feeling of urgency to finish, causing focus to increase, and use of language and ideas written in the text-chat box to assist with composition writing.
Finally, the rollercoaster pattern was found in more text-chat trajectories (13% of focus patterns; 17% of interest patterns) than video-chat trajectories (0% of focus patterns; 13% of interest patterns). Most of the participants in this category experienced a declining pattern but then rapidly increased their engagement at a certain moment in the task. For example, P11 experienced a sudden change in trajectory at 27-30 minutes during the text-chat task (see Figure 5). She attributed this dramatic change to discovering a valuable idea that was written by her partner in the text-chat ("Because I suddenly think something I can add in my part, because I saw my . . . I saw that I miss my partner's message in uh-in the right corner, so I need to finish my new part") and the transition to editing her partner's paragraph in the final minutes ("I need to check my partner's part, so I have uh-maximum focus uh-reversed"). Sudden interruptions in communication due to overlapping messaging, or breakthroughs, caused by retrieval of language or ideas from chat records, were characteristics of the rollercoaster pattern.

Discussion
The first research question asked whether there were group-level differences in learner engagement when computer-mediated CW tasks were performed in text-chat and video-chat conditions. The results revealed that learners' cognitive-affective engagement, when aggregated over 12 three-minute segments, was significantly higher in the video-chat mode than the text-chat mode for both focus (p < .01; r = 0.72) and interest (p = .02; r = 0.58). The elevated level of focus for the video-chat task is consistent with research that suggests focused attention is optimized during L2 multimodal tasks when multimodal input is integrated (e.g., synchronous audio/video communication) (Al-Shehri & Gitsaki, 2010; Guichon & McLornan, 2008). Furthermore, the combined higher focus/interest scores suggest that video-chat tasks may generate interest as a by-product of heightened focus as learners experience feelings of challenge, automaticity, and enjoyment (Chen, 2001; Csikszentmihalyi et al., 2005; Egbert, 2003). Regarding group-level changes in engagement, although a slight decreasing trend in the video-chat condition could be observed, variation in both task conditions was minimal (less than a 2-point variation on a 10-point scale, see Figure 4). This is in line with Guo et al.'s (2020) finding that learners' effort and enjoyment fluctuate minimally during learning tasks when measurements are averaged across learners. Such little variation may indicate that group-level analyses obscure the more extreme fluctuations at an individual level, which are of interest in complex dynamic systems research (Larsen-Freeman & Cameron, 2008) and are discussed in relation to the third research question.
The second research question asked what factors influenced changes in engagement in each task condition. Learner factors, task design factors, task process factors, and task condition factors were identified as perceived influences. Task condition factors, such as communication mode, and task process factors, such as collaboration and idea conceptualization, were most prominent in accounting for changes in engagement in both tasks. Consistent with previous research, reasons for reduced engagement in the text-chat communication mode were related to the time-consuming nature of text-chat interaction (Zeigler, 2016) and non-contingent turn-taking due to overlapping messages (Lai et al., 2008; Loewen & Wolff, 2016). As messages were sometimes reported to be delayed and out of sequence in text-chat mode, there was likely a split-attention effect (Mayer & Moreno, 1998), whereby learners needed to expend additional attention "untangling" overlapping messages, leading to cognitive overload, anxiety, and reduced task engagement. However, increases in engagement were frequently attributed to learners' use of the chat record as an ideational and linguistic resource, which may have facilitated productivity during writing. The straightforward process of transforming chat resources into resources for composing has been documented in pre-task planning research (Liao, 2018) and is uniquely beneficial to learners performing computer-mediated CW tasks. Overall, learners were less divided in their perception of the video-chat communication mode, with results indicating that engagement derived from substantial collaboration and idea conceptualization. As previous CW studies have indicated enhanced collaboration in FTF mode compared with text-chat mode (Kessler et al., 2020; Liao, 2018), it is possible that video-chat closely approximates FTF interaction in terms of the overall collaborative experience.
In support of this claim, reports from several learners indicate that video-chat seemed to facilitate "social presence" to a greater degree as linguistic (verbal) and extralinguistic (visual) information were used to establish feelings of immediacy, intimacy, and sociability (Chamberlin Quinlisk, 2008; Yamada & Akahori, 2009). For both modes, task design factors and learner factors seemed to mediate engagement to a moderate extent. Consistent with task-based research (e.g., Aubrey, 2017a, 2017b; Dao & Sato, 2021; Qiu & Lo, 2017), task interest and familiarity played a facilitating role in engagement.
The third research question asked whether there were common patterns of fluctuation in engagement during the tasks. Four distinct patterns emerged from the data, with just over half of the participants' engagement exhibiting little variation (i.e., moderately steady). The remaining engagement trajectories exhibited considerable variation (i.e., increasing, decreasing, and rollercoaster patterns), with some participants reporting extreme changes from very positive to very negative interest and focus (e.g., P11 reported a 6-point increase in a 9-minute period, see Figure 5). This is consistent with previous research that has found affective variables (e.g., anxiety, enjoyment: Boudreau et al., 2018) and conative variables (e.g., willingness to communicate: MacIntyre & Gregersen, 2021; MacIntyre & Serroul, 2015) to be susceptible to considerable change within short speaking tasks. Admittedly, compared to these studies, which provided ratings on a per-second basis, the larger 3-minute rating interval in this study may have masked micro-fluctuations within each segment; however, measurements were sufficiently nuanced to the extent that it was possible to identify patterns that were not seen when engagement was averaged across all learners (see Figure 2). A notable observation was that more video-chat task performances exhibited high and moderately stable engagement trajectories, while text-chat performances had more variable and increasing trajectories (see Table 5). This might suggest a trade-off effect in how the two modes impact engagement during CW tasks. That is, video-chat mode may afford learners a better environment to collaboratively generate ideas, thereby sustaining focus and interest during planning; on the other hand, the text-chat mode, which requires learners to permanently record written ideas during interaction, can facilitate more focused production during composition writing.
At the same time, however, there is evidence to suggest that engagement-inhibiting learner factors (e.g., perceived lack of proficiency) can override task condition influences, leading to declining engagement regardless of mode for some learners (see P14 and P15, Table 5). Taken together, these findings highlight the complex relationship between learner-internal and learner-external factors and their combined influence on engagement during computer-mediated CW tasks.

Implications
Importantly, this research addresses the pedagogical issue of how teachers should implement computer-mediated CW tasks. Video-chat tasks demand more attention and generate a higher level of "social presence," which may benefit advanced learners who are comfortable with the quicker pace of spoken collaboration and/or who are familiar with their interlocutor(s) and thus value intimacy in communication. The video-chat mode may also be suitable for learners who complete tasks that require ample idea conceptualization (e.g., problem-solution tasks). However, the slower pace of text-chat may benefit learners who are less fluent, and the permanent nature of the chat script may provide a scaffold through which learners can refer to planned ideas and language as they compose their writing. If teachers decide to combine communication modes during a single task, they might consider doing so in a principled manner and in line with the engagement needs of the writing process. For example, teachers might begin with an initial stage using video-chat which focuses on task understanding and idea generation, followed by a text-chat stage in which learners encode their ideas in the written language via interaction before they begin writing their joint composition. Such integration would take advantage of the relative benefits of each communication mode while gradually diverting learners' attention from planning ideas and language to drafting and editing, thus optimizing learners' cognitive-affective engagement throughout the task.

Conclusions
This study has compared a group-level analysis and a dynamic-level analysis of individual learners' engagement when Hong Kong learners of English completed computer-mediated CW tasks in video-chat and text-chat mode. In terms of methodology, this research represents the first foray into engagement dynamics during computer-mediated CW tasks, an approach that has been encouraged by scholars (e.g., Hiver et al., 2021). In contrast to previous studies in computer-mediated CW tasks that have looked at observable engagement via peer interaction (e.g., Hsu, 2019; Li, 2013), this research has probed into the more invisible cognitive-affective factors using self-reported measures of focus and interest. It has revealed that learners were significantly more engaged in video-chat than text-chat tasks overall, which can be primarily attributed to the more immediate way in which learners can process multimodal information during interaction to collaborate and plan ideas for the task, requiring high levels of focused attention and generating elevated levels of interest. Examining individual dynamic patterns of engagement, findings suggest that the text-chat mode produces more increasing patterns as learners initially struggle with the less efficient communication mode but then experience more focused writing in the latter stages of the task as they draw on pre-planned ideas in their chat records. In sum, this study contributes to our knowledge of the relative benefits of different computer-mediated modes of interaction during CW and provides further evidence that engagement is dynamic in various kinds of L2 tasks.
The limitations of this study should be highlighted. First, as this research examined the cognitive-affective aspect of engagement, we did not investigate the behavioral dimension of engagement (e.g., collaborative discourse or writing behavior). Although some studies have looked at CW behavior at different stages of longer multi-week writing projects (e.g., Kessler & Bikowski, 2010), it would be a novel approach to examine changes in observable written and/or interactional behavior on a shorter timescale (e.g., per-minute). Combining this with self-report ratings of engagement would shed light on the relationship between different dimensions of engagement (i.e., behavioral, cognitive, affective, social), which remains underexplored (Philp & Duchesne, 2016). Similarly, writing outcomes (i.e., fluency, accuracy, complexity) were not analyzed. A process-product approach (Long, 2015), in which engagement (i.e., the process) is related to resultant writing quality (i.e., the product), may highlight which patterns of engagement are desirable for facilitating optimal writing outcomes. Finally, regarding engagement ratings, having learners rate their focus and interest at three-minute intervals was deemed suitable considering the longer duration of CW tasks (as opposed to shorter speaking tasks). However, future research may consider employing ratings at smaller timescales (e.g., per-second) using idiodynamic computer software (e.g., Boudreau et al., 2018; MacIntyre, 2012), which may capture important momentary changes.