Studies in Second Language Learning and Teaching

This study compares the processing of three different types of written corrective feedback (WCF) by heritage language (HL), second language (L2), and third language (L3) learners who wrote and revised three short essays and received a different type of WCF for each essay (i.e., direct, coding, or underlining). Comparison of pre- and post-feedback texts and analysis of think-alouds served as the basis for determining whether one type of feedback promoted higher depth of processing (DoP) and whether this processing was mediated by error type and language background. The findings indicate that feedback type did interact with DoP, and that this interaction was in some ways mediated by learner background and error type. This research serves as a first step toward understanding how these three learner groups are impacted by these commonly used feedback types and is therefore important to drive evidence-based pedagogical decisions.


Introduction
Written corrective feedback (WCF) is "any explicit attempt to draw a learner's attention to a morphosyntactic or lexical error" (Polio, 2012, p. 375) in writing. The focus of WCF research has traditionally been on the accuracy of the final written product, but in recent years the processes learners engage in when they receive feedback have begun to receive empirical attention based on the recognition that without information about how WCF was processed, it is difficult to ascertain anything more than whether a revision was made (see Roca de Larios & Coyle, 2022, for a recent review). Leow's (2020) feedback processing framework is the most recent account of how engaging with WCF may (or may not) lead to changes in learners' linguistic systems. Specifically, the framework explains that once learners have received feedback, they must minimally pay attention to it for it to potentially be converted into intake. Feedback processing pertains to how the learner cognitively processes the feedback in relation to current linguistic knowledge. If the feedback is further processed, whether with a low or high DoP, there is potential for previously learned knowledge to be restructured. The new restructured information (which might or might not be fully accurate at this point) can then replace or combine with the original knowledge in the learner's internal system. Leow notes that it is possible for the learner to retain both accurate and inaccurate items in the system, and only measures of delayed performance can indicate whether a complete, accurate restructuring has taken place (resulting in system learning), or whether such restructuring was fleeting, as evidenced by accuracy immediately after feedback provision but regression to previous inaccurate knowledge at a later time.
Our research is couched in Leow's (2020) framework and investigates the processing and impact of WCF on three different learner populations: second language (L2) learners, third language (L3) learners, and heritage language (HL) learners of Spanish in a university-level Spanish course in which composition is one component of the four-skills curriculum. These three populations are theoretically interesting, with heritage language learners understudied in WCF and L3 learners never having been the focus of prior WCF processing studies. Furthermore, examining the three populations is an ecologically valid choice since all of them are enrolled in the same classes. In what follows we first review different types of WCF and then move on to describing previous research findings on feedback processing and learner profiles as the necessary background to situate our study.

Types of WCF
WCF differs along several dimensions, including its focus (i.e., limited to a restricted set of linguistic elements or comprehensive), its timing (i.e., synchronous or asynchronous with writing), and its explicitness (i.e., direct or indirect). For our study we chose asynchronous, comprehensive feedback because of ecological validity considerations in our context and because the potential effectiveness of comprehensive WCF has attracted less empirical attention than focused WCF (Cerezo et al., 2019, p. 174). To benefit from implicit (indirect) feedback, learners need to have enough proficiency in the language to be able to understand the nature of their errors, whereas explicit (direct) correction can be suitable for beginners as the feedback provides correction (Kang & Han, 2022). We chose to include both direct WCF and two types of indirect WCF (underlining and metalinguistic coding), as they are commonly used in language classrooms (Zhang et al., 2021) and are theoretically suitable for learners of intermediate proficiency. In terms of processing, as Kang and Han (2022) explain, "explicit WCF places the processing responsibility in the hands of the feedback giver (i.e., the teacher), while implicit WCF shifts the responsibility to the feedback receiver (i.e., the learner)" (p. 221). We therefore anticipated that direct feedback might be processed less deeply than indirect feedback.

Findings of previous studies on the processing of WCF
Previous research has used two main introspective methods to examine how learners process feedback: written languaging (Cerezo et al., 2019; Manchón et al., 2020; Manchón & Roca de Larios, 2011; Suzuki, 2012, 2017) and think-alouds (Caras, 2019; DeRobles, 2019; Kim & Bowles, 2019; Park & Kim, 2019; Qi & Lapkin, 2001; Sachs & Polio, 2007; Suh, 2010). Leow and Manchón (2022) have convincingly argued for the affordances of think-alouds, stating that "think-aloud protocols, although a time-consuming undertaking mostly due to transcription, segmentation, and coding, arguably elicit the richest data on cognitive processes employed during task performance" (p. 306). Our study therefore adopted think-alouds for this purpose and we turn now to reviewing existing process research that has used this data collection procedure.
The earliest studies to examine WCF processing using think-alouds are Qi and Lapkin (2001) and Sachs and Polio (2007), both of which involved ESL learners who received reformulations. Although neither used the metric of DoP per se, in Qi and Lapkin (2001) "substantive" rather than "perfunctory" comments were related to an improved written product and in Sachs and Polio (2007) errors accompanied by metalanguage or a hypothesis were more likely to be corrected in the revision. Park and Kim (2019), who examined processing of indirect feedback, similarly found that deeper processing resulted in better ability to self-correct marked errors. Where think-aloud studies comparing the processing of multiple types of WCF are concerned, direct feedback has been associated with lower levels of processing than reformulations and indirect feedback (Caras, 2019; Kim & Bowles, 2019). Suh (2010), however, found in her study that the intermediate-level ESL learners who received direct feedback manifested higher levels of awareness and showed significantly greater learning gains than those who received indirect feedback (metalinguistic coding) on the past counterfactual conditional.
It seems that the relationship between feedback and its processing may depend on a number of factors, including the L2 writer's proficiency level, the nature of the errors, and the extent to which the indirect feedback provides clues as to the nature of the error. For instance, Caras (2019) found that for preterit/imperfect errors, but not for ser/estar errors, most L2 Spanish participants who received metalinguistic feedback processed at a high level. Looking further into the relationship between error types and DoP, Kim and Bowles (2019) found that errors related to sentence structure and organization were associated with high DoP, whereas punctuation, word choice, and tense (surface level errors) were associated more often with low DoP. For Caras' beginning-level learners, the indirect feedback type that provided the fewest clues as to the nature of the error (crossing it out) generally led to low level processing, most likely because learners did not have enough target language knowledge to understand what was incorrect about their writing. In contrast, the indirect, metalinguistic coding feedback generally led to high level processing, presumably because learners relied on the code to identify the source of the error.
As this brief review shows, our understanding of the factors that impact WCF processing for L2 learners is just beginning to emerge and further research is needed to disentangle the roles learners play. All of the aforementioned studies examined how L2 learners of different languages, contexts, and proficiencies processed WCF. Other populations have featured less prominently in research. For instance, just one study (DeRobles, 2019) investigated how heritage learners process WCF. This study compared how high and "very high" proficiency learners processed direct and indirect (metalinguistic coding) feedback. Deeper processing was associated with greater accuracy between drafts, particularly for the metalinguistic coding group. Again, relationships were not straightforward, as direct feedback led to improved accuracy for some lower proficiency learners on certain error types, and proficiency seemed to mediate the relationship between WCF and DoP.
As studies of WCF processing are still scarce for HL learners and, to our knowledge, nonexistent for L3 learners, it is important to compare the feedback processing behavior of those learners who so often coexist in the foreign or second language classroom yet have diverse abilities and needs due to their unique linguistic backgrounds, which we discuss next. This expansion of the populations under study will likely contribute not only to expanding the empirical basis, but also to assessing whether available research findings can be generalized across populations and contexts.

HL, L2, and L3 learners
US college and university language classrooms are more diverse than ever (Association of American Colleges and Universities, 2019), with Spanish classes particularly so because it is the most widely taught language other than English in the country (Looney & Lusin, 2019). In many classrooms, L2 learners of Spanish who were raised monolingually in English coexist with heritage learners of Spanish, defined by Valdés (2000) as learners that grew up hearing and using the language at home. At the institution where this study was carried out, there is also a growing number of L3 learners of Spanish, who were raised speaking a heritage language such as Russian or Urdu at home, are dominant in English (the majority language), and choose to learn Spanish in a classroom setting for professional or personal reasons.
Previous research tells us that these learner types have different strengths and needs when it comes to language learning (e.g., Bowles & Montrul, 2017; Cenoz, 2013; Gatti & O'Neill, 2017; Gurzynski-Weiss, 2010). HL learners often have limited knowledge of grammar rules and metalinguistic terminology since they acquire the language naturalistically at home and rarely receive literacy instruction in Spanish at school in the U.S. However, they often have stronger pronunciation skills, greater oral fluency, and higher oral comprehension skills than L2 or L3 learners (Carreira & Kagan, 2011; Mrak, 2020). Past research has shown that HL learners tend to rely on their "ear" even when completing written tasks, often reading their writing and any feedback aloud to determine whether it "sounds right" (Yanguas & Lado, 2012; Zamora, 2022). Furthermore, their relative lack of familiarity with metalinguistic labels suggests they might be less able to benefit from indirect feedback (and especially coding feedback) than their peers who learned Spanish primarily in a classroom setting. L2 learners are mainly exposed to the target language in the classroom, and therefore usually have stronger explicit knowledge of grammar rules than HL learners, but weaker comprehension skills and oral fluency (Carreira & Kagan, 2011). They tend to be quite used to metalinguistic terms and are often very accurate at using them to describe the Spanish language (Bowles, 2011). This tendency could make them more adept at using indirect feedback (especially metalinguistic error coding feedback) than their HL peers. L3 learners have the advantage of coming to the Spanish language classroom with two languages already under their belt, which, as past research has shown, contributes to higher overall metalinguistic awareness (Bialystok, 2001). This tendency, combined with their classroom exposure to Spanish, may predispose them to benefit the most of the three groups from indirect feedback.
Given their profiles, HL, L2 and L3 learners might be expected to process feedback differently and in empirically and pedagogically relevant ways. From the latter perspective, empirical findings could usefully inform our decisions about feedback provision, and given the limited number of studies with learners other than L2 learners, research is essential to drive evidence-based pedagogy for the three learner groups. In an attempt to add to previous research on feedback processing and gain new pedagogically relevant knowledge, the following research questions guided our study:
1. Is there a relationship between type of written corrective feedback (error coding, underlining, or direct correction) and DoP for HL, L2, or L3 learners?
2. Is there a relationship between DoP and error type for HL, L2, or L3 learners?
3. Is there a relationship between DoP and accurate error revision for HL, L2, or L3 learners?

Participants
The participants in this study were 35 students of Spanish from a large public university in the US Midwest. They ranged from low- to high-intermediate proficiency in Spanish as determined by an abridged version of the Diploma de Español como Lengua Extranjera (DELE), a Spanish proficiency test frequently used in linguistics research (e.g., Montrul, 2004; Montrul & Slabakova, 2003). The DELE contains 30 multiple-choice vocabulary questions and a 20-question cloze test of grammar and vocabulary. Test-takers can score 0-50 points, with scores from 0-29 regarded as low proficiency, 30-39 as intermediate and 40-50 as advanced.
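The score bands just described can be written out as a small lookup. The following is only an illustrative sketch: the function name `dele_band` is hypothetical, and the cut-offs are taken directly from the ranges stated above (0-29, 30-39, 40-50).

```python
def dele_band(score: int) -> str:
    """Map an abridged-DELE score (0-50) to the proficiency band described in the text."""
    if not 0 <= score <= 50:
        raise ValueError("abridged DELE scores range from 0 to 50")
    if score <= 29:
        return "low"
    if score <= 39:
        return "intermediate"
    return "advanced"


if __name__ == "__main__":
    # Scores chosen only to exercise the band boundaries; they are not study data.
    for s in (0, 29, 30, 39, 40, 50):
        print(s, dele_band(s))
```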
The learners also completed the Bilingual Language Profile (BLP) (Birdsong et al., 2012), which was used to gather information about their language backgrounds and to identify them as HL, L2, or L3 learners. According to self-reports from the BLP, on average participants in the HL group began learning Spanish at birth and English at three years old and reported using Spanish with their families 55% of the time, and with their friends 10% of the time. The L3 learners spoke a wide variety of home languages including Mandarin Chinese, Hindi, Amharic, Japanese, Polish, Punjabi, and Ukrainian. Like the HL group, they reported learning English at three years old on average and using their home languages with family more than half of the time (68%) and with their friends about 30% of the time. L2 learners reported having been exposed only to English until learning Spanish in a classroom setting in adolescence, and they indicated they infrequently used Spanish outside of class. The learners were drawn from a range of university Spanish courses, with HL learners reporting having studied Spanish for an average of four years, L3 learners for 4.3 years, and L2 learners for an average of 5.5 years. In total, there were 7 HL learners of Spanish, 9 L3 learners, and 19 L2 learners. Table 1 reports average DELE scores for each group. There were no statistically significant differences in proficiency between groups.

Task
Each participant wrote and revised three 200-250-word essays in response to prompts about Spanish in the US (see Appendix). These prompts were chosen because they covered topics participants would have had personal experience with, were accessible to learners at this proficiency level, and related to topics covered in the curriculum. Participants were given a week to write each essay in their own time. While writing, they recorded their screens using Zoom to ensure they were only using Microsoft Word spellcheck, linguee.com and wordreference.com. These websites were allowed as resources because they provide translations of individual words or phrases like "get ready" rather than full sentences with context and examples. Participants submitted their screen recordings with each essay; essays without screen recordings were not accepted.
Approximately one week after submitting each essay, each participant met with the researcher or a research assistant via Zoom to revise the initial text. During each session, the participant received their marked-up essay with one of three WCF types (error coding, direct feedback, underlining) and shared their screen while they edited and thought aloud. The next prompt was sent 1-2 days after each revision session and the process was repeated a total of three times for each participant, with the order of prompts and WCF counterbalanced to ensure that any differences were not due to order effects or prompt attributes. For each revision session, one-third of the participants received each prompt and feedback type. Due to attrition, the final counts are not perfectly balanced. Table 2 below illustrates the number of participants for each prompt and feedback condition during each writing session.
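One common way to achieve this kind of counterbalancing is a Latin-square rotation. The sketch below is an assumption about how such a schedule could be built, not a description of the procedure actually used in the study; the prompt labels and the function names `latin_square` and `assign` are hypothetical.

```python
from collections import Counter

FEEDBACK = ["coding", "direct", "underlining"]
PROMPTS = ["prompt_A", "prompt_B", "prompt_C"]  # hypothetical labels


def latin_square(items):
    """Return cyclic rotations of items so each item occurs once in every position."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]


def assign(participants):
    """Cycle participants through the 9 combinations of feedback order and prompt order."""
    combos = [(fb, pr) for fb in latin_square(FEEDBACK) for pr in latin_square(PROMPTS)]
    schedule = {}
    for i, person in enumerate(participants):
        fb_order, pr_order = combos[i % len(combos)]
        # One (prompt, feedback) pair per session, in session order.
        schedule[person] = list(zip(pr_order, fb_order))
    return schedule


if __name__ == "__main__":
    schedule = assign([f"P{i + 1}" for i in range(9)])
    for session in range(3):
        # With a multiple of 9 participants, each feedback type is used equally often per session.
        print(session + 1, Counter(plan[session][1] for plan in schedule.values()))
```

With a number of participants that is not a multiple of nine, or after attrition, the cells become uneven, which is consistent with the imbalance noted above.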

Feedback types
The three feedback types that were chosen (direct corrective feedback, metalinguistic error coding, and underlining) were used because they are the main types of feedback that are given in the classroom and because they have been the focus of many previous studies (Zhang et al., 2021). Table 3 shows the same sentence marked with each feedback type.
Direct: Muchas personas saben que aprendiendo aprender un lenguaje cuándo cuando ellos son jóvenes es más fácil que aprendiendo aprender una lengua en la escuela secundario secundaria. (Many people know that learning a language when they are young is easier than learning a language in high school.)
Coding: Muchas personas saben que aprendiendo(VT) un lenguaje(WC) cuándo(SP) ellos son jóvenes es más fácil que aprendiendo(=) una lengua en la escuela secundario(AGR).
Following Caras' (2019) claim that comprehensive WCF is ecologically valid since it reflects the way instructors tend to revise compositions in the classroom (p. 186), we elected to provide this type of feedback. The errors that were marked were article errors (including missing articles and issues with definite/indefinite articles), inflectional morphology errors (such as gender and subject-verb agreement), mood, punctuation, preposition, and spelling errors, sentence structure errors (such as incomplete sentences), verb tense errors, and word choice errors. These error types were chosen because they encompass the majority of errors that language learners at this level make when writing.
To ensure the essays were being coded reliably, both authors provided feedback on 10% of the essays, divided equally among the three types of WCF. Interrater agreement on these essays was 94%, which was considered high enough that one author then coded the remaining essays.
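The text does not specify which agreement statistic was used. Assuming simple percent agreement (the share of items to which both raters assigned the same code), the calculation can be sketched as follows. The function name `percent_agreement` and the toy codes are illustrative only and are not study data.

```python
def percent_agreement(rater_a, rater_b):
    """Percentage of items on which two raters assigned the same code."""
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must code the same items")
    if not rater_a:
        raise ValueError("at least one coded item is required")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100 * matches / len(rater_a)


if __name__ == "__main__":
    # Invented codes for five marked errors (VT, WC, SP, AGR as in the coding example).
    first_rater = ["VT", "WC", "SP", "AGR", "VT"]
    second_rater = ["VT", "WC", "SP", "AGR", "WC"]
    print(percent_agreement(first_rater, second_rater))  # 4 of 5 codes match
```

Percent agreement does not correct for chance; a chance-corrected statistic such as Cohen's kappa would be an alternative if the category distribution is skewed.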

Think-aloud protocols
The think-aloud protocol was designed following Bowles (2010), beginning with instructions and think-aloud practice for participants before they moved on to the revision task. The instructions asked learners to think aloud in English, Spanish, or a combination of the two as was most natural while they revised, saying what went through their mind as they reviewed their feedback. Next, the participants completed a brief think-aloud practice activity made up of three sentences with grammatical errors in English that had been marked with the type of feedback that corresponded to that day's session. Examples of each are included in Table 4.
Practice sentence: They're wasn't any students I knew at the school event.
Example think-aloud comments: Ok, so I have an issue with "they're." Maybe I spelled it wrong since there are many ways to spell it. And a problem with wasn't, too. Maybe it is supposed to be "weren't" since there is more than one student?
Practice sentence: He didn't went to the game yesterday.
After the think-aloud practice session, participants received their marked-up essay, shared their screen, and began to revise. The researcher started recording the Zoom session, turned off her video and muted her microphone, only speaking to remind the participants to continue to think aloud if they fell silent for more than a few seconds. This prompting was rarely necessary. Participants spent on average 15-20 minutes revising each essay.

Coding for DoP
The think-aloud recordings were transcribed verbatim. Participants' comments for each error were classified as high, medium, or low DoP, adopting the coding scheme used by Caras (2019), an adaptation of Leow's (2015) coding system. Table 5 provides descriptors for each category in the coding system and Table 6 contains examples of each type of DoP in each feedback condition.
Example excerpts from Table 6:
And then. Americana. Yeah, porque I was talking in the the feminine primera generación cause generación is feminine, so I have to change this to an "a." Ooh (Direct).
Aquellos que son bilingües están mejor enfocan. (Pause). Están . . . Then this one probably too. Okay, they practice. Wait and so, those who are bilingual are better at concentrating because they practice. That should be practican. Pra-ti-can. Conjugate that. Because they practice. I think that's right, okay. Mejor . . . están . . . aquellos que son bilingües están mejor enfocan (Underlining).
Each researcher coded 10% of the transcripts separately and interrater agreement for DoP was 94%. The two authors discussed the disagreements and further elaborated the coding scheme accordingly, then one author coded the rest of the data. Each think-aloud comment was coded for high, medium, or low DoP, error type, and whether the revision was correct, incorrect (but changed from the original), or unchanged. In total, there were 1,910 comments across all three feedback conditions. Each participant provided an average of 54 think-aloud comments (SD = 27). Descriptive statistics were used to answer the research questions in the following section.

Underline
      ART       IM        M         P         PR        S         SS        T         WC
Low   14 (67%)  12 (36%)  3 (25%)   0 (0%)    8 (40%)   20 (45%)  1 (14%)   3 (19%)   8 (22%)
Med   4 (19%)   7 (21%)   5 (42%)   0 (0%)    6 (30%)   13 (29%)  2 (29%)   4 (25%)   12 (33%)
High  3 (14%)   14 (42%)  4 (33%)   0 (0%)    6 (30%)   11 (25%)  4 (57%)   9 (56%)   16 (44%)
Note. ART = article, IM = inflectional morphology, M = mood, P = punctuation, PR = preposition, S = spelling, SS = sentence structure, T = tense, WC = word choice

The second research question asked if there was a relationship between DoP and error type for the three groups. Tables 10-12 include detailed data for each learner group. Direct corrective feedback led to the lowest DoP for most error types for all groups. That said, the HL group processed the few punctuation errors they made most deeply in the direct feedback condition, often spending considerable time reading and re-reading the sentence aloud to see if the corrected punctuation sounded right. The L2 learners processed article and preposition errors with equally low DoP in every condition, most often editing them by simply saying the correction aloud (e.g., "el país" instead of just "país," or "por" instead of "para"). Underlining feedback promoted the highest DoP for the majority of error types for all groups, followed closely by coding feedback for L2 learners. Specifically, L3 learners processed underlining feedback with the highest DoP for eight of the nine error types, the only exception being article errors, which were processed most deeply with coding feedback. L2 and HL learners demonstrated the most medium and high DoP for inflectional morphology, mood, preposition, and spelling errors when they received underlining feedback.
The L2 learners seemed to benefit from the error coding feedback more than the other groups for punctuation, sentence structure, tense, and word choice errors, as this feedback type promoted the highest DoP for these error types. This is most likely because coding indicates the source of the error, which might not be clear from underlining for these errors, especially to L2 learners, whose knowledge is more rule-based than intuitive and who do not have the heightened metalinguistic awareness of the L3 learners.
The third research question concerned the relationship between DoP and error revision for each group. Tables 13-15 include the percentages of accurately corrected, unchanged, and inaccurately revised errors in each group by feedback condition. Learners from all three groups were able to accurately revise approximately 80% of their errors, regardless of DoP or feedback type, and rarely left errors unchanged. Unsurprisingly, the direct feedback condition allowed learners in each group to revise many errors correctly with low DoP. When errors were left unchanged, they were most often processed with low DoP, either because the learner simply skipped the error or took a quick look and decided they did not know how to correct it. Few errors were left unchanged by any group in the direct feedback condition since it provided learners with the correct revision. In each group roughly 15% of errors were incorrectly revised. This mostly occurred when the learners demonstrated low or medium DoP and happened most often in the underlining and coding feedback conditions. Notably, there were some instances in each group of high DoP leading to inaccurate revisions, meaning that higher DoP did not always lead to accurate revision and, conversely, medium or high DoP was not always required for accurate revision. Correct revision accompanied by high DoP occurred most often in the underlining feedback condition for HL and L3 learners and in the error coding condition for L2 learners. Based on Leow (2020) and past research on the relationship between high DoP and subsequent learning outcomes (Leow, 2015), these are instances that have the greatest potential for leading to learning gains.
While high DoP does not always lead to correct revision and correct revision does not necessarily require high DoP, in general deeper processing has been found to be associated with more correct revision (Park & Kim, 2019). Since direct feedback provides learners with the correct answer, when we focus only on the error coding and underlining conditions, we see that higher DoP does appear to be linked more with accurate revisions and lower DoP is associated with unchanged or incorrectly revised errors. This pattern was clear for all three groups, where between 57% and 67% of the correctly revised errors were processed with medium or high DoP, between 60% and 80% of the unchanged errors were processed with low DoP, and approximately 50% of the inaccurately revised errors in each group were processed with low DoP. Only the L3 learners in the underlining feedback condition broke this pattern as they were equally likely to process their incorrectly revised errors with low, medium, or high DoP.

Discussion
In this study we set out to explore the processing of three different types of written corrective feedback by HL, L2, and L3 learners with the ultimate aim of contributing new data on feedback processing from diverse learners of Spanish that could be useful empirically (in terms of the generalizability of findings on feedback processing across populations and contexts) and pedagogically (especially in terms of pedagogical decision-making related to the provision of feedback).
Our first research question asked if there was a relationship between type of WCF and DoP for learners. Our results confirm that feedback type and depth of processing do interact. Across all groups, direct corrective feedback, the most transparent feedback type, led to the lowest DoP. Error coding and underlining feedback promoted medium or high DoP more than half the time for every learner group. These findings echo previous research which has found that indirect feedback tends to promote higher DoP so long as the learners have sufficient proficiency to understand the feedback (Caras, 2019; Kim & Bowles, 2019). The HL and L2 learners in this study had similar rates of high, medium, and low DoP for each feedback type, while the L3 data present very distinct patterns. That is, the L3 learners processed the error coding feedback much more shallowly than the other groups, processing it with high DoP only 17% of the time as opposed to 26% for HL learners and 32% for the L2 group. Conversely, they processed the underlining feedback more deeply than any other group. From our observations, the L3 learners were often able to resolve their errors very quickly upon seeing the error code, often simply stating the correct answer. Researchers have indicated that L3 learners have advanced metalinguistic skills due to their exposure to multiple languages (Bialystok, 2001), and it seems that the learners in this study may have been able to use such skills to their advantage with this feedback type. That said, due to the small sample size in this study, more data should be collected to determine the generalizability of this finding. Similarly, a larger sample size would be needed to confirm the general trend in our data that feedback type and depth of processing interact, and that this interaction is mediated by learner-related characteristics and background.
Our second research question concerned the relationship between DoP and error type. Overall, we found that error type interacted with DoP for these learner groups. Underlining feedback promoted the highest DoP for many error types including inflectional morphology, mood, prepositions, and spelling for all three learner groups, again echoing findings that indirect feedback promotes higher DoP and providing evidence of the way this can be mediated by error type for these learner groups. The L2 group demonstrated the highest DoP when they received error coding feedback for punctuation, sentence structure, tense, and word choice errors. Since L2 learners' linguistic knowledge is often more explicit and rule-based than that of HL learners, which is more implicit and intuitive, they may have benefited more from the direct metalinguistic guidance provided to them via the error coding system. The HL learners had the highest DoP for punctuation, tense, and word choice errors in the direct feedback condition. HL learners are known to use a "sound it out" approach when revising written work (Yanguas & Lado, 2012; Zamora, 2022), and that is what we saw happening in this case. Many HL learners dedicated significant amounts of time to reading and rereading what they had written, making sure the added punctuation or suggested new word sounded right, whereas other groups were more likely to simply add a punctuation mark or change the word and move on. What this would mean in terms of learning outcomes is unclear and should be studied in future research. These data provide evidence of interaction between error type and DoP, yet it is still difficult to draw clear conclusions from these results due to small sample sizes for some error types. Overall, it seems that feedback type and learner background mediated DoP to a greater extent than error type.
Our third and final research question asked about the relationship between DoP and error revision. We found that DoP and error revision did interact, with higher DoP leading to more accurate error revision for all groups in the underlining and error coding feedback conditions, which aligns with previous findings about more substantive processing and error revision (Park & Kim, 2019; Qi & Lapkin, 2001; Sachs & Polio, 2007). Underlining feedback promoted the highest DoP and correct revision for the HL and L3 groups, while error coding best achieved this for L2 learners. The L3 group was able to revise correctly with low DoP more often than the HL or L2 groups. Again, this often occurred because they were able to quickly identify their mistakes and fix them without deliberating. That said, accurate revision and high DoP did not always go hand in hand. There were examples in every group and under every feedback condition of low DoP leading to accurate revisions and medium or high DoP leading to inaccurate revisions. These findings further reinforce the idea that writers' background and education experience are key variables to be considered when investigating feedback processing and its potential effects on the revised text produced. These insights also point to the necessary caution in generalizing findings across educational contexts and populations.

Conclusions and pedagogical implications
This exploratory study provides a first look at L3 processing of WCF and adds to the data available on HL learners, a group that has been understudied to date. The analyses showed that DoP was mediated by feedback type for HL, L2, and L3 learners. Direct corrective feedback promoted the lowest average DoP for all three groups, while error coding and underlining promoted medium and high DoP more often. The HL and L2 learners demonstrated similar DoP patterns under each feedback condition, with coding and underlining promoting similar amounts of low, medium, and high DoP for each group, whereas the L3 group was unique in processing error coding with much lower DoP than the other two groups and underlining with high DoP more often. It seems that the L3 learners, potentially due to their increased metalinguistic awareness, found it easier than the other groups to understand errors marked with error coding and also had an advantage when it came to interpreting and applying the implicit underlining feedback.
Along with feedback type, error type played a role in DoP for each learner group. Overall, underlining led to the highest DoP for most error types for L3 learners. Inflectional morphology, mood, prepositions, and spelling were processed most deeply in the underlining condition by all learners. The L2 learners, however, processed punctuation, sentence structure, tense, and word choice errors most deeply in the error coding condition, and the HL group processed punctuation, tense, and word choice errors most deeply upon receiving direct corrective feedback, often sounding out the suggestion to see if they agreed with it before editing. Finally, higher DoP was associated with correct revision and lower DoP tended to be related to leaving errors unchanged or revising them incorrectly, at least with coding and underlining feedback types. Because of its transparent nature, direct corrective feedback led all three learner groups to correctly revise their errors with low DoP far more often than the other two types of WCF.
While this investigation was primarily exploratory in nature due to the small sample size, a few pedagogical implications can be drawn. When it comes to feedback processing, underlining feedback promoted the highest DoP for the majority of error types in all learner groups. The L3 learners especially processed this feedback type more deeply than the other learner groups, indicating that this feedback type is likely ideal for them if the goal is to promote deep processing. The L2 group benefited more than the other two groups from the metalinguistic clues provided via the error coding feedback, most likely because their linguistic knowledge is primarily explicit and rule-based. That said, the HL and L2 groups had similar rates of DoP in the error coding and underlining conditions, so both feedback types could be beneficial for them. Direct corrective feedback promoted the lowest DoP for all groups across most error types. However, HL learners often spent significant amounts of time sounding out the direct feedback to make sure they agreed with it, something that the other two groups did not do.
Some limitations that should be considered are the small number of participants in the HL and L3 groups and the lack of a measure of learning. Our data shed light on how learners process WCF and what they do when they revise under different conditions, but we cannot speak to what impact the WCF had on their learning. Leow (2020) explains that higher DoP can potentially lead learners to restructure their inaccurate linguistic knowledge, and it is important to include long-term measures of learning in order to draw conclusions about the impact of WCF on language acquisition.
Future studies of feedback processing should continue to compare these three learner groups. Descriptions of observed learner revision behavior would also be beneficial. In this study we found that HL and L3 learners' behavior while revising differed from that of their L2 peers in some ways. L3 learners often revised very quickly and correctly, making minimal comments about their errors yet seemingly understanding the feedback target. Students in the HL group often took time to read and reread the feedback aloud, sounding out their mistakes and the suggested solutions, even with direct feedback. As most DoP coding schemes used to date were designed with L2 learners in mind and do not account for these behaviors, additional data should be collected to determine whether adaptations to DoP coding schemes may be in order.