Cross-country comparison of EFL teacher preparedness to include dyslexic learners: Validation of a questionnaire

The aim of this study was to validate a 24-item TEPID (Teachers of EFL Preparedness to Include Dyslexics) scale measuring the beliefs of 546 pre-service and in-service teachers of English as a foreign language (EFL) across three countries (Cyprus, Greece, and Poland) on their preparedness to include learners with dyslexia in mainstream foreign language (FL) classes. Principal component analysis of the scale led to a two-factor structure, that is, knowledge and self-efficacy in implementing inclusive instructional practices with dyslexic EFL learners, and stance towards inclusion. The analysis of measurement invariance confirmed the generalizability of the TEPID across all subgroups and allowed valid comparisons between factor variances and covariances. The scale is a useful tool for investigating perceived teacher preparedness to include dyslexic learners and variables that influence TEPID, comparing the results across countries, and designing tailored pre-service and in-service training schemes on inclusion.

This study focuses on validating a self-report survey instrument measuring the beliefs of teachers of English as a foreign language (EFL) about their preparedness to include dyslexic EFL learners. Foreign language (FL) learning is understood as learning an additional language in an instructed setting, in an environment in which that language is not used on a daily basis. Dyslexia is understood as a type of specific learning difficulty (SpLD). SpLD are not attributable to vision, hearing, motor disabilities, or intellectual impairment. Neither emotional issues nor environmental, cultural, or economic disadvantage cause SpLD. SpLD can be observed in students who experience difficulties in information processing and learning due to specific neurological functioning (Scanlon, 2013; Woodcock, 2020). Dyslexic learning difficulties have neurobiological and genetic traces and are linked to phonological processing problems which, in turn, can lead to inaccurate and/or non-fluent, slow reading as well as incorrect spelling (International Dyslexia Association, n.d.). Dyslexic readers are characterized by slow and inaccurate word-level decoding (APA, 2013). Poor automaticity of low-level reading skills in turn negatively impacts higher-level text comprehension (Perfetti, 2007). Along with word-level reading difficulties, dyslexic individuals demonstrate "underlying weaknesses in the areas of working memory, executive functioning (planning, organizing, strategizing, and paying attention), processing speed, and phonological processing" (Košak-Babuder et al., 2019, p. 53).
Dyslexia has been repeatedly shown to influence the learning of a FL, especially in terms of written and spoken input processing. Dyslexic individuals often face challenges in FL learning (reading in particular) across learning contexts, and the majority perform worse than their non-dyslexic peers on a number of tasks (Department for Education and Skills, 2005; Fazio et al., 2020; Kormos, 2017a, 2017b, 2020; Kormos & Smith, 2012). This is due to a considerable overlap among the basic cognitive factors that explain variation in L1 and FL language and literacy outcomes. L1 skills constitute foundations for FL development (see Kormos, 2017a, for a review). Enhancing EFL teachers' awareness of how well they are prepared to offer effective inclusive teaching to EFL learners with dyslexia can be helpful in designing specialized training, and, in the long run, facilitating the inclusion of learners with dyslexia in the context of FL classroom instruction and learning support.

Literature review
Dyslexia is commonly associated with L1 phonological processing difficulties leading to below-standard print processing, which manifests itself in inaccurate and/or non-fluent and slow reading and spelling. Successful print processing requires the knowledge of letters and the possible sounds represented by each letter or letter cluster (sound-letter relations) as well as the ability to blend the sounds together to create words and to segment a word into its individual sounds in order to read or spell it (Hulme & Snowling, 2009). L1 cognitive factors have been shown to best account for individual differences in FL learning; in other words, there seem to be common cognitive reasons that determine low achievement in a FL and literacy-related difficulties in L1 (Kormos, 2017a). Recent studies support the claim that individual differences in FL achievement reflect individual differences in L1 skills and provide evidence for the crosslinguistic transfer of L1 to FL skills. This means that students who are poorer at L1 decoding, and have reduced vocabulary range and lower spelling, writing, and language analysis skills, will demonstrate lower achievement in FL classes. Students who have weaker L1 literacy skills will develop weaker FL literacy skills (Sparks, Patton, & Luebbers, 2019).
Since the underlying cognitive processes in L1 such as working memory, phonological processing, processing speed, and attention control seem to be linked to FL literacy development, dyslexic difficulties in L1 processing and acquiring L1 literacy-related skills often coexist with difficulties in FL literacy development (e.g., Kormos, 2017a, 2017b, 2020). However, evidence supporting the claim that struggling FL learners also experience learning problems in their L1, and that L1 literacy-related problems always surface in FL learning difficulties is mixed (e.g., Alderson, Nieminen, & Huhta, 2016; Borodkin & Faust, 2014a; Ferrari & Palladino, 2007). Students with weak FL skills (low-achieving) are not necessarily at risk of, or diagnosed with, dyslexia. Both individuals with dyslexia and students with low proficiency in a FL show a weakness in L1 phonological processing. However, some studies show similar characteristics in L1 phonological processing in these two groups of learners (Sparks, 2016; Sparks & Luebbers, 2018; Sparks & Patton, 2016), while other research findings indicate that the weakness in phonological processing in L1 in non-dyslexic low-achieving FL learners transpires in a reduced set of skills in comparison to individuals with dyslexia. Poor FL performance of FL low-achieving students without dyslexia may be a consequence of difficulties in L1 phonological processing that they experience; however, these difficulties tend not to influence their reading acquisition in L1 (Borodkin & Faust, 2014a, 2014b; Borodkin, Maliniak, & Faust, 2017).
Being a FL student with dyslexia does not necessarily involve experiencing FL learning difficulties. Many FL dyslexic students can compensate for their reading problems and demonstrate at least average achievements at different educational levels, especially when supported with appropriate teaching practices (Nijakowska, 2010; Olofsson, Taube, & Ahl, 2015). Nevertheless, the accumulating research evidence confirms that many students diagnosed with dyslexia in their first language (L1) experience difficulties of varying severity in learning additional languages (Bonifacci, Canducci, Gravagna, & Palladino, 2017; Dimililer & Istek, 2018; Kormos et al., 2019; Łockiewicz & Jaskulska, 2016, 2019; Toffalini, Losito, Zamperlin, & Cornoldi, 2018; Ylinen et al., 2019; see Kormos, 2017a, 2020, for a review), which seems to be apparent in both instructed settings, where additional language/s are learned in the school environment, and in naturalistic settings, where additional language/s are acquired in the home environment (Geva & Wiener, 2014; Martin, 2013; Peer & Reid, 2016).
FL achievement alone cannot be treated as an indicator of dyslexia (SpLD), and not only dyslexic (SpLD) learners but also non-dyslexic FL low-achievers should receive appropriate support from their well-trained FL teachers. It might be expected that more intensive instruction should likely bring about an increase in achievement for many struggling students. However, dyslexic (SpLD) students might need more individualized instruction, depending on their individual pattern of cognitive strengths and weaknesses, to meet their specific learning needs (Hale et al., 2010).
High-quality FL teacher training includes sufficient and adequate instruction in content knowledge and content delivery strategies (i.e., intensity and duration of special support) to meet the needs of diverse learners, including dyslexic FL learners. This can foster positive attitudes towards inclusion, lead to mastery of specialized knowledge and higher levels of teachers' self-efficacy and student advocacy. This in turn boosts teacher confidence in choosing and exploiting instructional practices that are inclusive (e.g., Chao, Forlin, & Ho, 2016; Coady et al., 2016; Das, Gichuru, & Singh, 2013; Florian, 2012; Florian & Rouse, 2009; Indrarathne, 2019; Peebles & Mondaglio, 2014; Sharma & Nuttal, 2016; Sharma & Sokal, 2015; Symeonidou & Phtiaka, 2014; Woodcock, 2020). However, both pre-service and in-service teacher training through professional development courses on dyslexia and inclusion offered to teachers working in instructed EFL settings in the European context tend to be insufficient. EFL teachers report that they are poorly prepared to face the challenges and demands of inclusive classrooms in terms of knowledge and skills, which can generate concerns (Nijakowska, 2014; Nijakowska & Kormos, 2016). Offering sound teacher training in dyslexia and inclusion for EFL teachers is very important given that the international prevalence of dyslexia is between 5% and 10% of the student population (Nijakowska, 2010). Another equally crucial reason is that English is an opaque language. This means that there are many representations for pronunciations of print patterns and many spelling versions for one sound (Moats, 2020), which can present a significant challenge for FL dyslexic learners who struggle to make sense of letter-sound relationships because of their learning difficulty (Cessar, Treiman, Moats, Pollo, & Kessler, 2005).
As indicated by research evidence, inclusive instructional practices are more readily and successfully employed by more self-efficacious and less anxious teachers, who hold positive beliefs and attitudes towards inclusion (e.g., Sharma & Sokal, 2016). Teachers' awareness of inclusive practices coupled with knowledge of effective intervention programs and their theoretical underpinnings determine the level of teachers' preparedness to teach in an inclusive way (e.g., Kahn-Horwitz, 2015; McCutchen et al., 2002; McCutchen, Green, Abbott, & Sanders, 2009; Podhajski, Mather, Nathan, & Sammons, 2009). EFL teachers' language-based content knowledge constitutes the foundation of their professional preparation and allows successful teaching of students who experience reading difficulties. This knowledge involves language and literacy concepts, phonological and orthographic awareness, explicit reading instruction and phonics (Vaisman & Kahn-Horwitz, 2019). The above-mentioned knowledge of inclusive classroom practices, language-based content knowledge, knowledge of dyslexia and its manifestations in language learning, in turn, constitute a prerequisite for offering proper instruction to students with dyslexia (Aladwani & Al Shaye, 2012; Indrarathne, 2019; Moats, 2009; Washburn, Joshi, & Binks-Cantrell, 2011a, 2011b). Research findings confirm that poor teacher content/specialist knowledge can be, at least to a certain extent, linked to inappropriate and/or limited initial and in-service teacher training (e.g., Goldfus, 2012; Joshi et al., 2009). On the other hand, adequate, research-based teacher professional training can be instrumental in increasing the necessary language-based content knowledge (i.e., knowledge of basic language constructs) in both L1 and FL teaching contexts (e.g., Kahn-Horwitz, 2015; Podhajski et al., 2009; Vaisman & Kahn-Horwitz, 2019).
EFL teachers' insufficient knowledge of how students with dyslexia learn languages and of inclusive education principles and practices, as well as the unavailability of sufficient and appropriate pre- and in-service training opportunities, may exert a substantial impact on teachers' beliefs about their preparedness for inclusion and may demotivate them from providing dyslexic students with high-quality teaching.
In addition to specialized knowledge, teacher perceptions of preparedness for inclusion can also be improved by fostering positive teachers' attitudes towards inclusive education (Hsien, Brown, & Bortoli, 2009). Conversely, inadequate preparedness may lead to negative beliefs about inclusion (e.g., Das, Kuyini, & Desai, 2013). Teachers' self-reported perception of the degree to which they feel prepared to provide inclusive instruction influences their beliefs about how effective they can actually be in the inclusive classroom. These self-efficacy beliefs relate to teachers' perceptions (rather than their actual behavior) and assessment of how well they can perform in the classroom to promote dyslexic students' engagement, learning outcomes and achievements (Tschannen-Moran & Woolfolk Hoy, 2007). Teachers' perceptions of their ability to teach in an inclusive way, their attitudes towards inclusion and their behavior in the classroom are related. The stronger their belief that they possess the skills necessary to teach in an inclusive classroom, the greater are teachers' effort, commitment to teaching, and flexibility in handling difficulties (e.g., Ozder, 2011; Takahashi, 2011). Importantly, self-efficacy beliefs can also be modified by appropriate teacher training (Borg, 2011). Teacher self-efficacy beliefs are also crucial in that they can influence students' self-efficacy beliefs, motivation to learn and academic achievements (e.g., Guo, Connor, Yang, Roehrig, & Morrison, 2012).
In sum, including learners with dyslexia in mainstream classrooms may pose a number of challenges to teachers. EFL teacher preparedness for inclusion is a crucial issue, as it can exert a substantial impact on the way the needs of EFL learners with dyslexia are accommodated. However, the concept of EFL teacher preparedness has not been sufficiently addressed by research in FL teaching contexts and its constituent elements have not been verified. Against this background, the purpose of the present study was to design and validate an instrument that could gauge the preparedness of EFL teachers to include students with dyslexia in mainstream EFL classrooms. The study involved designing and piloting the TEPID (Teachers of EFL Preparedness to Include Dyslexics) scale, testing its factor structure, and comparing the resulting solution across three countries with different educational systems and teacher training schemes (Cyprus, Greece and Poland) to validate its robustness. The paper reports the validation procedure and the psychometric properties of the TEPID scale. To this end, two research questions were examined in the study:
1. What is the factorial structure of the TEPID scale? What are the factors that make up the construct of EFL teacher perceived preparedness to include learners with dyslexia in mainstream classrooms?
2. What are the levels of measurement invariance of the TEPID scale scores across countries? Do these levels of measurement invariance justify comparisons between factor means and factor relationships across groups?

Participants
Data were collected from 832 teachers who responded to the online questionnaire powered by Survey Monkey. Respondents were approached and contacted through local EFL teachers' associations, conferences and training events, and personal networks. At the beginning of the survey, the participants were informed about the purpose of the study and told that participation was voluntary and anonymous. All the teachers consented to take part in the study. Only complete responses were analyzed: 546 participants (155 Greek-Cypriot, 233 Greek and 158 Polish teachers) answered all the questions. The average age of participants was 30 years. Of these, 52 (9.5%) were male and 494 (90.5%) female. 80% of the respondents were EFL in-service teachers, while 20% were pre-service teachers. 29.1% of the teachers held a BA degree, 52.7% an MA and 6.2% were PhD holders. The majority of respondents (54.2%) were experienced teachers with over 10 years of teaching experience, while 9% had no teaching experience. 66.9% of the participants reported some teaching experience with dyslexic learners. 45.2% of teachers taught regular classes in which there were students with dyslexia. 9.7% of respondents taught classes specifically designed and organized for students with dyslexia, while 12% reported conducting one-to-one lessons with dyslexic learners.

Instruments
Based on the DysTEFL-Needs Analysis Questionnaire (Nijakowska, 2014), a new questionnaire, the DysTEFL-Needs Analysis Questionnaire Revised (DysTEFL-NAQ-R), was developed to measure pre- and in-service EFL teachers' beliefs about their preparedness to include dyslexic EFL learners in mainstream classrooms (TEPID) and to verify EFL teachers' professional training needs on dyslexia and inclusive instructional practices. The questionnaire consists of three parts (Nijakowska, Tsagari, & Spanoudis, 2018). The first part comprises nine background questions about demographic details, level of education, general teaching experience and experience in teaching students with dyslexia. The second part contains the TEPID scale consisting of 24 items rated on a 6-point Likert scale (1 = definitely not true of me, and 6 = definitely true of me). The third part includes four questions about prior training on dyslexia and inclusive instructional practices and about future professional training needs, such as the preferred mode, format and content of training. The current study focuses on analyzing the psychometric properties of the TEPID scale, that is, the second part of the DysTEFL-NAQ-R questionnaire, which is included in the Appendix.

Procedure
To ensure that the TEPID instrument is reliable and valid, it was assessed by three external evaluators. The evaluators had expertise and experience in dyslexia, foreign language teaching and inclusive education. Their comments were taken into consideration when finalizing the phrasing and coverage of the items and the appropriateness of the 6-point Likert scale included in the TEPID instrument. The questionnaire was then piloted with 100 in-service and pre-service EFL teachers (20% from Poland, 40% from Greece, and 40% from Cyprus). These teachers did not participate in the main study. The pilot group had characteristics similar to the participants of the subsequent study. The analysis of the pilot results focused on checking the reliability of the individual items of the TEPID scale. Reliabilities of the items ranged from .80 to .93, which means that they were highly internally consistent (Dörnyei, 2010). The Survey Monkey software was used to administer the questionnaire. Participation in the pilot and the actual study reported here was voluntary and anonymous. The language used in the survey was English so as to avoid the challenges imposed by translating the instrument into the mother tongues of the participants, who were expected to be fluent users of English. The authors computed the index for acquiescence response style, following van Herk, Poortinga, and Verhallen (2004). Acquiescence indices were calculated as the number of clearly positive scores (the two highest categories on the rating scale) minus the number of clearly negative scores (the two lowest categories on the rating scale). Thus, from the 6-point rating scale of the TEPID, the values 1, 2, 5, and 6 were taken. The resulting number was divided by the total number of items, yielding an acquiescence index ranging from -1.00 to 1.00. The acquiescence response indices were also computed separately for each item and for each country. Cronbach's α for the acquiescence response index was .88.
The correlation of acquiescence indices across countries ranged from .88 to .95 (p < .001), indicating a high level of convergent validity. Thus, we can conclude that there is no systematic bias in our data.
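The acquiescence index described above can be illustrated with a short sketch. It is not the authors' code: the function name and the example ratings are hypothetical, and the only assumptions taken from the text are the 6-point scale, the use of categories 1, 2, 5 and 6, and division by the number of items.

```python
def acquiescence_index(responses, low=(1, 2), high=(5, 6)):
    """Return (clearly positive - clearly negative) / number of items.

    `responses` holds one respondent's ratings on the 6-point scale.
    The result lies between -1.0 and 1.0.
    """
    if not responses:
        raise ValueError("responses must contain at least one rating")
    n_high = sum(1 for r in responses if r in high)  # ratings of 5 or 6
    n_low = sum(1 for r in responses if r in low)    # ratings of 1 or 2
    return (n_high - n_low) / len(responses)


# Hypothetical respondent with 24 ratings (illustrative values only).
example = [6, 5, 5, 4, 3, 2, 1, 6, 5, 4, 4, 3,
           5, 6, 2, 3, 4, 5, 6, 1, 3, 4, 5, 6]
index = acquiescence_index(example)
print(round(index, 2))
```

A respondent who chose only the two highest categories would score 1.00, one who chose only the two lowest would score -1.00, and one who used only the middle categories would score 0.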

Data screening
The data received were cleaned, and missing data patterns and univariate outliers were identified. Only completed questionnaires were subject to analysis. The minimum amount of data required for factor analysis was satisfied: the final sample comprised at least 155 respondents per nationality, that is, over 6 cases per variable.

Factor analyses
The factorability of the 24 TEPID items was examined across the three samples. Several well-recognized criteria for the factorability of a correlation matrix were used. 16 of 24 items correlated at least .3 with at least one other item. This indicates reasonable factorability. The Kaiser-Meyer-Olkin measure of sampling adequacy was above .92 for all samples and Bartlett's test of sphericity was significant for all groups (Cyprus: χ²(276) = 2701.95, p < .01; Greece: χ²(276) = 3695.92, p < .01; Poland: χ²(276) = 2917.26, p < .01). The diagonals of the anti-image correlation matrix were all over .48, justifying the inclusion of all the items in factor analysis. The communalities were all above .3, indicating that each item shared some common variance with other items. Given these overall indicators, three separate factor analyses were performed with all 24 items for Cypriot, Greek and Polish EFL teachers.
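As a hedged illustration of Bartlett's test of sphericity, the sketch below uses the standard textbook formula, χ² = -[(n - 1) - (2p + 5)/6] ln|R| with p(p - 1)/2 degrees of freedom, where R is the correlation matrix, p the number of items and n the number of cases. The function names and the 3 × 3 correlation matrix are hypothetical; the study's own matrices are not reproduced here. For p = 24 items the degrees of freedom equal 276, which matches the values reported above.

```python
import math


def bartlett_df(p):
    """Degrees of freedom of Bartlett's test for p variables."""
    return p * (p - 1) // 2


def log_determinant(matrix):
    """Natural log of |det| via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    log_det = 0.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        log_det += math.log(abs(a[col][col]))
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return log_det


def bartlett_statistic(corr, n_cases):
    """Chi-square statistic of Bartlett's test of sphericity."""
    p = len(corr)
    return -((n_cases - 1) - (2 * p + 5) / 6) * log_determinant(corr)


# Hypothetical 3-item correlation matrix and sample size (illustrative only).
corr = [[1.0, 0.5, 0.3],
        [0.5, 1.0, 0.4],
        [0.3, 0.4, 1.0]]
stat = bartlett_statistic(corr, n_cases=155)
print(round(stat, 2), bartlett_df(len(corr)))
print(bartlett_df(24))  # degrees of freedom for the 24 TEPID items
```

The closer the correlation matrix is to an identity matrix, the closer the statistic is to zero, which is why a significant result is read as support for factorability.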
Principal component analysis (PCA) was conducted because the primary purpose of the study was to identify and compute composite scores for the factors underlying the TEPID scale. For the Cypriot sample, the initial eigenvalues showed that the first factor explained 44.9% of the variance, the second factor 11.9% of the variance and the third factor 5.2% of the variance. The fourth factor had an eigenvalue of just over 1, explaining 4.8% of the variance. A four-factor solution was examined. To this end, both varimax and oblimin rotations of the factor loading matrix were used. The three-factor solution, explaining 61.9% of the variance, was chosen for a number of reasons, the first one being its theoretical grounding. Also, the eigenvalues "leveled off" on the scree plot after three factors. Finally, the number of primary loadings on the fourth factor was not sufficient and the four-factor solution proved difficult to interpret. The varimax and oblimin solutions differed only slightly, which is why both solutions were verified in the subsequent analyses. The oblimin rotation was chosen for the final solution. It provided an almost identical factor structure across the three samples and was also deemed a theoretically more reasonable solution due to the nature of the factors being studied. For the Greek sample, the same procedure was followed. The initial eigenvalues indicated that the first factor explained 43.8% of the variance, the second factor 10.8% of the variance and the third factor 4.9% of the variance. Overall, the three-factor solution explained 59.5% of the variance. For the Polish sample, the initial eigenvalues showed that the first factor explained 48.5% of the variance, the second factor 11.3% of the variance and the third factor 4.9% of the variance. Overall, the three-factor solution explained 64.6% of the variance. Tables 1 to 3 display the results of the analyses. All items had primary loadings over .33.
Several items presented cross-loadings across the three samples, which is reasonable given the nature of the current factors. With the exception of item 9, which belongs to the third factor in the Cypriot sample but presents a cross-loading of .45 on the first factor, all other items loaded onto the same factors across all samples. Also, item 11 appears to belong to the first factor in the Cypriot sample but presents a rather low loading (.38) compared to the other loadings on the first factor. Inspecting the factor loadings across the three solutions, it appears that the factor structures of the Greek and Polish samples are more robust than that of the Cypriot sample. Composite scores were computed for each of the three factors, based on the mean of the items which had their primary loadings on each factor. Descriptive statistics are presented in Table 4. Following examination of the histograms, the skewness and kurtosis were found to be well within a tolerable range for assuming a normal distribution. This suggested that the distributions were approximately normal. Although an oblimin rotation was employed, only weak correlations existed between the composite scores, ranging from .03 to .38 across the three samples. Internal consistency for each of the three scales was examined using Cronbach's alpha. The alphas (see Table 4) were very high for the first two factors across all samples, ranging from .78 to .95. By contrast, the third factor showed very weak reliabilities across all samples. Overall, these analyses indicated that a two-factor solution was underlying teachers' responses to the TEPID items and that two out of three factors were very highly internally consistent. We decided to discard the third factor and the related items (items 1 and 11) from further analyses due to its weak reliability. In the Cypriot sample, item 9 was included in the first factor. For all further analyses, we used 22 out of 24 items.
An approximately normal distribution was evident for the composite scores estimated for the two factors; thus, the data were well suited for parametric statistical analyses.
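The internal consistency check reported above relies on Cronbach's alpha, which can be sketched as follows. This is a minimal illustration using the standard formula α = k/(k - 1) × (1 - Σ item variances / variance of total scores); the function name and the small score matrix are hypothetical and do not come from the TEPID data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(n_items)]
    # Sample variance of respondents' total scores.
    total_var = variance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings: 5 respondents x 3 items on a 6-point scale.
scores = [
    [5, 6, 5],
    [3, 3, 4],
    [4, 5, 4],
    [2, 2, 3],
    [6, 6, 5],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 2))
```

Items that rise and fall together across respondents push the variance of the total score well above the sum of the item variances, which is what drives alpha towards 1.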

Measurement invariance analysis
In order to investigate the factorial structure of the TEPID questionnaire and its measurement invariance across the three samples (Greek, Cypriot and Polish), a multigroup confirmatory factor analysis (CFA) was conducted using EQS 6.2 (Bentler, 2006). This analysis attempted to confirm the two-factor solution identified through PCA and test invariance of this structure across Greek, Cypriot and Polish EFL teachers.
The first step in multi-group analyses is to screen the data for multivariate outliers and to estimate baseline CFA models for each sample. Preliminary analysis revealed severe violations of normality among many items. For that reason, a Satorra-Bentler corrected chi-square statistic was used. It adjusts the chi-square through the inclusion of a correction factor influenced by the degree of non-normality in all sample data (Satorra & Bentler, 1994). The authors identified three similar baseline models. The three models have the same two factors: (1) knowledge and skills in implementing inclusive instructional practices with dyslexic EFL learners, and (2) stance towards the inclusion of dyslexic EFL learners in mainstream classrooms, with the same pattern of fixed and free factor loadings. However, to improve the model fit, several error covariances were specified in the models. Specifically, four error covariances were specified in the baseline model for the Greek sample, five for the Cypriot sample and three for the Polish sample. The baseline models of different groups that are integrated in the configural model should ideally be similar, although it is not necessary that they are completely identical (Byrne, Shavelson, & Muthén, 1989).
The results of the three baseline models show that the TEPID items load strongly onto their underlying factors in the three samples and that all three models fit the data well. After the baseline model had been determined for each sample, the three baseline models were combined into a multi-group model to form a configural model. In this model, the same number of factors and the same pattern of fixed and free factor loadings were specified in each of the groups, but no equality restrictions were imposed on any measurement or structural parameter across groups. The results of the configural model are presented in Table 5, where summary fit indices are reported. Goodness-of-fit statistics related to this model reveal a well-fitting multi-group model: RMSEA = 0.08, 90% CI = (0.07, 0.09), CFI = 0.92 and TLI = 0.91. The configural model provides the baseline value against which subsequently specified restricted models are compared.
The present researchers tested measurement invariance by conducting hierarchical tests for invariance of measurement parameters. Three multi-group CFAs with varying (nested) parameter restrictions were estimated to test measurement invariance employing the ML estimator: configural model, metric invariance and scalar invariance. As shown in Table 5, comparisons of the fit indices for the configural versus metric invariance models yielded a non-significant ΔS-B χ². However, comparisons of the metric versus scalar invariance models yielded a significant ΔS-B χ², indicating a lack of scalar invariance.
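Because Satorra-Bentler corrected chi-square values cannot simply be subtracted, nested models are usually compared with a scaled difference statistic. The sketch below assumes the commonly used scaled difference formula, c_d = (d0·c0 - d1·c1)/(d0 - d1) and ΔS-B χ² = (T0 - T1)/c_d, where T are the uncorrected ML chi-squares, d the degrees of freedom and c the scaling correction factors. The text does not state which variant the authors applied, and every number in the example is hypothetical rather than taken from Table 5.

```python
def scaled_chi_square_difference(t0, df0, c0, t1, df1, c1):
    """Scaled chi-square difference between two nested models.

    t0, df0, c0: uncorrected ML chi-square, degrees of freedom and scaling
                 correction factor of the more restricted model.
    t1, df1, c1: the same quantities for the less restricted model.
    Returns (scaled difference statistic, difference in degrees of freedom).
    """
    if df0 <= df1:
        raise ValueError("the restricted model must have more degrees of freedom")
    cd = (df0 * c0 - df1 * c1) / (df0 - df1)  # difference-test scaling factor
    if cd <= 0:
        raise ValueError("non-positive scaling factor; statistic is undefined")
    return (t0 - t1) / cd, df0 - df1


# Hypothetical values for a metric (restricted) vs. configural comparison.
delta, delta_df = scaled_chi_square_difference(
    t0=1500.0, df0=660, c0=1.20,
    t1=1450.0, df1=620, c1=1.22,
)
print(round(delta, 2), delta_df)
```

The resulting statistic is referred to a chi-square distribution with the returned degrees of freedom; a non-significant value is taken as support for the added equality constraints.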

Discussion
The aim of this study was to construct and validate a self-report survey instrument measuring EFL teachers' beliefs on their preparedness to include EFL dyslexic learners. The first research question in this study focuses on the factors that make up the construct of EFL teacher preparedness to include (properly address the needs of) learners with dyslexia (TEPID). Our findings prove a good fit for a two-factor solution with 22 items that was robust across the groups of Greek, Cypriot and Polish EFL teachers. The authors labeled the factors as follows: (1) knowledge and self-efficacy (F1), and (2) stance towards inclusion (F2). F1 comprises items referring to dyslexia-related knowledge and instruction-related teacher classroom behavior. Items regarding knowledge related to dyslexia involve familiarity with the signs and nature of dyslexia, understanding of the difficulties dyslexic individuals may experience in FL study and of effective teaching methods (like multisensory carefully structured, metacognitive techniques) (Birsh & Carreker, 2019;Kormos & Smith, 2012;Moats, 2020), awareness of the local educational policy, and accommodations in FL proficiency exams. Items pertaining to inclusive instructional practices touch upon managing classroom environment, differentiating tasks and assignments, mode of presentation, instruction, assessment and feedback techniques to properly address individual learner needs as well as ability to foster development of effective learning strategies and learner autonomy. F2 contains items concerning the importance of individualization of the teaching approach, the introduction of adjustments and accommodations, the collaboration with parents and educational professionals as well as the relationship between teacher classroom behavior and students' self-esteem and self-determination. These factors seem to reflect the constructs found in the literature and refer to the component parts of teacher preparedness for inclusion. 
The present findings seem consistent with previous research outcomes which highlight that the more teachers feel prepared to teach in inclusive settings, the more specialized knowledge they have to address individual learner needs, and the stronger teacher's self-efficacy is, the more confidently they apply appropriate inclusive teaching and assessment practices (Coady et al., 2016;Florian & Rouse, 2009;Hettiarachchi & Das, 2014). Previous studies showed that teachers with greater self-efficacy and positive stance towards inclusion prove to be more successful in implementing inclusive instructional practices in their classrooms (e.g., Sharma & Sokal, 2016). Being prepared for inclusion entails knowledge about the nature of dyslexia as well as the language learning processes and learning difficulties EFL learners with dyslexia may experience, which seems to be in line with the findings of the teacher content knowledge studies (Aladwani & Al Shaye, 2012;Moats, 2009Moats, , 2014Washburn et al., 2011aWashburn et al., , 2011b. Earlier studies demonstrated that teacher self-efficacy beliefs can not only regulate the way teachers respond to the demands and challenges posed by inclusive education but also influence the quality of support they provide to their students (Guo et al., 2012;Ozder, 2011). Teachers' perceptions and judgments of their capabilities prove powerful enough to impact their students' learning (Tschannen-Moran & Woolfolk Hoy, 2001).
PCA of responses to the questionnaire items across the three groups indicates that the factorial structure of the TEPID scale was almost identical across the Cypriot, Greek and Polish samples. Overall, the two-factor solution explained 56.8%, 54.6% and 59.8% of the variance for the Cypriot, Greek and Polish samples respectively. Internal consistency for each of the three scales was very high. Cronbach's alphas for the two factors across all samples ranged from .78 to .95. This reflects a shared understanding of the concept of teacher preparedness for inclusion among EFL teachers across the three samples.
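For readers less familiar with the reliability coefficient reported above, the following is a minimal sketch of how Cronbach's alpha can be computed from a respondents-by-items score matrix, using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name `cronbach_alpha` and the `hypothetical_responses` data are illustrative assumptions; they are not the TEPID data or the software used in this study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])  # number of items in the (sub)scale
    # Sample variance of each item across respondents
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of the respondents' total (summed) scores
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical Likert-type responses: 5 respondents x 3 items (made-up values)
hypothetical_responses = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
    [1, 2, 2],
]
alpha = cronbach_alpha(hypothetical_responses)
print(f"alpha = {alpha:.2f}")
```

In a study of this kind, the coefficient would be computed separately for the items of each factor within each country sample.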
The second research question addresses the levels of measurement invariance of the TEPID scale scores across countries and whether these levels of measurement invariance justify comparisons between factor means and factor relationships across groups. A multi-group CFA confirmed the two-factor solution identified through PCA and demonstrated invariance of this structure across the samples of Greek, Cypriot and Polish EFL teachers. The two-factor structure proved robust and similarly conceptualized across samples. The results of the three baseline models showed that the TEPID items loaded strongly onto their underlying factors in the three samples and that all three models fit the data well. Goodness-of-fit statistics related to a configural model revealed a well-fitting multi-group model. Three multi-group CFAs with varying (nested) parameter restrictions were estimated to test measurement invariance using the ML estimator: configural model, metric invariance and scalar invariance. When configural invariance was satisfied (the same items measured the examined construct across groups), we checked the metric (pattern) invariance by constraining the factor loadings to be equal across groups. This model verified whether the three groups responded to the items in the same way. In other words, we learned whether the strength of the relationship between particular scale items and their underlying factors is the same across groups. Factor loadings proved to be invariant, which means that weak (metric) invariance (a prerequisite for valid between-group comparisons on two factors) was established. This in turn indicates that the respondents across the Cypriot, Greek and Polish samples attributed the same meaning to the latent factors under examination.
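The sequential logic of testing nested invariance models can be sketched as follows. This is only an illustration under stated assumptions: it uses a change-in-CFI rule of thumb (a drop larger than .01 between adjacent models taken as evidence against the added constraints), which is one commonly used criterion and not necessarily the one applied in this study, and the fit values in `hypothetical_cfi` are invented for the example. The function name `highest_invariance_level` is likewise hypothetical.

```python
def highest_invariance_level(cfi_by_level, max_cfi_drop=0.01):
    """Return the most restrictive invariance level supported by the fit indices.

    Assumes the configural model itself fits acceptably; each further level
    is accepted only if CFI does not drop by more than `max_cfi_drop`
    relative to the preceding, less constrained model.
    """
    order = ["configural", "metric", "scalar"]
    reached = order[0]
    for previous, current in zip(order, order[1:]):
        if cfi_by_level[previous] - cfi_by_level[current] > max_cfi_drop:
            break  # added equality constraints worsen fit too much; stop here
        reached = current
    return reached


# Invented CFI values for three nested multi-group models (not study results)
hypothetical_cfi = {"configural": 0.950, "metric": 0.945, "scalar": 0.920}
level = highest_invariance_level(hypothetical_cfi)
print(level)
```

Each step keeps the constraints of the previous model and adds new ones, so the procedure stops at the first level where the additional equality constraints are not supported.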
Stepping through the levels of invariance, the authors retained the constraints from the metric level and added further constraints: they checked for scalar invariance by constraining both factor loadings and intercepts to be equal across groups. Strong (scalar) invariance, however, was not obtained. The corresponding latent factor intercepts lacked invariance across groups, which indicates that the meaning (interpretation) of the factors and the levels of the underlying items were not equal across the three groups. This can further suggest that group differences in estimated factor means can be biased. Nevertheless, the TEPID scale seems to be acceptable in studies exploring EFL teacher preparedness for including dyslexic learners. This is so because the lack of scalar invariance, unlike the lack of metric invariance, does not disqualify meaningful comparisons between groups on their scores on the two factors. However, meeting the strong (scalar) invariance level could allow for stronger conclusions relating to the between-group differences on the group means (Steinmetz, Schmidt, Tina-Booh, Wieczorek, & Schwartz, 2009). The data were well-suited for further parametric statistical analyses since participants across the three samples seemed to interpret both the individual items and the underlying latent factor in a similar way. Another reason for the appropriateness of the data was the approximately normal distribution of the composite score data. These analyses can trigger a discussion on how different demographic variables (e.g., level of education, teaching experience, type of school teachers work at, age, gender) influence the beliefs of EFL teachers about their preparedness to include dyslexic EFL learners across three countries, that is, Greece, Cyprus and Poland (Nijakowska et al., 2018).

Conclusion
TEPID appears to be a promising assessment tool. The data gathered demonstrated that the basis of Greek, Cypriot and Polish EFL teachers' perceived preparedness for appropriate inclusive instruction and assessment of learners with dyslexia involves knowledge about dyslexia and skills (self-efficacy) in implementing inclusive instructional practices with dyslexic EFL learners, as well as teachers' stance regarding principles of inclusion.
Our findings satisfy the assumption of weak (metric) measurement equivalence of the TEPID scale scores across countries. The level of measurement invariance the authors established endorses valid comparisons between factor variances and covariances. Reliability analysis for the total scale survey, as well as factors for each country, suggested that the TEPID scale survey provides a reliable measure of EFL teacher beliefs about their preparedness to include dyslexic EFL learners in mainstream classrooms across different countries. This validates the generalizability of the TEPID scale survey across all subgroups and supports binding comparisons across these groups. The questionnaire is measurement invariant (although strong MI was not satisfied): it measures an identical construct with the same structure in a similar way in all compared groups. Given that metric invariance holds, future studies researching the occurrence, determinants and consequences of EFL teacher perceived preparedness to include dyslexic learners can use the TEPID scale and reach valid cross-group comparisons. The authors believe that the TEPID scale possesses sufficient strength to be used for examining EFL teacher beliefs on their preparedness to include FL learners with dyslexia in mainstream classrooms, diagnosing how this perceived preparedness changes as a result of professional training, as well as designing tailor-made training schemes on inclusion incorporated into initial and in-service teacher training. The instrument also lends itself to the exploration of the impact different variables may exert on teacher beliefs on preparedness and comparing these findings across countries.
The generalizability of our results is subject to certain limitations. It should be noted that the samples were not large enough to ensure factor stability. Further research using larger samples is necessary in order to generate more precise scores. The present study provided partial support for construct validity of the TEPID. Convergent and discriminant validities using other reliable and valid measurements could not be verified. It is recommended that more detailed content and construct validities are subject to examination in future studies. Future research may also incorporate other objective or independent measures in order to supplement the subjective evaluation of the variables examined in the development of the TEPID. This, in turn, could improve the interpretation of findings. Finally, the demographic composition of the sample, with 90% females and 80% in-service EFL teachers, constitutes a limitation. This may restrict the generalizations of the findings to female EFL in-service teachers. Research on teacher preparedness for inclusion as well as actual teacher inclusive behavior in the context of FL learning and teaching is still scant (e.g., Kormos & Nijakowska, 2017; Russak, 2016). The TEPID scale survey can prove useful in more systematic investigations of EFL teacher preparedness for inclusion and the role inclusive teacher training plays in increasing teacher perceived preparedness (knowledge and self-efficacy), fostering positive attitudes and also alleviating concerns about implementing inclusive instructional practices. Also, investigating how EFL teacher preparedness for inclusion translates into student achievement and motivation to learn seems necessary to draw a more complete picture of inclusion in instructed FL environments.
In addition to teacher perception studies, research on actual inclusive practices that EFL teachers employ to individualize and differentiate their approach in order to accommodate learners' needs can generate important findings and implications for practice.