The gender factor in the perception of English segments by non-native speakers

The aim of the paper is to present the findings of an empirical study which contributes to the ongoing research into gender effects on second language acquisition by exploring a biological influence on L2 pronunciation learning. One of the most frequent arguments used to vindicate single-sex education is that there are substantial sensory and perceptual differences between males and females which rationalize gender-specific teaching methods and gender-segregation at schools. The present study provides some preliminary insights into the perception of selected phonetic contrasts by Polish secondary school learners with the aim of investigating gender-based similarities and differences in the accuracy of sound recognition by males and females. The findings suggest that a commonly cited female advantage in acquiring L2 pronunciation cannot be attributed to their superior phonetic perception, as male participants performed equally well and identified the same number of English segments correctly.


Introduction
The gender factor in foreign language learning (henceforth FLL) has been explored extensively over the past few decades. Previous research has discussed a series of neurological, cognitive, and entwined individual differences, such as language aptitude, learning strategies (Gass & Varonis, 1986; Green & Oxford, 1995; Oxford & Nyikos, 1989), personality (Nyikos, 1990), attitude (Field, 2000), motivation (Dörnyei et al., 2006; Mori & Gobel, 2006; Sung & Padilla, 1998; Williams et al., 2002) and long-term attainment (Główka, 2014; Michońska-Stadnik, 2004; Murphy, 2010), which may operate differently for men and women. One of the most relevant findings is the acknowledgement of the interaction of gender with other social factors, such as ethnicity, identity and social class (Ellis, 1994), which, in turn, have an impact on the above personal factors relevant for FLL. Hansen Edwards (2008, p. 255) concludes that when gender is framed and investigated as a social and not merely biological construct, "it does appear to impact the level of access learners have to L2 use opportunities and therefore the ability to get L2 input and negotiate meaning, which appear to affect L2 development." Such realizations have triggered a shift from biological essentialism to social constructivism: differences in foreign language performance are no longer attributed only to inherent and fixed biological qualities of the two sexes, but are viewed as shaped by gendered social activities and culture-specific language ideologies (Ehrlich, 1997; Pavlenko & Piller, 2008).
While the socio-constructivist approach provides a more in-depth understanding and explanation of individual variation in language achievement even within the same gender group depending on the context, the essentialist approach should not be abandoned, as many educators still refer to "brain-based" learning to emphasize the brain and hormonal (testosterone and estrogen in particular) effects on cognitive development (Crossland, 2008; Liben, 2015). One of the most frequent arguments used to vindicate single-sex education is that there are substantial sensory and perceptual differences between males and females which rationalize gender-specific teaching methods and gender-segregation at schools (Holthouse, 2010).
Irrespective of a large body of research, and despite the fact that ample evidence points to a female advantage in mastering pronunciation (lower degree of foreign accent reported in Diaz-Campos, 2004; Moyer, 2010, as cited in Moyer, 2016; Tahta et al., 1981; Thompson, 1991; better oral fluency in a study on Chinese immigrants by Jiang et al., 2009), "gender has received scant attention in L2 phonology studies" (Moyer, 2016, p. 8). Therefore, it is still not fully understood to what extent the existing differences between the two sexes should be attributed to anatomical and functional properties of the brain or whether they pertain to socially conditioned factors (or both). The present empirical study contributes to the ongoing research into gender effects on second language acquisition by exploring a biological influence (including differences in the auditory system and phonological processing) on L2 pronunciation learning. It provides some preliminary insights into the perception of selected phonetic contrasts by Polish secondary school learners with the aim of investigating gender-based similarities and differences in the accuracy of sound recognition by males and females.
The rationale for undertaking the study is twofold. Firstly, there is a body of evidence that men and women differ with respect to the properties of their auditory systems and phonological processing (Krizman et al., 2012; McFadden, 1998). At the same time, it has been noted that previous research has been carried out mainly with adults and newborns, so there is a need to compare boys' and girls' auditory perception in their natural environment, such as the classroom (Eliot, 2011). Secondly, perception has been found crucial for adequate production in many studies on FLL (Almbark, 2014; Bent, 2005; Flege, 1991, 1995). Even though the issue of whether perception precedes production or vice versa is far from settled, the role of perception in pronunciation teaching should not be underestimated and it has often been claimed that the two speech modalities complement each other and should be treated in an integrated way (Couper, 2011; Escudero, 2009). Despite the fact that many researchers have explored the perception of L2 sounds by foreign language learners, including Poles (Balas, 2018; Bogacka, 2004; Nowacka, 2008; Rojczyk, 2010), gender has not been addressed as a variable. Providing data on this issue could, therefore, add to the discussion of whether single-sex education is justified in terms of foreign language pronunciation learning.

The gender factor in language processing
Neurolinguistic research points to anatomical and functional differences between male and female brains (Ruigrok et al., 2014; Sacher et al., 2013; Stevens & Hamann, 2012). The two cerebral regions most strongly associated with language processing, Broca's and Wernicke's areas, are 20% and 30% larger in women than in men, respectively (Kurth et al., 2017). The human auditory system is also characterized by physiological and psychophysical differences between the two sexes (de Lima Xavier et al., 2019). On the whole, heads, external ear canals and middle ear volumes are relatively larger in men than in women (Bowman et al., 2000; Cahill, 2014). Females exhibit better hearing sensitivity and have greater susceptibility to noise, while males excel at sound localization and signal detection in complex masking tasks (McFadden, 1998; McFadden et al., 2009). Some studies report that sex-based differences in processing auditory cues during speech production and the prevalence of speech and developmental phonological disorders in the male population (Keating et al., 2001; Corazzi et al., 2020) may be associated with structural brain differences between the two sexes, that is, greater cortical thickness in left Heschl's gyrus in females than in males (de Lima Xavier et al., 2019). Also, men display greater hearing loss related to age (Pedersen et al., 1989).
Using non-invasive methods such as functional magnetic resonance imaging (fMRI), it is possible to trace real-time lateralization patterns in speaking and listening tasks. There is neurological and neuropsychological evidence that women rely on both hemispheres in language processing to a greater extent than males do (de Lima Xavier et al., 2019). Left hemisphere dominance of language functions in men has been confirmed in behavioral observations that speech pathologies (stuttering), dyslexia and severe aphasia symptoms are more frequent among males with left hemisphere damage (48% in males vs. 13% in females) (Clements et al., 2006; Reber & Tranel, 2017; Voyer & Voyer, 2015). Less lateralization for language functions in women could (at least in part) explain why they outperform members of the opposite sex in verbal skills (Christova et al., 2008; Kansaku et al., 2000; Obleser et al., 2001; Ruigrok et al., 2014). Burman et al. (2008) showed sex differences in children (aged 9-15) when performing two linguistic tasks presented in two sensory modalities (visual and auditory). The female participants were found to rely on a broader neural network during processing tasks, while males had a less efficient (sensory rather than abstract) processing mode. Shaywitz et al. (1995) also found sex differences in the functional brain organization for language in a study that used echo-planar functional magnetic resonance imaging across three different tasks, that is, orthographic, semantic and phonological (rhyme). During phonological processing, males exhibited left-lateralized brain activity, while in females more widely distributed neural systems were engaged. The same finding of cerebral laterality emerged in a study which focused on both language (a timed rhyme-matching task) and visuo-spatial processing (Clements et al., 2006). The evidence pointed to lateralization differences between the two sexes, that is, men were more left-lateralized when performing the phonological task, but bilateral in the visuo-spatial task, while female brains showed activity in both hemispheres for phonological processing and were more right-lateralized for the visuo-spatial task.
The gender differences in the phonological processing capabilities of the right and left hemispheres evident in the abovementioned neuroimaging data were put to the test and corroborated in behavioral studies (i.e., using a visual half-field paradigm, a behavioral technique for drawing inferences about hemispheric activity). Lindell and Lum (2008) conducted two experiments in which phonological activation was measured implicitly (masked homophone priming) and explicitly (rhyme judgements). The pairs of words used in the two experiments differed with respect to spelling-pronunciation overlap, that is, they shared 1) high orthographic and phonological similarity (e.g., knot ~ not), 2) high orthographic and low phonological similarity (e.g., pint ~ hint), 3) low orthographic and high phonological similarity (e.g., use ~ ewes) and 4) low orthographic and low phonological similarity (e.g., kind ~ done). The participants were asked to make decisions (word/nonword and rhyme/nonrhyme) with reference to the target words displayed after primes in the right and left visual field of the screen. Women had consistently faster response latencies than men and turned out to be less sensitive to visual field stimulation during both implicit and explicit phonological tasks, which implies bilateral phonological processing. Males showed left hemisphere dominance, especially when asked to differentiate between homophones with low orthographic similarity. The study provided behavioral evidence indicating that, despite the fact that both hemispheres possess the capacity for orthographic analysis, phonological processing is left-lateralized in males but has a more diffuse representation in females (Lindell & Lum, 2008). This finding is consistent with an earlier behavioral study (Coney, 2002) which used rhyme-matching and pseudohomophone tasks. In both, women were unaffected by a lateral presentation of the stimuli, and their phonological processing was more efficient than that of males, who relied only on the left hemisphere to activate phonology.
Since female brains process language bilaterally and because the right hemisphere is considered a primary location for prosodic and emotional decoding (Pell & Baum, 1997), it has been speculated that women have an advantage over men in interpreting prosody, specifically paralinguistic cues which convey emotions (Besson et al., 2002). In a study by Schirmer et al. (2002), women outperformed men in identifying emotional prosody, and when processing words they relied on emotional-prosodic contexts more automatically than men. Similarly, men showed longer reaction times in a later study by Imaizumi et al. (2004). The results suggested that "emotional prosody modulates perceptual word processing" (2004, p. 122) and such modulations may vary between the two genders with women making earlier usage of emotional prosody and men needing more time to integrate linguistic semantics and emotional information. Since the activation of the right frontomedian cortex was significantly stronger in the male than in the female participants, it was concluded that men needed to make more conscious inferences about the emotional intent of the speaker. Rymarczyk and Grabowska (2007) were also interested in whether the ability to understand prosody is influenced by the listener's gender. They focused on two types of prosodic utterances, that is, affective/emotional and non-affective/linguistic. The results demonstrated that different parts of the right hemisphere were involved depending on different emotional intonations (as opposed to non-affective ones) and the brain organization of prosodic functions varied depending on the listeners' sex.
The anatomical and functional brain differences reported above are usually presented as "hard wiring" (Chadwell, 2010, p. 8) and have stimulated a heated and highly politicized debate on sex segregation in education, because it has been argued that "built-in gender differences in hearing have real consequences" (Sax, 2005, p. 17) for, among other things, classroom strategies. Studies by Holthouse (2010) and Hughes (2006) confirmed that males prefer direct and competitive teaching methods (with more space and movement that enhance the processing of information), while females obtain better results under a more cooperative and nurturing approach. Advocates of single-sex education argue that mixed environments can inhibit females' academic performance and achievement in maths and science (Sax et al., 2009). Many authors believe that same-sex schools promote gender equity as well as encourage students to expand their horizons and pursue interests without being constrained by gender stereotypes (James, 2009; Salomone, 2003). Riordan (2015) reports that recently there has been a resurgence of interest in single-sex schools in modern societies worldwide, both in the public and private sectors, and argues that they are one of the ways of raising the effectiveness of schools.
It must be stressed that even though the literature on gender differences in language processing is extensive, there is no consensus as to whether an actual difference exists or not. According to some researchers, sex-based differences are significant only during childhood and manifest themselves in girls' earlier language acquisition, as well as their better performance on language tests during early education (Bauer et al., 2002; Lange et al., 2016), but fade away with age (Beltz et al., 2013). On the other hand, there are claims that for certain skills, including writing (Coley, 2001) and verbal performance, male-female differences are negligible at an early age and only appear during adolescence and early adulthood. It is also important to note that recent meta-analytical reviews have shown that gender differences in verbal abilities cannot be substantiated by evidence and have pointed to "no consistent differences between males and females in language-related cortical regions" (Wallentin, 2009, p. 175). A meta-analysis of 24 studies measuring brain lateralization with functional imaging in a large sample of 377 men and 442 women found no significant difference between the two sexes with reference to language functions. This suggests that language lateralization is unlikely to account for sex differences in cognitive performance (Sommer et al., 2004). Nevertheless, one of the most recent studies encompassing 109 participants and using high resolution MRI and speech-production fMRI provided evidence for the existence of sex-specific structural dimorphisms within the speech production circuitry (de Lima Xavier et al., 2019). As a result, sex emerged as an important biological variable to be considered in research on neural correlates of speech control.
To conclude, females may have a learning benefit as a result of hemispheric integration during language processing and more effective memory strategies relevant to phonology (Moyer, 2016). As a result, they may be better equipped to notice fine phonetic and prosodic distinctions, "thereby benefitting perception and/or production of new sounds, and by extension, long term acquisition" (Moyer, 2016, p. 23). This statement needs to be tested empirically to vindicate the claim that the oft-reported anatomical and functional differences in male and female brains and auditory systems result in a different perception of non-native sounds.

L2 sound perception
Because of its relevance to the successful acquisition of an L2 sound system, the perception of non-native sounds by language learners has been studied within a number of theoretical models. They were developed to explain the patterns whereby L2 speakers perceive and categorize phonological contrasts in a foreign language as well as to account for the difficulties inherent in establishing new phonetic categories.
In Flege's (1995) speech learning model (SLM), perception is defined as the discrimination of phonetic features in the signal in order to identify phonetic categories stored in long-term memory. The systems of L1 and L2 share a common "phonological space" (1995, p. 242) and already-established L1 phonetic categories act as a point of reference for L2 sounds. The more similar an L2 sound is to that in L1, the higher the probability that "equivalence classification" will prevent the formation of a new category (Flege, 2002, p. 224). In such cases a single phonetic category will process perceptually linked L1 and L2 sounds, and merged categories (so-called diaphones), containing input from both languages, will be formed. The perceptual assimilation model (PAM; Best, 1994), originally based on naïve non-native listeners, was later extended to include foreign language learners (Best & Tyler, 2007). PAM-L2 postulates that non-native phonetic segments are perceptually assimilated to the most articulatorily similar native phonemes, and, rather than relying on mental representations, the listener uses articulatory cues to identify sounds. Different assimilation patterns can be detected, that is, single category (SC), category goodness (CG), two category (TC) and uncategorized-categorized (UC) assimilation. SC assimilation takes place when two L2 phones are equally good or poor instances of the same native phoneme; their discrimination and identification are therefore deficient. The situation in which two non-native phones are mapped to the same L1 phonological category, with one of them being considered phonetically more suitable than the other, is referred to as CG assimilation and predicts good discrimination. For the TC and UC assimilation patterns discrimination is predicted to be very good. In the former case two non-native phones are assimilated to two distinct L1 phones and in the latter only one L2 phone is ascribed to an already existing L1 category.
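The four PAM-L2 assimilation patterns summarized above can be expressed as a small decision rule. The following Python sketch is only an illustration of that summary, not part of PAM-L2 itself: the function name, the numeric goodness-of-fit ratings and the tolerance threshold are hypothetical devices introduced here to make "equally good or poor" operational.

```python
from typing import Optional

# Predicted discrimination for each pattern, as summarized in the text above.
PREDICTED_DISCRIMINATION = {
    "SC": "poor",
    "CG": "good",
    "TC": "very good",
    "UC": "very good",
}


def assimilation_pattern(cat_a: Optional[str], cat_b: Optional[str],
                         fit_a: float = 0.0, fit_b: float = 0.0,
                         tol: float = 0.1) -> Optional[str]:
    """Label a pair of L2 phones by how each maps onto L1 categories.

    cat_a, cat_b: the L1 category each L2 phone is assimilated to
                  (None means the phone is uncategorized).
    fit_a, fit_b: hypothetical goodness-of-fit ratings (higher = better).
    tol:          hypothetical threshold under which two ratings are
                  treated as "equally good or poor".
    """
    if cat_a is None and cat_b is None:
        return None   # both uncategorized: not among the four patterns above
    if cat_a is None or cat_b is None:
        return "UC"   # only one phone is ascribed to an existing L1 category
    if cat_a != cat_b:
        return "TC"   # two distinct L1 categories
    # Same L1 category: equal fit gives SC, unequal fit gives CG.
    return "SC" if abs(fit_a - fit_b) <= tol else "CG"


# Hypothetical ratings for two L2 vowels both assimilated to one L1 vowel.
pattern = assimilation_pattern("a", "a", fit_a=0.8, fit_b=0.8)
print(pattern, PREDICTED_DISCRIMINATION[pattern])
```

In practice the category labels and ratings would come from a perceptual assimilation task; the values used here are placeholders.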
According to the more recent linguistic perception model (LP), speech perception is governed by cognitive linguistic (phonological) knowledge (Escudero, 2005), which comprises a linguistic and grammatical processor and perceptual representations. In its L2 version (L2LP), the "optimal perception hypothesis" is formulated, according to which the optimal listener will rely on auditory cues that distinctly differentiate the sounds in the production of the L2. It is assumed that learners can form new categories along non-previously categorized dimensions using distributional learning the way it is done in the L1. L2 learners automatically create a copy of the L1 perception grammar and the L1 representations. From the very outset, L2 phones are processed by a separate perceptual system which not only leaves that of the L1 intact, but also evolves with experience.
Two hypotheses which deserve to be mentioned refer to the acquisition and perception of features not present in the L1. According to the feature hypothesis (FH; McAllister et al., 2002), L2 features not employed in the L1 to signal phonological contrasts will be difficult for L2 users to perceive and, consequently, to produce. Contrary to FH, Bohn's (1995) Desensitization Hypothesis emphasizes the importance of language-independent auditory-based strategies in L2 perception, concluding that "duration cues in vowel perception are easy to access whether or not listeners have had experience with them [in their native language]" (Bohn, 1995, p. 294). There is empirical evidence that, instead of relying on spectral properties like native speakers do, L2 learners use and exaggerate durational cues to differentiate between English /I/ and /i:/. This has been reported for Mandarin, Japanese, Spanish, Portuguese, Catalan, Russian and Polish speakers, even though none of these languages uses vowel duration contrastively.
Several studies have documented that L2 learners need to perceive the differences between L2 sounds and similar L2 and/or L1 sounds before being able to produce them accurately (Flege, 1995; Underbakke, 1993). Grasseger (1991) reported that learners with properly established perceptual categories were also able to articulate L2 sounds properly, suggesting that accurate perception is a pre-requisite for accurate production and that perceptual tests can predict problematic areas in L2 production. Moreover, there is empirical evidence that perceptual identification training leads to an improvement in both speech perception and production (Bradlow et al., 1999; Rochet, 1995; Wang et al., 2003). Nevertheless, Derwing and Munro (2015) point out that perception does not automatically lead to production; for example, English speakers easily perceive the difference between Spanish "trill" and "tap," but many never succeed at producing the distinction.
Importantly, studies often demonstrate the opposite, that is, that production may precede perception (Kosky & Boothroyd, 2003; Parsloe, 1998) and also that some learners can produce contrasts between L2 sounds without being able to perceive them accurately; for example, Japanese learners were able to maintain a contrast between English /r/ and /l/ in articulation, but failed to distinguish the two phonemes auditorily (Smith, 2001). Such conflicting results indicate that the relationship between perception and production is complex and a number of individual variables (learners' age at the point when L2 learning begins, the length of L2 experience, and, notably, listeners' perception of their own speech) may underlie the development of L2 perception and production (Baker & Trofimovich, 2006). Even though the link between the two modalities has been a matter of dispute, the mechanism of speech perception and production seems to form an integrated system and empirical evidence suggests that practice in one speech dimension benefits the other (Gómez Lacabex et al., 2005; Leather & James, 1991; McAllister, 1997).

Aims and research questions
The aim of the present experiment is to study the perception of English sounds in the Standard Southern British English (henceforth SSBE) variety by Polish learners of English in order to investigate whether there are statistically significant differences between females and males with respect to identifying minimal pairs. Because rejecting the null hypothesis is not enough to make more specific judgements about the practical significance of the findings (Plonsky & Oswald, 2014), effect sizes will be calculated and interpreted.
I set out to compile a hierarchy of perceptual difficulty for each gender group and compare the accuracy of the identification of selected phonetic contrasts. By doing so, it will be possible to establish whether the gender-based differences in hearing and phonological processing reported in the literature result in actual differences in performance (better perception that can lead to better production), giving an advantage to either males or females. The following research questions are addressed:
1. Is there a difference in the perception of segmental contrasts depending on the listeners' gender?
2. Is the hierarchy of perceptual difficulty similar for the two gender groups?
Regarding the first research question, it is hypothesized that, if there is a difference, female participants will correctly identify more segmental contrasts than males (H1). As regards the second question, the hierarchy of perceptual difficulty is expected to be arranged similarly for the two gender groups and reflect the interplay between their L1 and L2 (H2). It is predicted that the more English vowels are mapped to a given Polish vowel, the less adequate their identification, for example, /{/, /V/ and /A:/ are mapped to Polish /a/ and, therefore, their identification is more problematic than that of, for example, /e/ and /I/ assimilated to Polish /E/ and /i/, respectively. Also, contrasts which have a higher functional load and receive more focus during phonetic instruction (e.g., /I/~/i:/, /T/~/t/, /T/~/s/, /z/~/D/) are expected to be easier to identify than those of lesser frequency and typically assimilated to one vowel (/Q/~/O:/, /U/~/u:/). It needs to be stressed from the outset that the main aim is to compare the accuracy of performance across the two gender groups and not to study the perception of particular contrasts in different phonetic contexts and draw definite conclusions as to which minimal pairs pose difficulty for Polish learners and which do not.

Participants
Eighty Polish listeners (40 female and 40 male) participated in the experiment. They were secondary school students, aged 16-18 (Mo = 17) and all right-handed. Their participation was voluntary and anonymous; they were all drawn from the same school and shared the same English teacher, who administered the questionnaires during one of the lessons. None of them reported any prior history of hearing or speech impairment. The participants were requested to provide some personal data in a brief questionnaire, from which it is known that at the time of the study their learning experience of English ranged from 8 to 14 years (Mo = 10.2, SD = 1.1 for females, and Mo = 10.5, SD = 1.4 for males). There were no statistically significant differences between the two groups regarding this variable: z = 0.8, p = .4. A large majority of the informants (68%) had never stayed in an English-speaking country; the rest (32%) reported short (up to 3 weeks) visits there. This experience with English abroad did not introduce statistically significant differences between the tested groups (χ² = 0.952, p = .329). It can, therefore, be concluded that the participants constituted a fairly homogeneous group with respect to biographical factors (years of learning and L2 exposure in an English-speaking environment).

Procedure and stimuli
The participants were given paper-and-pencil questionnaires (see Appendix). They listened to 20 unrelated sentences containing minimal pairs and were asked to circle or underline the word they heard. The context and semantics allowed for either of the two words in each stimulus, as in Have you seen my cap/cup?, which means that the facilitative potential of the context was reduced as far as possible (the context makes sense without being too predictable; the proper decoding of the token is essential for adequate comprehension). Each sentence was played twice and was preceded by a question referring to a concrete minimal pair, thus making it explicit to the listeners which part of the sentence they should focus on in each case, that is, cap or cup. All the examples and the audio material were taken from the coursebook "English Pronunciation in Use Intermediate" by Hancock (2003). They were selected from various units and merged into a list of 20 sentences which contained the phonetic contrasts most problematic to Polish learners of English. The speech sample was prepared with an online sound editing tool, mp3 Cutter Joiner.
The motivation behind using the coursebook material was to give the respondents a typical task they might be presented with during ear-training sessions in the classroom, as exercises targeting auditory memory are provided in most pronunciation coursebooks. It should be noted that, even though the respondents were familiar with the task of minimal pair identification itself, they had not worked with Hancock's coursebook in the classroom, and, thus, the stimuli used in the experiment were completely novel for them.
Furthermore, extracting the diagnostic material from one source meant the features of the spoken text remained constant. This refers to the variety used, the speaker's voice characteristics (male), and the type of the text itself (read out). Also, the lexical frequency of the items was, on the whole, controlled, namely all the tokens (with the exception of Robin and robbing) were high frequency words, according to the Longman Vocabulary Checker (http://global.longmandictionaries.com/vocabulary_checker). SSBE was used as a point of reference, because it is one of the two main varieties taught in Poland and many pedagogical materials used in Polish secondary schools keep it as a model. My own analysis of audio materials used for listening comprehension tasks at the national exam for secondary school leavers revealed that the main variety used there was SSBE (Bryła-Cruz, 2017).
Before I present the stimuli used in the study, it is worthwhile comparing the sound systems of Polish and SSBE to justify the choice of the tokens by indicating the areas of phonetic interference. The two languages do not have the same number of consonants (25 in English compared to 29 in Polish). In addition to 11 that are identical in both languages, in a few consonants the place of articulation differs, namely /t, d, s, z, n/ are alveolar in English but (post)dental in Polish, /h/ is glottal in English, but velar in Polish, and the counterparts of the palato-alveolar English /S/, /Z/, /tS/, /dZ/ are post-alveolar in Polish (/´, À, t´, dÀ/). The approximant /r/ is articulated as a post-alveolar frictionless continuant in English, while in Polish it is a post-alveolar trill.
Additionally, some phonemes of the English language are not found in Polish. The improper rendition of the interdental fricatives /T, D/ (as /t, f, s/ and /d, v, z/, respectively) constitutes one of the most persistent and the most difficult to eradicate errors made by Polish learners. The velar nasal is found in Polish as an allophone of /n/ before /k, g/ and not as a separate phoneme. As a result, its distribution remains problematic and Polish speakers of English are able to pronounce /N/ in isolation or in a familiar context, but have difficulty suppressing the velar stop in word-final or pre-vocalic positions.
The vocalic inventories of Polish and English differ to the extent that not a single vowel is identical in the two languages. Polish has only 6 vowels /a, E, i, 1, O, u/ which is a very limited set compared to the 12 monophthongs /i:, I, e, {, V, @, 3:, Q, O:, U, u:, A:/ and 8 diphthongs /OI, aI, eI, @U, aU, I@, U@, e@/ in English. English vowels are divided into short and long and differ in duration depending on the context. Vowel duration is a cue to the voicing of the following consonant and is correlated with word-stress. Polish vowels are characterized by durational invariability, meaning that inherent and relative vowel length, as well as vowel reduction are very difficult for Polish learners to acquire.
In general, since Polish has fewer vowels than English, Poles typically employ one vowel as a substitute for two, three or even four target vowels (e.g., /u/ for /u:/ and /U/; /a/ for /V/, /A:/ and /{/; /E/ for /e/, /{/, /@/ and /3:/), thereby neutralizing a number of contrasts. Moreover, one English vowel can be mapped to two Polish vowels, for example, /{/ is substituted with /a/ or /E/ and the choice of either depends on two main factors, namely general proficiency in English and the place of articulation of the following consonant. Additionally, Polish has no diphthongs, so learners of English replace them either with a vowel and a glide, or with a vowel followed by a glide and a vowel. Some diphthongs may also be realized as monophthongs: /@U/ is substituted with /O/, as in Poland */pOlant/, and /e@/ with /e/, as in hair */he/ or /her/.
Table 1 presents the list of phonetic contrasts under investigation, the minimal pairs used as tokens and the carrier sentences in which they were embedded. The last column specifies which of the two stimulus words was the target item. As can be observed, the stimuli encompassed 13 vocalic contrasts.
Two criteria were adopted while selecting the stimuli. Firstly, a number of comparative analyses between Polish and English phonetics (e.g., Gonet et al., 2013a, 2013b; Sobkowiak, 1996) were taken into account. They all provide extensive descriptions of the most recurrent mispronunciations made by Polish speakers of English. Secondly, previous empirical research provided insight as to which contrasts should be incorporated in the present study because their deviant renditions might impair intelligibility in communication with English native speakers, contribute significantly to the impression of a foreign accent in English and evoke English native speakers' irritation (Bryła-Cruz, 2016; van den Doel, 2006). As a result, all the phonetic features included in the experiment should be pedagogically prioritized.

Data analysis
The obtained results were submitted to a statistical analysis. The Chi-Square test was used to check the correlation between particular variables and the number of correctly marked answers; Yates' correction was applied when at least one of the expected numerical values was smaller than 10. The normality of data was established by means of the Shapiro-Wilk test. In order to determine differences between the two gender groups, the Student's t-test was performed, and, where necessary, its non-parametric equivalent, the Mann-Whitney U test, was used. The threshold of statistical significance for all differences and correlations was set at p < .05. Effect sizes, measuring the magnitude of the observed relationships and their practical significance, were calculated using a Phi coefficient for t-test and Cohen's d for Mann-Whitney U test.
Table 2 presents the descriptive statistics for the two gender groups with respect to the number of correctly identified items. The calculations were performed with the exclusion of the pair Robin-robbing because, as mentioned earlier, this minimal pair was different from the rest (it did not belong to high-frequency words). The mean number of correctly identified words is the same in the two groups (13), and the range of accurate answers is also roughly the same, that is, 8-18 for males and 9-17 for females. The Mann-Whitney U test revealed that the observed slight inter-group differences were not statistically significant (z = 0.112, p = .911).
Table 3 juxtaposes the hierarchy of perceptual difficulty for each gender group. The first and the second columns contain the contrasts and the targets, respectively, and the third column gives the percentage of correct scores (incorrect answers and instances of no answer were merged into one group). The data enable us to spot similarities and differences between the two gender groups.
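The group comparison described above can be illustrated with a short Python sketch of the Mann-Whitney U test. It is a minimal illustration under stated assumptions, not the procedure or software used in the study: the function name `mann_whitney_u` and both score lists are hypothetical, the p-value relies on the normal approximation with a tie correction, and no continuity correction is applied.

```python
import math

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Hypothetical helper for illustration; returns (U for x, z, p).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering group membership (0 = x, 1 = y).
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    # Assign mid-ranks so that tied scores share the average rank.
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = mid_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j

    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2

    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # all scores identical: no evidence of a difference
        return u_x, 0.0, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return u_x, z, p

# Hypothetical numbers of correctly identified items (NOT the study's data).
males = [13, 12, 15, 9, 14, 17, 11, 13]
females = [13, 14, 10, 16, 12, 13, 15, 11]
u, z, p = mann_whitney_u(males, females)
print(f"U = {u:.1f}, z = {z:.3f}, p = {p:.3f}")
```

A z value close to zero with p above .05, as reported for the two gender groups, would indicate that the rank distributions of the scores are statistically indistinguishable.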
Both males and females were most successful in recognizing the contrast between /I/ and /e/ (100% of accurate answers) as well as /T/~/s/ and /{/~/e/ (over 90% of correct answers). More than half (60-87.5%) of men and women identified the following contrasts accurately: /i:/~/I/ (also before the dark /5/), /Q/~/@U/, /O:/~/Q/, /3:/~/e/, /{/~/A:/, /n/~/N/ (word-medially), /z/~/D/ and /T/~/t/. The sounds that posed most difficulty (less than 60% of accurate answers) to the listeners were /n/~/N/ (word-finally), /s/~/z/ (word-final fortis/lenis distinction), /V/~/A:/, /u:/~/U/ and /{/~/V/. The lowest score of all was 20% for women and 37.5% for men.
[Table 3. The hierarchy of perceptual difficulty for male and female respondents. Columns for males and for females: Contrast, Target, Accuracy.]
Apart from the above similarities, it can be seen that the contrast /V/~/Q/ was identified properly by 47% of males as compared to 67% of females. However, this difference was not statistically significant (χ² = 3.274, p = .07), nor was it of any practical significance (Phi = 0.18). The two contrasts which occupied a distinctly different place in the above hierarchy depending on the listeners' gender were /@/~/2/ and /3:/~/e@/. The former was identified properly by 60% of women and 87.5% of men, while the latter by 92.5% of females as opposed to 60% of males. These differences were statistically significant (χ² = 9.028, p = .003 and Yates-corrected χ² = 8.674, p = .003, respectively). The effect size was similar in both cases, namely Phi = 0.33 and Phi = 0.36, and it was moderate. As a result, it can be concluded that the observed differences were rather small. Figure 1 and Figure 2 depict a detailed distribution of correct and incorrect answers for each gender group. As can be seen in Figure 1, the contrast /@/~/2/ turned out to be easier to identify for males, who had 87.5% of accurate answers (compared to 60% provided by females).
Figure 2 shows that females were better at recognizing the contrast /3:/~/e@/ and gave 92.5% of correct answers (the success rate for males was 60%).
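The per-contrast comparisons rest on a 2×2 Chi-Square test (gender by correct/incorrect), with Yates' correction applied when an expected cell count falls below 10, and on the Phi coefficient as the effect size. The Python sketch below shows one way such a computation could look. The function name `chi_square_2x2`, its parameters and the counts are hypothetical and are not taken from the study; computing Phi from the corrected statistic when the correction is triggered is also an assumption made here.

```python
import math

def chi_square_2x2(a, b, c, d, yates_threshold=10):
    """Chi-Square test for a 2x2 table [[a, b], [c, d]] with df = 1.

    Hypothetical helper: rows are the two gender groups, columns are
    correct / incorrect answers. Yates' correction is applied when any
    expected count is below `yates_threshold`.
    Returns (chi2, p, phi, used_yates).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    used_yates = min(expected) < yates_threshold

    diff = abs(a * d - b * c)
    if used_yates:
        diff = max(0.0, diff - n / 2)  # continuity correction
    chi2 = n * diff ** 2 / (row1 * row2 * col1 * col2)

    # For one degree of freedom the upper-tail probability is erfc(sqrt(chi2/2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    phi = math.sqrt(chi2 / n)  # effect size for a 2x2 table
    return chi2, p, phi, used_yates

# Hypothetical counts (NOT the study's data): 30 of 40 listeners in one
# group and 20 of 40 in the other identify the target word correctly.
chi2, p, phi, used_yates = chi_square_2x2(30, 10, 20, 20)
print(f"chi2 = {chi2:.3f}, p = {p:.3f}, Phi = {phi:.2f}, Yates = {used_yates}")
```

By a common rule of thumb, Phi values around 0.1, 0.3 and 0.5 are read as small, medium and large effects, respectively.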

Discussion
As regards the first research question, males and females performed equally well in the recognition task. As a result, our H1 about women's advantage in the accuracy of segmental identification has not been confirmed. Statistically significant differences in perception were found for only two contrasts (/3:/~/e@/ and /@/~/2/), with women outperforming men in the former case and men outperforming women in the latter. While it is rather hard to explain why these two contrasts were perceived differently by the two groups, it needs to be noted that these statistically significant differences are of small practical significance. The outcome of our study does not suggest that the oft-cited female advantage in learning L2 pronunciation (Moyer, 2016) can be ascribed to women's superior phonological perception. With reference to the second research question, the hierarchy of perceptual difficulty is arranged similarly for the two gender groups, whereby our H2 is supported. The interplay between the listeners' L1 and L2 largely determined the accuracy of perception.
As far as consonants are concerned, the best results were obtained for /T/~/s/, /T/~/t/ and /z/~/D/, which was also the case in an extensive study conducted among Polish university students by Nowacka (2008). This could be explained by the emphasis that the dental fricatives are given in phonetic instruction in the classroom. The contrast /s/~/z/ proved difficult to recognize. It must be stressed that this opposition appeared word-finally, so the main difference between the two sounds lay in the force of articulation, which modified the relative length of the diphthong. As shown empirically by Rojczyk (2010), Polish learners distinguish between voiced and voiceless obstruents on the basis of voicing and do not pay attention to vowel duration as a temporal parameter cueing the voicing contrast. Moreover, it was found that in English word-final fricatives the amount of voicing depends on the perceptual strength of the sound (Gonet, 2010). Thus, the most strident /z/ is voiced only through 50% of its duration as compared to the less conspicuous interdental or labio-dental fricative (80% and 60% of voicing, respectively). In the present study, prize was understood as price by 47% of the listeners, confirming that partial voicing in word-final position is problematic for Polish learners, whose native language uses either voiced or voiceless obstruents (with only the voiceless permitted word-finally).
With respect to the identification of the vowel contrasts, the findings echo to some extent those of Balas (2018) and Bogacka (2004), since the main factor influencing the perception of vowels was the interplay between the Polish and English vocalic inventories. The adequate identification of the target vowels was negatively correlated with the number of corresponding vowels between the two languages, meaning that the greater the number, the less accurate the perception. Therefore, the least accurate recognition occurred for vowels usually mapped to Polish /E/ and /a/, such as /e, {, 3:/ and /{, V, A:/, respectively. The lowest position in the hierarchy is occupied by /{/~/V/. Because of their perceptual similarity, these two vowels are mapped to one Polish vowel /a/, resulting in poor discrimination, which is in line with Flege's (2002) "equivalence classification" and Best's (1994) SC assimilation pattern. The two vowels are perceived as too similar for distinct categories to be formed. Similarly, for /u:/ and /U/, both of which are mapped to Polish /u/, identification turned out to be poor. On the basis of her empirical study, Bogacka (2004) concludes that Polish learners are unlikely to distinguish between /u:/ and /U/ and, when they need to, they rely heavily on temporal cues while neglecting spectral properties, meaning that, for Polish learners, the primary difference between the members of this pair is duration and not the position of the tongue or lip rounding. In the study by Balas (2018), these two contrasts (/{/~/V/ and /u:/~/U/) were also poorly discriminated and judged to be the most similar. The easiest in terms of perception were the following four pairs: /I/~/e/, /{/~/e/, /Q/~/@U/ and /i:/~/I/. For both male and female participants, the identification of the /I/~/e/ contrast proved the least problematic (100% of correct answers) and was followed by /{/~/e/ (90% and 92% for males and females, respectively).
These two contrasts proved the easiest to recognize most probably because they follow the TC assimilation pattern: the two members of each pair are assimilated to two distinct categories in the learners' native inventory (/i/ and /E/ for /I/~/e/, and /a/ and /E/ for /{/~/e/), which predicts excellent discrimination.
Moreover, tongue position turned out to influence perception, in that front vowels were slightly easier to recognize than back or central vowels. Functional load may also have contributed to the fact that the accuracy of recognition was better for /i:/~/I/ than for either /u:/~/U/ or /O:/~/Q/. During English instruction, more emphasis is placed on the contrast between the front pair than between the back pairs, and this is dictated by lexical frequency. The good identification of /i:/~/I/ shows that Polish learners are able to distinguish well between two phonemes not found in their native inventory and that they can become sensitive to features not used in their L1 (duration). This is in line with Bohn's Desensitization Hypothesis and LPM.
It can be observed that the contrast in gun~gone was perceptually more salient than that in cup~cap, which is in line with PAM's predictions. Since /V/ and /Q/ are typically assimilated to two separate Polish vowels (/a/ and /O/, respectively), the identification of this contrast was less problematic than in the case of /{/~/V/, which are perceptually more similar and often substituted with Polish /a/. Even though this is also the case with /V/~/A:/ (both tend to be ascribed to one Polish vowel), this pair had a better recognition rate as, in addition to height and advancement, the vowels differed in duration and tenseness. Since, as corroborated by Bogacka (2004), Polish learners are more sensitive to temporal characteristics than to spectral ones, the identification of /{/~/V/ was poorer than that of /V/~/A:/, which implies that tongue advancement was not a sufficiently conspicuous feature to distinguish the two vowels. This behavior is congruent with Bohn's Desensitization Hypothesis, according to which "whenever spectral properties are insufficient to differentiate vowel contrast because previous experience did not sensitize listeners to those spectral differences, duration differences will be used to differentiate the non-native contrast" (Bohn, 1995, p. 294). In other words, duration is psychoacoustically salient for L2 learners and they are able to create a contrasting mechanism along this dimension, even if duration is not used contrastively in their L1 (Escudero, 2005; Escudero & Boersma, 2004).
The present study had several limitations that need to be considered when interpreting the results and designing similar experiments in the future. Firstly, only two vowels were tested in different contexts (/V/ before a fortis plosive and before two nasals; /i:/ before a lenis fricative and the dark /5/). Because the identification of sounds can be influenced by the phonological context, future studies should ensure more diversification in this respect. Secondly, the scope of the present study was restricted to segmental aspects, but L2 prosodic perception in the classroom environment deserves further inquiry as well. Additionally, the diagnostic material was based on SSBE, and it would be interesting to compare the results for General American. The conclusions drawn here are therefore tentative, and more research is needed to verify these claims and shed more light on the gender factor in foreign language perception.

Conclusion
The paper reported on an empirical investigation into the perception of English segmentals by 80 Polish secondary school learners. The goal of the study was to establish whether there exist gender-based differences in identifying selected vocalic and consonantal contrasts particularly problematic for Polish learners, and, thus, prioritized pedagogically. The existence of such perceptual dissimilarities could have implications for pronunciation instruction since biological endowment, including differences in the auditory system and phonological processing, has often been cited by advocates of single-sex education.
In light of our data, it cannot be claimed that males and females differed considerably with respect to the perception of L2 segmental contrasts, because neither group performed the identification task better than the other. The results pointed to no statistically significant differences with the exception of the following two cases: /3:/~/e@/ and /@/~/2/. Yet, the magnitude of the observed relationship between the listeners' gender and their perception was small and thus had little practical significance. The hierarchy of perceptual difficulty was not identical for the two gender groups, but it was marked by numerous similarities. For example, the easiest contrasts overlapped across males and females (/I/~/e/, /T/~/s/ and /{/~/e/), and /n/~/N/ was identified more accurately in word-medial than in word-final position. The identification of the easiest contrasts proved unproblematic because their members are typically mapped to distinct categories in the listeners' L1. While the very bottom of the list included the same pairs (/V/~/Q/, the word-final fortis/lenis distinction, /n/~/N/ word-finally, /{/~/V/ and /u:/~/U/), they were ordered differently in each gender's column. Yet, in both groups the identification was worse for the contrasts which are perceived as too similar for separate categories to be formed and for those with low functional load. All this suggests that perception was influenced by the interplay between L1 and L2 rather than by the listener's sex. Despite various accounts of anatomical and functional differences in the brain, our data suggest that neither of the two sexes can be considered superior in phonetic perception. In other words, the existing asymmetries do not seem to have an impact on classroom performance during ear-training sessions.