Temporal Parameters of Spontaneous Speech in Forensic Speaker Identification in Case of Language Mismatch: Serbian as L1 and English as L2


Keywords

forensic speaker identification
cross-lingual comparison
articulation rate
speech rate

How to Cite

TOMIĆ, K. (2017). Temporal Parameters of Spontaneous Speech in Forensic Speaker Identification in Case of Language Mismatch: Serbian as L1 and English as L2. Comparative Legilinguistics, 32, 117–144. https://doi.org/10.14746/cl.2017.32.5


The purpose of the research is to examine whether forensic speaker identification is possible when the questioned and suspect samples are in different languages, using temporal parameters (articulation rate, speaking rate, degree of hesitancy, percentage of pauses, average pause duration). The corpus includes 10 female native speakers of Serbian who are proficient in English. The parameters are tested using a Bayesian likelihood ratio formula on 40 same-speaker and 360 different-speaker pairs, including estimation of error rates (ER), equal error rates (EER) and the Overall Likelihood Ratio. A one-way ANOVA is performed to determine whether inter-speaker variability is higher than intra-speaker variability across languages. The most successful discriminant is degree of hesitancy, with same-speaker/different-speaker ERs of 42.5%/28% (EER: 33%), followed by average pause duration, with ERs of 35%/45.56% (EER: 40%). Although the research uses a closed-set comparison, which is uncommon in real forensic casework, the results are still relevant for forensic phoneticians working on criminal cases or as expert witnesses. This study breaks new ground in comparing Serbian and English forensically and in testing temporal parameters forensically on bilingual speakers. Further research should compare two stress-timed or two syllable-timed languages to test whether they are more comparable in their temporal characteristics.
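
The abstract names several measures without defining them. The Python sketch below shows, under stated assumptions, how such quantities are commonly computed: speaking rate and articulation rate as syllables per second with and without pauses, pause percentage and mean pause duration, error rates for same-speaker and different-speaker comparisons at the LR = 1 threshold, an approximate EER, and a one-way ANOVA F ratio. The likelihood ratios are treated as given inputs. The sketch does not reproduce the Bayesian LR formula used in the study, and it omits degree of hesitancy and the Overall Likelihood Ratio, whose exact definitions the abstract does not state. All function names and sample numbers are hypothetical.

```python
# Hypothetical sketch, not the study's code. Assumptions:
#  * rates use common definitions: syllables per second with pauses
#    (speaking rate) and without pauses (articulation rate);
#  * LRs are supplied as inputs; LR > 1 supports "same speaker", so a
#    same-speaker pair with LR < 1 or a different-speaker pair with LR > 1
#    counts as an error;
#  * the EER is approximated by sweeping the threshold over observed LRs.
from statistics import mean


def temporal_parameters(n_syllables, total_time, pause_durations):
    """Rate and pause measures for one sample; times are in seconds."""
    pause_time = sum(pause_durations)
    return {
        "speaking_rate": n_syllables / total_time,                     # pauses included
        "articulation_rate": n_syllables / (total_time - pause_time),  # pauses excluded
        "percentage_of_pauses": 100 * pause_time / total_time,
        "average_pause_duration": (
            pause_time / len(pause_durations) if pause_durations else 0.0
        ),
    }


def error_rates(ss_lrs, ds_lrs, threshold=1.0):
    """Percent of same-speaker LRs below, and different-speaker LRs above, the threshold."""
    ss_err = 100 * sum(lr < threshold for lr in ss_lrs) / len(ss_lrs)
    ds_err = 100 * sum(lr > threshold for lr in ds_lrs) / len(ds_lrs)
    return ss_err, ds_err


def equal_error_rate(ss_lrs, ds_lrs):
    """Mean of the two error rates at the observed threshold where they are closest."""
    best_gap, best_pair = None, None
    for t in sorted(set(ss_lrs) | set(ds_lrs)):
        ss_err, ds_err = error_rates(ss_lrs, ds_lrs, t)
        gap = abs(ss_err - ds_err)
        if best_gap is None or gap < best_gap:
            best_gap, best_pair = gap, (ss_err, ds_err)
    return sum(best_pair) / 2


def one_way_anova_f(groups):
    """F ratio of between-speaker to within-speaker mean squares (one list per speaker)."""
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    group_means = [mean(g) for g in groups]
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


if __name__ == "__main__":
    # Illustrative numbers only, not data from the study.
    print(temporal_parameters(120, 30.0, [0.5, 1.0, 0.5, 2.0]))
    ss = [2.0, 0.5, 3.0, 0.8]
    ds = [0.2, 1.5, 0.1, 0.4, 0.9]
    print(error_rates(ss, ds))        # (50.0, 20.0)
    print(equal_error_rate(ss, ds))   # 32.5
    print(one_way_anova_f([[1, 2, 3], [4, 5, 6]]))  # 13.5
```

An F ratio well above 1 suggests that speaker means differ more than repeated samples from the same speaker do. Whether that difference is significant still requires a p-value from the F distribution with (k - 1, n - k) degrees of freedom, which `scipy.stats.f_oneway` reports alongside the F statistic.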
