Close Copy Speech Synthesis for Speech Perception Testing

How to cite

Bachan, J., & Gibbon, D. (2006). Close Copy Speech Synthesis for Speech Perception Testing. Investigationes Linguisticae, 13, 9–24.


The present study develops a speech synthesis subcomponent for perception testing in the context of evaluating cochlear implants in children. We provide a detailed requirements analysis and develop a strategy for achieving the highest attainable synthesis quality using Close Copy Speech synthesis techniques with a diphone-based speech synthesiser, MBROLA. In this work, close copy is defined as a function that maps a pair, consisting of a speech signal recording and a phonemic annotation aligned with that recording, onto the pronunciation specification interface of the speech synthesiser. The design procedure has three phases: Manual Close Copy Speech (MCCS) synthesis as a "best case gold standard", in which the function is implemented manually as a preliminary step; Automatic Close Copy Speech (ACCS) synthesis, in which the steps taken in the manual transformation are emulated by software; and finally Parametric Close Copy Speech (PCCS) synthesis, in which prosodic parameters can be modified while the diphones are retained. This contribution reports on the MCCS and ACCS synthesis phases.
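The close copy function described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the annotation is available as (label, start, end) tuples in seconds, that the pitch track extracted from the recording is a list of (time, F0) points, and that the labels already belong to the phoneme set of the MBROLA voice. The names `close_copy`, `Segment` and `PitchPoint` are hypothetical. The output follows the MBROLA `.pho` input convention: one line per phone with the phoneme symbol, the duration in milliseconds, and optional pairs of position (percent of the phone duration) and pitch (Hz).

```python
from typing import List, Tuple

Segment = Tuple[str, float, float]   # (label, start_s, end_s); assumed layout
PitchPoint = Tuple[float, float]     # (time_s, f0_hz); assumed layout


def close_copy(segments: List[Segment], pitch: List[PitchPoint]) -> str:
    """Map an aligned annotation and a pitch track to MBROLA .pho text."""
    lines = []
    for label, start, end in segments:
        if end <= start:
            continue  # skip empty or malformed intervals
        duration_ms = round((end - start) * 1000)
        fields = [label, str(duration_ms)]
        for time, f0 in pitch:
            # Keep only voiced pitch points that fall inside this phone.
            if start <= time < end and f0 > 0:
                position = round(100 * (time - start) / (end - start))
                fields += [str(position), str(round(f0))]
        lines.append(" ".join(fields))
    return "\n".join(lines) + "\n"


# Hypothetical example: a pause followed by the phones "m" and "a".
segments = [("_", 0.0, 0.1), ("m", 0.1, 0.18), ("a", 0.18, 0.33)]
pitch = [(0.12, 120.0), (0.255, 130.0)]
pho_text = close_copy(segments, pitch)
print(pho_text)
```

In the terms of the abstract, writing such lines by hand corresponds to the MCCS phase and generating them by software to the ACCS phase. Editing the duration and pitch values while keeping the phoneme labels would correspond to the PCCS phase.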

