A HMM-Based Speech Synthesis System Using a New Glottal Source and Vocal-Tract Separation Method

Abstract: This paper introduces an HMM-based speech synthesis system which uses a new method for the Separation of Vocal-tract and Liljencrants-Fant model plus Noise (SVLN). The glottal source is separated into two components: a deterministic glottal waveform, described by the Liljencrants-Fant model, and a modulated Gaussian noise. This glottal source is estimated first and then used in the vocal-tract estimation procedure. The parameters of the source and the vocal tract are then included in HMM contextual models of phonemes. SVLN is promising for voice transformation in the synthesis of expressive speech, since it allows independent control of vocal-tract and glottal-source properties. Finally, the synthesis results are discussed and subjectively evaluated.

Subjective Test

The proposed subjective test can be found here

Transformation examples

F0 scale	VTF scale	Rd scale	Resulting voice
1	1	1	Original voice
0.6	1	1
0.6	0.85	1
0.6	0.85	0.5	Baritone voice
2.5	1	1
2.5	1.7	1
2.5	1.7	3	Little girl voice
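
The scale factors in the table can be read as multipliers applied to per-frame parameter tracks before synthesis. The following is only a minimal sketch of that idea, not the system's actual implementation. It assumes that the F0 and Rd scales multiply the F0 and Rd tracks directly, and that the VTF scale stretches the frequency axis of the vocal-tract filter envelope; the function names, the preset keys, and the array layout are hypothetical.

```python
import numpy as np

# Scale factors copied from the table: (F0 scale, VTF scale, Rd scale).
PRESETS = {
    "original": (1.0, 1.0, 1.0),
    "baritone": (0.6, 0.85, 0.5),
    "little_girl": (2.5, 1.7, 3.0),
}


def warp_envelope(envelope, vtf_scale):
    """Stretch the frequency axis of one amplitude envelope.

    Assumption: 'VTF scale' means a linear frequency warping, so a peak
    at bin b moves to bin b * vtf_scale.
    """
    bins = np.arange(len(envelope))
    # Output bin k reads the original envelope at bin k / vtf_scale;
    # np.interp holds the edge value for positions past the last bin.
    return np.interp(bins / vtf_scale, bins, envelope)


def transform(f0, rd, envelopes, preset):
    """Apply one row of the table to hypothetical per-frame tracks.

    f0, rd    : 1-D sequences, one value per frame
    envelopes : 2-D array, one vocal-tract amplitude envelope per frame
    """
    f0_scale, vtf_scale, rd_scale = PRESETS[preset]
    new_f0 = np.asarray(f0, dtype=float) * f0_scale
    new_rd = np.asarray(rd, dtype=float) * rd_scale
    new_env = np.array([warp_envelope(e, vtf_scale) for e in envelopes])
    return new_f0, new_rd, new_env
```

For example, `transform(f0, rd, envelopes, "baritone")` lowers the pitch track, compresses the envelopes toward low frequencies, and halves Rd, which mirrors the three-step progression of rows in the table. Because the three factors are applied to separate tracks, each can be changed without touching the others.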