A HMM-Based Speech Synthesis System Using a New Glottal Source and Vocal-Tract Separation Method

Abstract

This paper introduces an HMM-based speech synthesis system which uses a new method for the Separation of Vocal-tract and Liljencrants-Fant model plus Noise (SVLN). The glottal source is separated into two components: a deterministic glottal waveform, described by the Liljencrants-Fant model, and a modulated Gaussian noise. This glottal source is first estimated and then used in the vocal-tract estimation procedure. The parameters of the source and the vocal tract are then included in HMM contextual models of phonemes. SVLN is promising for voice transformation in the synthesis of expressive speech, since it allows independent control of vocal-tract and glottal-source properties. Finally, the synthesis results are discussed and subjectively evaluated.

Subjective Test

The proposed subjective test can be found here

Transformation examples

F0 scale   VTF scale   Rd scale   Audio
1          1           1          Original voice
0.6        1           1
0.6        0.85        1
0.6        0.85        0.5        Baritone voice
2.5        1           1
2.5        1.7         1
2.5        1.7         3          Little girl voice
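
The scale factors in the table can be read as multipliers applied to the analysed parameters before resynthesis. The following is only a minimal sketch of that reading, not the authors' implementation: it assumes that F0 and Rd are per-frame tracks multiplied by a constant, and that the VTF scale stretches the vocal-tract filter envelope along the frequency axis. The function names, the toy envelope and the linear interpolation are all hypothetical choices made for illustration.

```python
from bisect import bisect_right


def scale_track(values, factor):
    """Multiply every frame value (e.g. F0 in Hz, or Rd) by a constant factor."""
    return [v * factor for v in values]


def warp_envelope(freqs, amps, factor):
    """Stretch a sampled spectral envelope along the frequency axis.

    ASSUMPTION: the warped envelope at frequency f takes the original value
    at f / factor, so factor > 1 moves resonances up and factor < 1 moves
    them down. Values are linearly interpolated between samples and held
    constant outside the sampled range. `freqs` must be strictly increasing.
    """
    warped = []
    for f in freqs:
        src = f / factor
        if src <= freqs[0]:
            warped.append(amps[0])
        elif src >= freqs[-1]:
            warped.append(amps[-1])
        else:
            i = bisect_right(freqs, src)  # freqs[i - 1] <= src < freqs[i]
            lo, hi = freqs[i - 1], freqs[i]
            t = (src - lo) / (hi - lo)
            warped.append(amps[i - 1] + t * (amps[i] - amps[i - 1]))
    return warped


def transform(f0, rd, freqs, vtf, f0_scale=1.0, vtf_scale=1.0, rd_scale=1.0):
    """Apply the three scale factors of one table row (hypothetical helper)."""
    return (
        scale_track(f0, f0_scale),
        scale_track(rd, rd_scale),
        warp_envelope(freqs, vtf, vtf_scale),
    )


# Toy data: two frames of F0 and Rd, and one envelope with a peak at 1000 Hz.
f0 = [100.0, 200.0]
rd = [1.0, 2.0]
freqs = [0.0, 1000.0, 2000.0, 3000.0, 4000.0]
vtf = [0.0, 1.0, 0.0, 0.0, 0.0]

# Settings of the "Baritone voice" row: F0 x0.6, VTF x0.85, Rd x0.5.
new_f0, new_rd, new_vtf = transform(f0, rd, freqs, vtf, 0.6, 0.85, 0.5)
print(new_f0, new_rd, new_vtf)
```

With all three factors set to 1 the parameters are returned unchanged, which corresponds to the "Original voice" row.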