A HMM-Based Speech Synthesis System Using a New Glottal Source and Vocal-Tract Separation Method
Abstract
This paper introduces an HMM-based speech synthesis system which uses a new method for the Separation of Vocal-tract and Liljencrants-Fant model plus Noise (SVLN). The glottal source is separated into two components: a deterministic glottal waveform, given by the Liljencrants-Fant model, and a modulated Gaussian noise. The glottal source is estimated first and then used in the vocal-tract estimation procedure. The parameters of the source and the vocal tract are then included in HMM contextual models of phonemes. SVLN is promising for voice transformation in the synthesis of expressive speech, since it allows independent control of vocal-tract and glottal-source properties. Finally, the synthesis results are discussed and subjectively evaluated.
Subjective Test
The proposed subjective test can be found here
Transformation examples
F0 scale | VTF scale | Rd scale | Audio |
---|---|---|---|
1 | 1 | 1 | Original voice |
0.6 | 1 | 1 | |
0.6 | 0.85 | 1 | |
0.6 | 0.85 | 0.5 | Baritone voice |
2.5 | 1 | 1 | |
2.5 | 1.7 | 1 | |
2.5 | 1.7 | 3 | Little girl voice |
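
The scale factors in the table can be read as independent multipliers on three groups of parameters. The following is a minimal sketch of that idea only, not the authors' implementation. It assumes that the F0 scale multiplies the per-frame fundamental frequency, that the Rd scale multiplies the per-frame Rd shape parameter of the Liljencrants-Fant model, and that the VTF scale is a single frequency-axis factor applied to the vocal-tract filter. The names `VoiceParams`, `transform`, and `PRESETS` are hypothetical and introduced purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class VoiceParams:
    """Hypothetical container for the independently controllable parameters."""
    f0_hz: List[float]      # per-frame fundamental frequency (Hz)
    rd: List[float]         # per-frame Rd shape parameter of the LF model
    vtf_warp: float = 1.0   # assumed: frequency-axis factor for the vocal-tract filter


# (F0 scale, VTF scale, Rd scale) for the labelled rows of the table.
PRESETS: Dict[str, Tuple[float, float, float]] = {
    "original": (1.0, 1.0, 1.0),
    "baritone": (0.6, 0.85, 0.5),
    "little_girl": (2.5, 1.7, 3.0),
}


def transform(params: VoiceParams,
              f0_scale: float = 1.0,
              vtf_scale: float = 1.0,
              rd_scale: float = 1.0) -> VoiceParams:
    """Return a copy of `params` with each parameter group scaled independently.

    No range checking is done here; a real system would likely need to keep
    Rd and F0 within values its glottal model can handle (assumption).
    """
    return VoiceParams(
        f0_hz=[f * f0_scale for f in params.f0_hz],
        rd=[r * rd_scale for r in params.rd],
        vtf_warp=params.vtf_warp * vtf_scale,
    )


if __name__ == "__main__":
    # Made-up input values, used only to exercise the function.
    source = VoiceParams(f0_hz=[100.0, 200.0], rd=[1.0, 2.0])
    print(transform(source, *PRESETS["baritone"]))
```

Because each scale touches only its own parameter group, the intermediate rows of the table correspond to calling `transform` with one or two factors left at 1.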