Modeling a speech signal

To model the time series of the spoken word manna we used a network similar to the one used for the saxophone model. Because the speech signal is more nonstationary, we needed a larger number of RBF units in the network. The best results so far have been obtained with a network of 400 hidden units, delay time , output dimension 8 and input dimension 11.
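As a rough illustration of the network dimensions quoted above, the following Python sketch builds delay vectors of input dimension 11 and passes them through a layer of 400 RBF units with an 8-dimensional linear output. It is only a sketch under stated assumptions, not the implementation used for the results: the delay time is not recoverable from the text, so TAU is a placeholder; the Gaussian basis function, the common width, the random output weights, and the sine test signal are likewise assumptions; training and the control sequence are not shown.

```python
import numpy as np

def delay_embed(signal, dim, tau):
    """Return delay vectors: row i is [s[t], s[t - tau], ..., s[t - (dim - 1) * tau]]
    with t = (dim - 1) * tau + i."""
    start = (dim - 1) * tau
    idx = np.arange(start, len(signal))[:, None] - tau * np.arange(dim)[None, :]
    return signal[idx]

def rbf_forward(x, centers, widths, weights, bias):
    """Gaussian RBF layer followed by a linear output layer."""
    # Squared Euclidean distance between every input vector and every center.
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    phi = np.exp(-d2 / (2.0 * widths ** 2))  # hidden activations, one column per unit
    return phi @ weights + bias

N_HIDDEN, DIM_IN, DIM_OUT = 400, 11, 8  # sizes quoted in the text
TAU = 2                                 # assumed placeholder, value missing in the source

rng = np.random.default_rng(0)
signal = np.sin(0.05 * np.arange(500))  # stand-in for the recorded speech signal

X = delay_embed(signal, DIM_IN, TAU)
# Assumption: centers picked from the data, one common width, untrained weights.
centers = X[rng.choice(len(X), size=N_HIDDEN, replace=False)]
widths = np.full(N_HIDDEN, 1.0)
weights = rng.normal(scale=0.01, size=(N_HIDDEN, DIM_OUT))
bias = np.zeros(DIM_OUT)

Y = rbf_forward(X, centers, widths, weights, bias)
print(X.shape, Y.shape)
```

With these sizes each delay vector spans (11 - 1) * TAU samples of signal history, so the choice of delay time directly sets how much context the network sees per prediction.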

Figure 4 shows the original and the resynthesized signal. The quality of the model is not as high as in the case of the saxophone. Nevertheless, the word is quite understandable. The figure shows that the main problems stem from the transitions between consecutive phonemes. We therefore assume that a more sophisticated control sequence may solve the problem.

Figure 4: Original and synthesized signal of the word manna.

Synthesis with a modified control sequence is also possible.

Axel Roebel
Thu Nov 9 12:55:11 MET 1995