To model the time series of the spoken word manna we used a
network similar to the one used for the saxophone model. Because the
signal is more strongly nonstationary, we needed a larger number of RBF
units in the network. The best results so far have been obtained
with a network of 400 hidden units, delay time , output
dimension 8 and input dimension 11.
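To make the configuration above concrete, the following is a minimal sketch of a delay-embedding RBF predictor with the stated dimensions (11 inputs, 8 outputs, up to 400 hidden units). It is not the training procedure used for the results reported here: the Gaussian basis functions, the random choice of centers from the training data, the common `width`, the least-squares fit of the output weights, and the synthetic test signal are all assumptions made for illustration. The delay time is left as a free parameter `tau` (set to 1 only as a placeholder), and the control sequence input is omitted.

```python
import numpy as np


def delay_embed(x, in_dim, out_dim, tau):
    """Build (input, target) pairs from a 1-D series.

    Each input holds in_dim samples spaced tau apart; each target holds
    the out_dim consecutive samples that follow the last input sample.
    """
    span = (in_dim - 1) * tau
    n = len(x) - span - out_dim
    X = np.stack([x[i:i + span + 1:tau] for i in range(n)])
    Y = np.stack([x[i + span + 1:i + span + 1 + out_dim] for i in range(n)])
    return X, Y


def rbf_features(X, centers, width):
    """Gaussian RBF activations (assumed basis function, one shared width)."""
    d2 = ((X ** 2).sum(axis=1)[:, None]
          - 2.0 * X @ centers.T
          + (centers ** 2).sum(axis=1)[None, :])
    d2 = np.maximum(d2, 0.0)  # guard against small negative rounding errors
    return np.exp(-d2 / (2.0 * width ** 2))


def fit_rbf(X, Y, n_units, width, rng):
    """Pick centers at random from the data, then solve for the linear
    output weights by least squares (an assumed, simple training scheme)."""
    idx = rng.choice(len(X), size=n_units, replace=False)
    centers = X[idx]
    Phi = rbf_features(X, centers, width)
    W, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
    return centers, W


def resynthesize(seed, centers, W, width, in_dim, tau, n_samples):
    """Free-running synthesis: feed the model's own output back as input."""
    span = (in_dim - 1) * tau
    buf = list(seed[-(span + 1):])  # seed must hold at least span + 1 samples
    out = []
    while len(out) < n_samples:
        x_in = np.asarray(buf[-(span + 1)::tau])[None, :]
        y = rbf_features(x_in, centers, width) @ W
        out.extend(y[0])
        buf.extend(y[0])
    return np.asarray(out[:n_samples])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(2000)
    # Synthetic amplitude-modulated tone standing in for a recorded signal.
    signal = np.sin(0.07 * t) * (1.0 + 0.3 * np.sin(0.011 * t))

    tau = 1  # placeholder only; the delay time is not specified here
    X, Y = delay_embed(signal, in_dim=11, out_dim=8, tau=tau)
    centers, W = fit_rbf(X, Y, n_units=400, width=1.0, rng=rng)
    synth = resynthesize(signal[:11], centers, W, 1.0, 11, tau, 500)
    print("training MSE:", np.mean((rbf_features(X, centers, 1.0) @ W - Y) ** 2))
    print("synthesized samples:", synth.shape[0])
```

In this sketch each network evaluation produces a block of 8 new samples, which are appended to the buffer before the next 11-dimensional input vector is formed. A model driven by a control sequence would need an additional input component that the sketch does not include.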
Figure 4 shows the original and the resynthesized signal. The quality of the model is not as high as in the case of the saxophone. Nevertheless, the word remains clearly understandable. The figure shows that the main problems stem from the transitions between consecutive phonemes. We therefore assume that a more sophisticated control sequence may solve the problem.
Figure 4: Original and synthesized signal of the word manna.
A synthesis with a modified control sequence is also possible.