In this section, I will give a more detailed description of what spectral envelopes are. A spectral envelope is a curve in the frequency-amplitude plane, derived from a Fourier magnitude spectrum. It describes one point in time (one window, to be precise). The following properties are desirable for spectral envelopes:
From the examples of spectral envelopes in figures 2.20 to 2.22, we see that the characteristics of musical instruments lead to distinctive spectral envelopes. The spectral envelope also reflects the class of a vowel (the phoneme) in speech, as can be seen in figures 2.23 to 2.25.
In speech or in the singing voice, the spectral envelope is largely independent of pitch (see section 2.4 for why this is so). However, if we transpose the vowel in figure 2.23 up by one octave by multiplying the frequencies of all partials by 2 and performing an additive resynthesis, the spectral envelope is necessarily transposed as well. Figure 2.26 shows this effect, which sounds quite unnatural (it is sometimes called the Mickey Mouse effect). The unnaturalness comes from the fact that the formants are shifted up one octave, which corresponds to shrinking the vocal tract to half its length. Obviously, this is not the natural behaviour of the vocal tract.
To avoid this, the spectral envelope has to be kept constant while the partials "slide" along it to their new frequencies. This means that the amplitude of a transposed partial is no longer determined by the amplitude of the original partial, but by the value of the spectral envelope at the frequency of the transposed partial, as in figure 2.27. This way, only the partials are shifted, while the spectral envelope, and thus the formant locations, stay the same, so the vowel sounds natural.
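A minimal sketch of both transposition variants, assuming the spectral envelope is available as linear amplitudes sampled on a frequency grid and evaluated by linear interpolation. The helper name `transpose_partials`, the 183 Hz fundamental and the single-formant envelope near 700 Hz are hypothetical, chosen only for illustration.

```python
import numpy as np


def transpose_partials(freqs, amps, factor, env_freqs=None, env_amps=None):
    """Transpose partials by `factor` (hypothetical helper).

    Without an envelope, each partial keeps its original amplitude, so the
    spectral envelope is stretched along with the partials (the Mickey Mouse
    effect). With an envelope sampled at `env_freqs`, each transposed partial
    takes the envelope value at its new frequency, so the formants stay put.
    """
    new_freqs = np.asarray(freqs, dtype=float) * factor
    if env_freqs is None:
        # No correction: amplitudes travel with the partials.
        return new_freqs, np.asarray(amps, dtype=float).copy()
    # Correction: read the fixed envelope at the new partial frequencies.
    return new_freqs, np.interp(new_freqs, env_freqs, env_amps)


# Hypothetical envelope: one formant near 700 Hz, sampled on a 10 Hz grid.
env_f = np.arange(0.0, 5000.0, 10.0)
env_a = 0.05 + np.exp(-((env_f - 700.0) / 200.0) ** 2)

f0 = 183.0                              # original fundamental in Hz (assumed)
freqs = f0 * np.arange(1, 13)           # 12 harmonic partials
amps = np.interp(freqs, env_f, env_a)   # original partials lie on the envelope

f_naive, a_naive = transpose_partials(freqs, amps, 2.0)
f_corr, a_corr = transpose_partials(freqs, amps, 2.0, env_f, env_a)

# Frequency of the strongest partial: it moves from 732 Hz to 1464 Hz
# without correction, but stays at 732 Hz with correction.
print(f_naive[np.argmax(a_naive)], f_corr[np.argmax(a_corr)])
```

In the uncorrected case the formant peak follows the partials up an octave; in the corrected case it remains where the original envelope puts it, and only the sampling points along the envelope change.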
For easier comparison, figure 2.28 shows the spectral envelopes of the transposed sound with and without spectral envelope correction, on a frequency grid with 366 Hz spacing, which is the fundamental frequency of the transposed sound. The partials of both are clearly at the same frequencies but at different amplitudes, and one spectral envelope is a stretched version of the other (although the compressed spectral envelope lacks some of the details of the stretched one).
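Written as formulas, with E(f) the original spectral envelope and an original fundamental of 183 Hz (half the 366 Hz of the transposed sound, since the transposition is by one octave), the two variants compared in figure 2.28 are:

```latex
\begin{aligned}
f'_k &= 2\,k\,f_0 = k \cdot 366~\text{Hz}, \qquad f_0 = 183~\text{Hz},\\
a'_k\big|_{\text{uncorrected}} &= E(k\,f_0) = E\!\left(\tfrac{f'_k}{2}\right)
  \quad\Rightarrow\quad E_{\text{uncorrected}}(f) = E\!\left(\tfrac{f}{2}\right),\\
a'_k\big|_{\text{corrected}} &= E(f'_k) = E(2\,k\,f_0).
\end{aligned}
```

Both sets of partials lie on the same grid f'_k; only the amplitudes differ, and E(f/2) is E stretched by a factor of 2 along the frequency axis.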
Xavier Rodet remarked that an interesting way to observe the true spectral envelope of the singing voice is to exploit its independence from pitch, making use of the small but fast pitch variation of a vibrato. In one test recording, during one vibrato period of ca. 200 ms, the fundamental frequency f0 oscillates around 149 Hz with a maximum deviation of b0 = 3.4 Hz. The harmonic partials at frequencies k·f0 follow, and oscillate by k·b0. Because the spectral envelope stays fixed, they sweep underneath its curve, each tracing only a small portion of its contour, but nevertheless revealing its exact local slope there.
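The arithmetic behind this is easy to check: harmonic k sweeps the band from k(f0 - b0) to k(f0 + b0), so the width of the traced portion, 2·k·b0, grows linearly with k. A minimal sketch using the values quoted above (the helper name `sweep_range` is hypothetical):

```python
F0 = 149.0   # mean fundamental frequency in Hz (from the recording above)
B0 = 3.4     # maximum vibrato deviation of f0 in Hz (from the recording above)


def sweep_range(k, f0=F0, b0=B0):
    """Frequency interval (low, high) swept by harmonic k during the vibrato."""
    return k * (f0 - b0), k * (f0 + b0)


for k in (1, 5, 10, 20):
    lo, hi = sweep_range(k)
    print(f"partial {k:2d}: {lo:7.1f} .. {hi:7.1f} Hz (width {hi - lo:.1f} Hz)")
```

The fundamental only covers 6.8 Hz of the envelope, while the tenth partial (around 1490 Hz) already covers 68 Hz and the twentieth about 136 Hz, which is why the traces become easier to read at higher harmonics.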
Figure 2.29 shows the partials of all 20 time frames of one vibrato period, collapsed into one frame. (The data was generated by Nathalie Henrich [Hen98].) In figure 2.30, a close-up of figure 2.29, the traces left by each partial as it follows the oscillation of the fundamental frequency show up as distinct groups of crosses. Overall, the number of disturbing factors increases at higher frequencies, and more noise is apparent.
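The collapsing of frames and the slope estimate can be simulated. The sketch below assumes a hypothetical smooth envelope (a -6 dB/octave tilt), a perfectly sinusoidal vibrato, 20 evenly spaced frames per period and exact partial tracking; the function names are hypothetical. It stacks the (frequency, amplitude) points of all frames, keeps them grouped by harmonic number as in the close-up figure, and fits a straight line through each group. Real data adds the noise mentioned above, so on measured partials the fit is only a least-squares estimate.

```python
import numpy as np

F0 = 149.0        # mean fundamental frequency in Hz (from the text)
B0 = 3.4          # maximum vibrato deviation of f0 in Hz (from the text)
N_FRAMES = 20     # frames per vibrato period (from the text)
N_PARTIALS = 20   # number of harmonics tracked (assumption)


def envelope_db(f):
    """Hypothetical fixed envelope: a -6 dB/octave tilt relative to 100 Hz."""
    return -6.0 * np.log2(f / 100.0)


def collapse_vibrato_frames(f0=F0, b0=B0, n_frames=N_FRAMES, n_partials=N_PARTIALS):
    """Return an array of shape (n_partials, n_frames, 2) of (freq, amp_db) points."""
    phase = 2 * np.pi * np.arange(n_frames) / n_frames
    f0_t = f0 + b0 * np.sin(phase)              # sinusoidal vibrato, one period
    k = np.arange(1, n_partials + 1)[:, None]   # harmonic numbers as a column
    freqs = k * f0_t[None, :]                   # partial k sits at k * f0(t)
    amps = envelope_db(freqs)                   # amplitude read off the fixed envelope
    return np.stack([freqs, amps], axis=-1)


def local_slopes(points):
    """Least-squares line (dB per Hz) through each partial's group of points."""
    return np.array([np.polyfit(p[:, 0], p[:, 1], 1)[0] for p in points])


pts = collapse_vibrato_frames()
slopes = local_slopes(pts)
k = 10
true_slope = -6.0 / (k * F0 * np.log(2))   # derivative of envelope_db at k * F0
print(slopes[k - 1], true_slope)
```

Because each group spans only k·2·b0 Hz, the fitted line matches the envelope's derivative at k·f0 closely; fitting in dB turns that derivative into a local spectral tilt.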