3.3 Spectral Envelopes

In this section, I will give a more detailed description of what spectral envelopes are. A spectral envelope is a curve in the frequency-amplitude plane, derived from a Fourier magnitude spectrum. It describes one point in time (one window, to be precise). The following properties are desirable for spectral envelopes:

**Figure 2.20:** Spectrum and spectral envelope of the clarinet sound of figure 2.14

**Figure 2.21:** Spectrum and spectral envelope of a piano

**Figure 2.22:** Spectrum and spectral envelope of a violin

The examples in figures 2.20 to 2.22 show that the characteristics of musical instruments lead to distinctive spectral envelopes. The spectral envelope also reflects the class of a vowel (the phoneme) in speech, as can be seen in figures 2.23 to 2.25.

**Figure 2.23:** Spectrum and spectral envelope of the vowel /e/

**Figure 2.24:** Spectrum and spectral envelope of the vowel /a/

**Figure 2.25:** Spectrum and spectral envelope of the vowel /o/

3.3.1 Spectral Envelope Correction for Transposition

In speech or in the singing voice, the spectral envelope is quite independent of the pitch (see section 2.4 for why this is so). However, if we transpose the vowel in figure 2.23 up by one octave by multiplying the frequencies of all partials by 2 and performing an additive resynthesis, the spectral envelope will necessarily be transposed as well. Figure 2.26 shows this effect, which sounds quite unnatural (it is sometimes termed the Mickey Mouse effect). The unnaturalness comes from the fact that the formants are shifted up by one octave, which corresponds to shrinking the vocal tract to half its length. Obviously, this is not the natural behaviour of the vocal tract.

To avoid this, the spectral envelope has to be kept constant, while the partials "slide" along it to their new values. This means that the amplitude of a transposed partial is no longer determined by the amplitude of the original partial, but by the value of the spectral envelope at the frequency of the transposed partial, as in figure 2.27. This way, only the partials are shifted, but the spectral envelope and thus the formant locations stay the same, making the vowel sound natural.
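The difference between the two kinds of transposition can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used for the figures: the function name `transpose_partials`, the toy envelope with a single formant-like peak at 700 Hz, the linear interpolation of the envelope, and the original fundamental of 183 Hz (half of the 366 Hz mentioned for the transposed sound) are all chosen for the example.

```python
import numpy as np

def transpose_partials(freqs, amps, factor, env_freqs=None, env_amps=None):
    """Transpose partials by `factor`.

    Without an envelope, each partial keeps its amplitude, so the spectral
    envelope is transposed along with the partials. With an envelope
    (sampled at increasing frequencies `env_freqs`), the new amplitudes are
    read off the fixed envelope at the transposed frequencies.
    """
    new_freqs = np.asarray(freqs, dtype=float) * factor
    if env_freqs is None or env_amps is None:
        return new_freqs, np.asarray(amps, dtype=float).copy()
    # Linear interpolation of the envelope is an assumption of this sketch.
    return new_freqs, np.interp(new_freqs, env_freqs, env_amps)

# Toy envelope with one formant-like peak at 700 Hz (hypothetical data).
env_freqs = np.linspace(0.0, 5000.0, 501)
env_amps = np.exp(-0.5 * ((env_freqs - 700.0) / 200.0) ** 2)

# Ten harmonic partials on an assumed fundamental of 183 Hz.
freqs = 183.0 * np.arange(1, 11)
amps = np.interp(freqs, env_freqs, env_amps)

# Octave transposition without and with spectral envelope correction.
naive_f, naive_a = transpose_partials(freqs, amps, 2.0)
kept_f, kept_a = transpose_partials(freqs, amps, 2.0, env_freqs, env_amps)

print("strongest partial, no correction:  ", naive_f[np.argmax(naive_a)], "Hz")
print("strongest partial, with correction:", kept_f[np.argmax(kept_a)], "Hz")
```

Both calls return the same partial frequencies. Without correction the strongest partial moves up an octave together with the formant, whereas with correction it stays next to the formant peak of the fixed envelope.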

For an easier comparison, figure 2.28 shows the spectral envelopes of the transposed sound with and without spectral envelope correction, on a frequency grid spaced at 366 Hz intervals, the fundamental frequency of the transposed sound. It can be clearly seen that the partials of both are at the same frequencies, but at different amplitudes, and that one spectral envelope is the stretched version of the other (although the compressed spectral envelope lacks some of the details of the stretched one).

**Figure 2.26:** Transposition of voice without spectral envelope correction

**Figure 2.27:** Transposition of voice with spectral envelope correction

**Figure 2.28:** Transposition of voice: The spectral envelopes of figures 2.26 and 2.27 are layered to show the effect of transposition with and without spectral envelope correction

3.3.2 Vibrato Tracing of Spectral Envelopes

Xavier Rodet remarked that an interesting way to observe the true spectral envelope of the singing voice is to exploit its independence of pitch, making use of the small but fast variation of pitch while singing with vibrato. In one test recording, during one period of vibrato of ca. 200 ms, the fundamental frequency $f_0$ oscillates around 149 Hz with a maximum frequency deviation of $b_0 = 3.4$ Hz. The harmonic partials at frequencies $k \cdot f_0$ follow, and oscillate by $k \cdot b_0$. Because the spectral envelope stays fixed, they sweep underneath its curve, tracing only small portions of its contour, but nevertheless giving its exact slope.
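As a small numerical illustration of how the deviation grows with the partial index, the following Python sketch generates the partial frequencies over one vibrato period. The values 149 Hz and 3.4 Hz are taken from the text; the sinusoidal shape of the vibrato, the choice of 10 partials and 20 frames, and the function name `partial_tracks` are assumptions made for the example.

```python
import numpy as np

def partial_tracks(f0, b0, n_partials, n_frames):
    """Frequencies of harmonic partials over one vibrato period.

    Assumes a sinusoidal vibrato around f0 with maximum deviation b0.
    Returns an array of shape (n_frames, n_partials).
    """
    phase = 2.0 * np.pi * np.arange(n_frames) / n_frames
    fundamental = f0 + b0 * np.sin(phase)
    k = np.arange(1, n_partials + 1)
    # Each harmonic partial sits at k times the momentary fundamental.
    return fundamental[:, None] * k[None, :]

f0, b0 = 149.0, 3.4
tracks = partial_tracks(f0, b0, n_partials=10, n_frames=20)

# Maximum deviation of each partial from its mean frequency k * f0.
k = np.arange(1, 11)
deviation = np.max(np.abs(tracks - k * f0), axis=0)
for index, dev in zip(k, deviation):
    print(f"partial {index:2d}: {index * f0:7.1f} Hz +/- {dev:5.1f} Hz")
```

Each partial therefore sweeps a band of width $2 k b_0$ under the envelope, so higher partials trace longer portions of its contour.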

**Figure 2.29:** Partials of all time frames during one period of vibrato, collapsed into one frame

**Figure 2.30:** Close-up of figure 2.29

Figure 2.29 shows the partials of all 20 time frames during one period of vibrato, collapsed into one frame. (The data was generated by Nathalie Henrich [Hen98].) In figure 2.30, a close-up of figure 2.29, the trace left by each partial as it follows the oscillation of the fundamental frequency shows up as a distinct group of crosses. At higher frequencies, the disturbing factors increase and more noise is apparent.
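The idea of collapsing the frames and reading the slope of the envelope off each trace can be sketched as follows. This is a toy example on synthetic data, not a reproduction of the measurement: the envelope is a hypothetical linear tilt of -0.01 dB/Hz, the vibrato is assumed to be sinusoidal, and `local_slopes` is a name introduced here, with a least-squares line fit standing in for whatever slope estimate one prefers.

```python
import numpy as np

def local_slopes(freq_tracks, amp_tracks):
    """Least-squares slope of amplitude over frequency for each partial.

    Both inputs have shape (n_frames, n_partials); each column is the
    trace one partial leaves while it follows the vibrato.
    """
    slopes = []
    for k in range(freq_tracks.shape[1]):
        slope, _intercept = np.polyfit(freq_tracks[:, k], amp_tracks[:, k], deg=1)
        slopes.append(slope)
    return np.array(slopes)

# Synthetic partial tracks: sinusoidal vibrato, 20 frames, 10 partials.
n_frames, n_partials = 20, 10
phase = 2.0 * np.pi * np.arange(n_frames) / n_frames
fundamental = 149.0 + 3.4 * np.sin(phase)
freq_tracks = fundamental[:, None] * np.arange(1, n_partials + 1)[None, :]

# Hypothetical fixed envelope: a linear tilt of -0.01 dB per Hz.
amp_tracks = -0.01 * freq_tracks

# Collapsing all frames into one gives the scatter of crosses.
collapsed_freqs = freq_tracks.ravel()
collapsed_amps = amp_tracks.ravel()

slopes = local_slopes(freq_tracks, amp_tracks)
print(slopes)  # one slope estimate (dB/Hz) per partial
```

On measured data the traces would be noisy, especially for the higher partials, so the fitted slopes would only approximate the envelope there.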