In a formant representation, a spectral envelope is composed of a parametric description of formants (the resonances of the vocal tract or of another acoustic resonator, see section 2.4) and a residual envelope. Three ways to represent formants will be presented: FOFs, standard formants, and fuzzy formants (cf. figure 4.2).
The frequency-domain parameters of a FOF are center frequency f, amplitude a, bandwidth b, and skirt width s, which can be controlled independently of the bandwidth; the time-domain parameters are phase, excitation time, and attenuation time. This is much more information than is needed to describe a spectral envelope.
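To make the time-domain parameters concrete, the following is a minimal sketch of a single FOF grain, assuming the commonly used form of a damped sinusoid with a raised-cosine attack. The function name `fof_grain`, the relation `alpha = pi * b` between bandwidth and decay rate, and the use of an attack time in place of the skirt width s are assumptions made for illustration; the attenuation time at the end of the grain is left out for brevity.

```python
import math

def fof_grain(f, a, b, attack, phase=0.0, sr=44100, dur=0.02):
    """Return one FOF-like grain as a list of samples.

    f      : center frequency in Hz
    a      : amplitude
    b      : bandwidth in Hz (assumed to set the decay rate as pi * b)
    attack : excitation (rise) time in seconds; stands in for skirt width
    phase  : initial phase in radians
    """
    alpha = math.pi * b          # exponential decay rate (assumption)
    beta = math.pi / attack      # raised-cosine rise completes at t = attack
    n = round(sr * dur)
    samples = []
    for k in range(n):
        t = k / sr
        env = math.exp(-alpha * t)
        if t < attack:
            # smooth attack: 0 at t = 0, reaches 1 at t = attack
            env *= 0.5 * (1.0 - math.cos(beta * t))
        samples.append(a * env * math.sin(2.0 * math.pi * f * t + phase))
    return samples

grain = fof_grain(f=600.0, a=1.0, b=80.0, attack=0.003)
```

In this sketch a shorter attack gives wider skirts around the formant peak while the bandwidth b still sets the decay, which is one way to read the statement that the skirt width can be controlled independently of the bandwidth.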
A fuzzy formant is specified by three frequency parameters: the lower bound l, the upper bound u, and the center c, if known. Additionally, a bookkeeping parameter gives each formant an identifier, so that formants can be linked into formant tracks.
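The fuzzy formant described above maps naturally onto a small record type. The sketch below is illustrative only: the names `FuzzyFormant`, `formant_id`, and `build_tracks` are hypothetical, and the check that the center lies between the bounds is an assumption not stated in the text.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

@dataclass
class FuzzyFormant:
    formant_id: int                  # bookkeeping identifier
    lower: float                     # lower bound l in Hz
    upper: float                     # upper bound u in Hz
    center: Optional[float] = None   # center c in Hz, if known

    def __post_init__(self):
        if self.lower > self.upper:
            raise ValueError("lower bound exceeds upper bound")
        # Assumption: a known center must lie within the bounds.
        if self.center is not None and not (self.lower <= self.center <= self.upper):
            raise ValueError("center outside [lower, upper]")

def build_tracks(frames):
    """Group formants from successive frames into tracks keyed by identifier."""
    tracks = defaultdict(list)
    for frame_index, frame in enumerate(frames):
        for formant in frame:
            tracks[formant.formant_id].append((frame_index, formant))
    return dict(tracks)

frames = [
    [FuzzyFormant(1, 500.0, 700.0, 600.0), FuzzyFormant(2, 1000.0, 1400.0)],
    [FuzzyFormant(1, 520.0, 720.0)],
]
tracks = build_tracks(frames)
```

Here `tracks[1]` holds the first formant in frames 0 and 1, while `tracks[2]` holds a single entry whose center is unknown.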
A formant representation raises the general problems of finding and identifying formants. For unlabeled data, deciding which hump in the spectral envelope is really a formant, and whether it is the first, second, etc., is far from trivial.
The formant representation is not stable, since a slight dip in the spectral envelope can suddenly create a new formant. It is local, however, and flexible and very easy to manipulate. Synthesis is reasonably fast, both in the frequency domain and in the time domain. The representation is very compact in storage if a pure formant representation is sufficient (or the loss of precision is bearable), but in most cases a residual spectral representation would need to be stored along with it.
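The instability can be illustrated with a deliberately naive formant finder that counts strict local maxima of a sampled envelope. This is only a sketch under that assumption; the function `count_peaks` and the envelope values (taken to be in dB) are made up for the example, and a real formant detector would be considerably more elaborate.

```python
def count_peaks(envelope):
    """Count strict local maxima in a sampled spectral envelope."""
    return sum(
        1
        for i in range(1, len(envelope) - 1)
        if envelope[i - 1] < envelope[i] > envelope[i + 1]
    )

# One broad hump: a single peak.
smooth = [0.0, 3.0, 6.0, 6.5, 7.0, 5.0, 2.0, 0.0]

# Same envelope with one bin lowered by 0.6 dB: the hump splits in two.
dipped = [0.0, 3.0, 6.0, 5.9, 7.0, 5.0, 2.0, 0.0]

n_smooth = count_peaks(smooth)
n_dipped = count_peaks(dipped)
```

A change of 0.6 dB in one bin is enough to turn one candidate formant into two, so the number of formants, and with it the identity of each formant, can jump between neighboring frames.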
For specifying spectral envelopes manually, especially for the precise synthesis of the voice, formant representations are best suited.