Next: 5.6 High Resolution Matching Up: 5. Representation of Spectral Previous: 5.4 Geometric Representation

   
5.5 Formants

In a formant representation, a spectral envelope is composed of a parametric description of formants (the resonances of the vocal tract or of another acoustic resonator; see section 2.4) and a residual envelope. Three ways to represent formants are presented here: FOFs, precise formants, and fuzzy formants (cf. figure 4.2).


  
Figure 4.2: A FOF (left), a precise formant (middle), and a fuzzy formant, which is simply a frequency region in a spectral envelope (right), with their frequency-domain parameters

Formant-wave functions 

 A FOF, from the French forme d'onde formantique (formant-wave function) [Rod84], was originally conceived as a method for high-quality voice synthesis, and for sound synthesis in general. It forms the basic synthesis model of the CHANT system (section 2.5). A FOF is a time-domain representation of a single formant as an elementary waveform; several of these (typically 5-7) are added to build up the desired spectrum. A FOF is parameterized in both the frequency domain and the time domain. Its parameters, in effect, specify the spectral envelope of one formant.

The frequency-domain parameters of a FOF are center frequency f, amplitude a, bandwidth b, and skirt width s, which can be controlled independently of the bandwidth; the time-domain parameters are phase $\phi$ and the excitation and attenuation times. This is considerably more information than is needed to describe a spectral envelope.
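To make the interplay of these parameters concrete, the following is a minimal sketch of one FOF grain. It assumes the waveform commonly given in the CHANT literature (an exponentially damped sinusoid with a raised-cosine attack), which this section does not spell out. The function and parameter names are hypothetical, and the attenuation time is left out for brevity.

```python
import math


def fof_sample(t, freq_hz, bandwidth_hz, excitation_s, amp=1.0, phase=0.0):
    """One sample of a FOF grain at time t in seconds (assumed waveform)."""
    if t < 0.0:
        return 0.0  # the grain is silent before its onset
    alpha = math.pi * bandwidth_hz   # decay rate, tied to the bandwidth b
    beta = math.pi / excitation_s    # speed of the raised-cosine attack
    osc = amp * math.exp(-alpha * t) * math.sin(2.0 * math.pi * freq_hz * t + phase)
    if t <= excitation_s:
        # Smooth attack: in this model the excitation time shapes the skirts
        # of the formant independently of the bandwidth.
        return 0.5 * (1.0 - math.cos(beta * t)) * osc
    return osc


def fof_grain(freq_hz, bandwidth_hz, excitation_s, num_samples, sample_rate=44100):
    """Sample one grain; a voice would be built by summing several grains."""
    return [
        fof_sample(i / sample_rate, freq_hz, bandwidth_hz, excitation_s)
        for i in range(num_samples)
    ]


# Illustrative values only: a 600 Hz formant, 80 Hz wide, 3 ms excitation time.
grain = fof_grain(600.0, 80.0, 0.003, 882)
```

The center frequency, bandwidth, and excitation time each enter the waveform separately, which is why a FOF carries more parameters than a description of the envelope alone requires.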

Precise formants 

A more economical way to describe formants is to use the standard parameters of a resonance: center frequency f, amplitude a, and bandwidth b. The bandwidth specifies half of the width of the formant at 3 dB below the peak. The spectral envelope $v(\omega)$ of one formant, normalized to unit peak amplitude and with the center frequency written as c, is then of the form (with an appropriate scaling of the parameters):

   \begin{displaymath}
v(\omega) = e^{-\left( \frac{\omega - c}{b} \right) ^ 2}
\end{displaymath}
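The expression above can be evaluated directly. The sketch below is only an illustration: the function names are hypothetical, the `amplitude` factor is an added assumption (the displayed form peaks at 1), and summing several formants into one envelope is likewise an assumption rather than something this section prescribes.

```python
import math


def formant_envelope(omega, center, bandwidth, amplitude=1.0):
    """Gaussian-shaped envelope of one formant, following the form above."""
    return amplitude * math.exp(-(((omega - center) / bandwidth) ** 2))


def envelope_sum(omega, formants):
    """Sum of several formants, each given as (center, bandwidth, amplitude)."""
    return sum(formant_envelope(omega, c, b, a) for (c, b, a) in formants)


# Illustrative values only: the envelope is 1 at the center and falls to
# exp(-1) at a distance of one bandwidth, in the same units as omega.
peak = formant_envelope(500.0, 500.0, 50.0)
one_bandwidth_away = formant_envelope(550.0, 500.0, 50.0)
```

All three arguments must be expressed in the same frequency unit, which is one reading of the "appropriate scaling" mentioned above.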

Fuzzy formants 

 As an augmentation of the spectral representation (section 4.3), I define a fuzzy formant as a formant region within a spectral envelope where a formant is believed or known to lie. With labeled source material (a recording of the voice annotated with the speech sounds (phonemes) that are spoken or sung), the positions of the formants in vowels are fairly well known.

A fuzzy formant is specified by three frequency parameters: the lower bound l, the upper bound u, and the center c, if known. Additionally, a bookkeeping parameter assigns an identifier to each formant, so that formants can be associated into formant tracks.
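As a data structure, a fuzzy formant could be sketched as follows. The class, its field names, and the `group_into_tracks` helper are hypothetical illustrations of the parameters listed above, not an implementation taken from this work.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class FuzzyFormant:
    formant_id: int                 # bookkeeping identifier linking formants into tracks
    lower: float                    # lower bound l of the region
    upper: float                    # upper bound u of the region
    center: Optional[float] = None  # center c, if known

    def __post_init__(self):
        if self.lower > self.upper:
            raise ValueError("lower bound must not exceed upper bound")
        if self.center is not None and not (self.lower <= self.center <= self.upper):
            raise ValueError("center must lie inside the region")

    def contains(self, freq: float) -> bool:
        """True if freq falls inside the formant region."""
        return self.lower <= freq <= self.upper


def group_into_tracks(
    frames: List[List[FuzzyFormant]],
) -> Dict[int, List[Tuple[int, FuzzyFormant]]]:
    """Collect formants that share an identifier across successive frames."""
    tracks: Dict[int, List[Tuple[int, FuzzyFormant]]] = {}
    for frame_index, frame in enumerate(frames):
        for formant in frame:
            tracks.setdefault(formant.formant_id, []).append((frame_index, formant))
    return tracks


# Illustrative values only: two frames, each with the same two labeled regions.
frames = [
    [FuzzyFormant(1, 250.0, 450.0, 350.0), FuzzyFormant(2, 1800.0, 2400.0)],
    [FuzzyFormant(1, 260.0, 470.0, 360.0), FuzzyFormant(2, 1750.0, 2350.0)],
]
tracks = group_into_tracks(frames)
```

Keeping the center optional mirrors the "if known" qualifier: a region can be stored and tracked before a precise center has been estimated.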

A formant representation brings with it the general problems of finding and identifying formants. For unlabeled data, deciding which hump in the spectral envelope is really a formant, and whether it is the first, second, etc., is far from trivial.

The formant representation is not stable, since a slight dip in the spectral envelope can suddenly create a new formant. It is local, however, and can be manipulated flexibly and very easily. Synthesis is reasonably fast, in both the frequency domain and the time domain. The representation is very compact in storage if a purely formant-based representation is sufficient (or the loss of precision is tolerable), but in most cases a residual spectral representation needs to be stored along with it.
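The instability can be illustrated with a toy example. The sketch below assumes that formants are found by naive local-maximum picking on a sampled envelope, which is only one possible detection scheme and not necessarily the one used in this work; the envelope values are made up for illustration.

```python
def count_peaks(envelope):
    """Count strict local maxima in a sampled envelope (naive peak picking)."""
    return sum(
        1
        for i in range(1, len(envelope) - 1)
        if envelope[i - 1] < envelope[i] > envelope[i + 1]
    )


# The same hump, sampled at nine points, without and with a shallow dip.
smooth = [0.0, 0.4, 0.8, 0.95, 1.0, 0.95, 0.8, 0.4, 0.0]
dipped = [0.0, 0.4, 0.8, 0.95, 0.93, 0.95, 0.8, 0.4, 0.0]

print(count_peaks(smooth))  # 1
print(count_peaks(dipped))  # 2
```

Changing a single sample by 0.07 doubles the number of detected peaks, so the formant description changes discontinuously even though the envelope has barely moved.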

For specifying spectral envelopes manually, especially for the precise synthesis of the voice, formant representations are best suited.


Diemo Schwarz
1998-09-07