- 4.5.1 Regularization
- 4.5.2 Stochastic Smoothing (The Cloud Method)
- 4.5.3 Logarithmic Frequency Scaling
- 4.5.4 Adding Points to Control the Envelope

4.5 Improvements of the Discrete Cepstrum Method

4.5.1 Regularization

The technique of regularization, developed in [GR90,COM97], improves the smoothness of the spectral envelope. Its idea
is to penalize overly steep slopes of the spectral envelope by adding a
regularization term, built from a matrix *B*, to the matrix *A* defined in
equation (3.19). *B* is a square matrix of size *p*+1, the
diagonal of which is defined by:

Then the discrete cepstrum algorithm proceeds as in section 3.4.

The effect of regularization can be seen in figure 3.8. The disadvantage of regularization is that sometimes a steep slope is necessary to reach a single outlying peak, as with the low peak at about 3400 Hz in the figure. With regularization, the curve falls short of reaching it.
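The trade-off described above can be illustrated with a minimal sketch. It assumes the usual cosine-series model of the log-envelope and a least-squares fit through the partials. The function names, the synthetic partials, and in particular the diagonal of `B` (entries growing with the squared coefficient index, with the constant term unpenalized) are assumptions made for illustration and are not taken from equation (3.19) or from the definition of *B* in this section.

```python
import numpy as np

def cosine_basis(freqs, fs, p):
    """Rows [1, 2cos(2*pi*f/fs), ..., 2cos(2*pi*p*f/fs)], one row per frequency."""
    k = np.arange(p + 1)
    M = 2.0 * np.cos(2.0 * np.pi * np.outer(np.asarray(freqs, dtype=float) / fs, k))
    M[:, 0] = 1.0
    return M

def fit_cepstrum(freqs, amps, fs, p, lam=0.0):
    """Solve (A + lam * B) c = b for the p + 1 cepstral coefficients."""
    M = cosine_basis(freqs, fs, p)
    A = M.T @ M
    b = M.T @ np.log(np.asarray(amps, dtype=float))
    # Assumed penalty: diagonal entries k^2, so the constant term c_0 is free
    # and higher-order (faster oscillating) terms are penalized more strongly.
    B = np.diag(np.arange(p + 1, dtype=float) ** 2)
    return np.linalg.solve(A + lam * B, b)

def log_envelope(c, freqs, fs):
    """Evaluate the log-amplitude envelope at the given frequencies."""
    return cosine_basis(freqs, fs, len(c) - 1) @ c

# Synthetic harmonic partials (hypothetical data, for illustration only).
fs, p = 8000.0, 10
freqs = 190.0 * np.arange(1, 21)
rng = np.random.default_rng(0)
amps = np.exp(rng.normal(-2.0, 1.0, size=freqs.size))

c_plain = fit_cepstrum(freqs, amps, fs, p)          # no regularization
c_reg = fit_cepstrum(freqs, amps, fs, p, lam=1.0)   # smoother, but fits peaks less closely
```

Raising `lam` lowers the penalty term and raises the fitting error at the partials, which is the behaviour described above: the curve becomes smoother but may fall short of an isolated peak.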

4.5.2 Stochastic Smoothing (The Cloud Method)

The cloud method, developed by Thierry Galas and Xavier Rodet in [GR90], is a way to get a smoother spectral envelope with the discrete cepstrum algorithm. The method generates a cloud of points around each partial in the frequency-amplitude plane, giving the discrete cepstrum algorithm more freedom in fitting a curve that links all the partials.

The added points *x*_{1..4} are displaced from the original point
*x*_{0} at frequency *f*_{0} and amplitude *a*_{0} by a frequency shift *f* and an amplitude factor *a*, as shown in figure 3.9 left:

(4.1)

(4.2)

(4.3)

(4.4)

Furthermore, the shape of the cloud can be used to influence the behaviour
of the spectral envelope if additional information is known, as shown in
figure 3.9 right. For example, with a configuration as in the
figure, if it were known that a point is situated on the rising slope
of a formant, the spectral envelope could be influenced to also prefer a rising
slope. The displacement of the added points is given by

(4.5)

(4.6)

(4.7)

(4.8)

However, to avoid too strong a deviation of the spectral envelope from the original
point, weighting is introduced in the discrete cepstrum
algorithm to attenuate the influence of the added points with respect
to the original point. The original point is weighted with a factor
of 5, whereas the added points are weighted with a factor of 1, as
expressed by the thickness of the points in figure 3.9. The
weighting *h*_{i} is introduced in the calculation of the error
criterion in
equation (3.17):

Thus, equation (3.19) will become

and equation (3.21) is changed to

From a more formal point of view, the cloud method is in fact a replacement of each original partial (spectral peak) by a probability distribution. The distribution reflects the impossibility of knowing the precise position of the spectral peaks, whereas previously perfect knowledge of the spectral peaks was assumed.

The new error criterion, assuming
*s*_{i} = *h*_{i} = 1, is:

The distribution can be sampled, i.e. each spectral peak is replaced by a set of peaks, to
yield the cloud of points described at the beginning of this section.
Formally, for a Gaussian distribution

the weights would then be .

Figure 3.10 shows the improvement of discrete cepstrum spectral envelope estimation with stochastic smoothing. The cloud method can also be combined with regularization, described in section 3.5.1, to further improve results.
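As a rough illustration of the cloud construction and the weighting, the following sketch generates four extra points per partial and builds weighted normal equations. The weights of 5 for original points and 1 for added points come from the text. The rectangular layout of the added points, the parameter names `df` and `ratio`, and the cosine-series model behind `weighted_normal_equations` are assumptions made for this example, since the displacement equations are not reproduced here.

```python
import numpy as np

def make_cloud(freqs, amps, df, ratio, w_orig=5.0, w_added=1.0):
    """Return (freqs, amps, weights) with four extra points around each partial.

    Assumed layout: the corners of a rectangle, i.e. frequency f0 -/+ df
    combined with amplitude a0 * ratio and a0 / ratio.
    """
    f0 = np.asarray(freqs, dtype=float)
    a0 = np.asarray(amps, dtype=float)
    f_parts = [f0, f0 - df, f0 + df, f0 - df, f0 + df]
    a_parts = [a0, a0 * ratio, a0 * ratio, a0 / ratio, a0 / ratio]
    # Original points first (weight 5), then the four groups of added points (weight 1).
    w_parts = [np.full_like(f0, w_orig)] + [np.full_like(f0, w_added)] * 4
    return np.concatenate(f_parts), np.concatenate(a_parts), np.concatenate(w_parts)

def weighted_normal_equations(freqs, amps, weights, fs, p):
    """Build A and b of a weighted least-squares cepstral fit (assumed cosine model)."""
    f = np.asarray(freqs, dtype=float)
    w = np.asarray(weights, dtype=float)
    k = np.arange(p + 1)
    M = 2.0 * np.cos(2.0 * np.pi * np.outer(f / fs, k))
    M[:, 0] = 1.0
    A = M.T @ (w[:, None] * M)          # each row of M scaled by its weight
    b = M.T @ (w * np.log(np.asarray(amps, dtype=float)))
    return A, b

# Two hypothetical partials, each surrounded by a cloud of four points.
f, a, w = make_cloud([440.0, 880.0], [1.0, 0.5], df=20.0, ratio=1.2)
A, b = weighted_normal_equations(f, a, w, fs=8000.0, p=2)
c = np.linalg.solve(A, b)
```

The sketch does not guard against added points falling below 0 Hz or above the Nyquist frequency; a real implementation would need to handle partials that lie closer than `df` to either border.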

4.5.3 Logarithmic Frequency Scaling

As we have seen in section 3.4, the discrete cepstrum algorithm is
of cubic complexity in *p*, the order of the discrete cepstrum. This means that we must try to
reduce the order necessary for a good estimation of the spectral envelope, to keep
computation times short. One way to achieve this is to spend
the available resolution where it is most needed, and
reduce it where it is less important. For this purpose, we can exploit the properties
of the human auditory system.

Due to the logarithmic frequency resolution of human hearing,
which also led to the mel frequency scale, we don't
need to be very exact with the spectral envelope in higher frequency ranges. It
suffices to represent the rough location of energy, whereas in the low
frequencies, very slight deviations in frequency and amplitude are
perceptible. Therefore we can introduce a logarithmic frequency
scaling similar to the mel scale, as suggested in [GR91b], which
is linear below a given **break frequency**, and logarithmic above.
The mel scale is defined by

where

Using this formula directly poses problems, because the mapped frequencies can
exceed the Nyquist frequency *f*_{s} / 2, which must be avoided because it invalidates the subsequent calculations.
Normalizing the range of the
mel function to within the
Nyquist frequency yields:

where

The new logarithmic scaling function

The effect of logarithmic frequency scaling can be seen in figure 3.11.

As an additional advantage, spectral envelopes stored with a logarithmic frequency support take less space. Also, the complexity of synthesis can be reduced when fewer points are needed to represent a spectral envelope.

To retrieve the spectral envelope from cepstral coefficients obtained with
logarithmic frequency scaling, the frequencies *f*_{i} of the bins of
the envelope (see end of section 3.3) have to be converted to
logarithmic scale:

Then proceed with equations (3.9) and (3.10).
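The frequency conversion can be sketched as follows. The concrete warping formula is an illustrative assumption and not the scaling function of this section: it is the identity below the break frequency, continues as `f_break * (1 + ln(f / f_break))` above it, and is then rescaled so that the Nyquist frequency maps onto itself. The name `log_warp` and the break frequency of 2500 Hz are likewise hypothetical.

```python
import numpy as np

def log_warp(f, f_break, fs):
    """Map frequencies in [0, fs/2] to a scale that is linear below f_break
    and logarithmic above, rescaled so that fs/2 maps to fs/2.

    The concrete formula is an illustrative assumption (requires f_break > 0).
    """
    nyq = fs / 2.0

    def raw(x):
        x = np.asarray(x, dtype=float)
        safe = np.maximum(x, f_break)   # keeps the log argument >= 1 in the linear part
        return np.where(x <= f_break, x, f_break * (1.0 + np.log(safe / f_break)))

    # Normalization: the whole curve is scaled so that the range ends at Nyquist.
    return raw(f) * nyq / raw(nyq)

fs = 16000.0
bins = np.linspace(0.0, fs / 2.0, 9)            # envelope bin frequencies in Hz
warped = log_warp(bins, f_break=2500.0, fs=fs)  # same bins on the warped scale
```

The same function would be applied to the partial frequencies before estimation and to the bin frequencies before evaluating the envelope, so that both live on the same warped axis. Because of the normalization, the linear part is stretched by a constant factor rather than left unchanged.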

There is one pitfall in the application of logarithmic frequency scaling: performing the linear-to-logarithmic transformation after applying the cloud degrades the results slightly. To see why, remember that the cloud algorithm (section 3.5.2) adds points with a constant linear shift around each peak frequency. These shifts are subsequently stretched in the linear part of the scale and converted asymmetrically in the logarithmic part.

4.5.4 Adding Points to Control the Envelope

There are two situations where the behaviour of the discrete cepstrum spectral envelope has to be controlled by adding artificial points to the partials for discrete cepstrum estimation: at the borders, and between the highest partial and the upper border. If the cloud method of stochastic smoothing (section 3.5.2) is selected, these points are added before the cloud is applied (i.e. each added point will have a cloud of points around it).

The **border points** are added at the frequencies *m* and
(*f*_{s}/2
- *m*) with half the amplitude of the lowest/highest partial,
respectively. This forces the spectral envelope to have a downward slope at
the borders. Thus, if a partial is moved to lower frequencies by downward
transposition of the partials while keeping the spectral envelope, there is no
risk that it will suddenly rise in amplitude; instead, it is faded out
smoothly. Figure 3.12 shows the
effect of adding border points.

However, if the frequency of the lowest partial is less than *m*, the
low border point is not added, since this would obviously cause an
unjustified dip in the spectral envelope, which would disturb the smoothness. We
don't have to worry about the border condition then anyway, since then
the lowest partial would be very close to the 0 Hz border,
constraining the spectral envelope enough to prevent it from rising. The same
holds analogously for the high border point.

The **filling points** fill up a possible gap between the frequency
of the highest partial and *f*_{s}/2 with very low amplitude peaks,
spaced at *f*_{s} / 2*n* (*n* being the number of points of the envelope
requested). This is to avoid too much freedom for the spectral envelope. If the
spectral envelope were unconstrained over a large frequency range, it would oscillate
wildly.
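The rules for border and filling points can be sketched as follows. The half-amplitude border points at *m* and *f*_{s}/2 - *m*, the skip conditions, and the spacing of *f*_{s}/2*n* follow the text. The function name, the value of `fill_amp` standing in for "very low amplitude", and the alignment of the filling points to multiples of the spacing are assumptions made for this example.

```python
import numpy as np

def add_control_points(freqs, amps, fs, m, n, fill_amp=1e-4):
    """Add border and filling points to partials sorted by ascending frequency.

    m: distance of the border points from 0 Hz and from fs/2, in Hz
    n: number of envelope points requested (sets the filling-point spacing)
    fill_amp: amplitude of the filling points (assumed value for "very low")
    """
    f = np.asarray(freqs, dtype=float)
    a = np.asarray(amps, dtype=float)
    nyq = fs / 2.0
    extra_f, extra_a = [], []

    # Low border point: half the amplitude of the lowest partial, skipped when
    # that partial already lies at or below m.
    if f[0] > m:
        extra_f.append(m)
        extra_a.append(0.5 * a[0])
    # High border point: the analogous rule at the upper border.
    if f[-1] < nyq - m:
        extra_f.append(nyq - m)
        extra_a.append(0.5 * a[-1])

    # Filling points: grid of spacing fs / (2 n), kept only above the highest partial.
    step = nyq / n
    grid = step * np.arange(1, n)
    fill = grid[grid > f[-1]]
    extra_f.extend(fill.tolist())
    extra_a.extend([fill_amp] * fill.size)

    all_f = np.concatenate([f, np.asarray(extra_f, dtype=float)])
    all_a = np.concatenate([a, np.asarray(extra_a, dtype=float)])
    order = np.argsort(all_f, kind="stable")
    return all_f[order], all_a[order]

# Hypothetical partials with a large gap up to fs/2 = 8000 Hz.
out_f, out_a = add_control_points([200.0, 400.0, 600.0], [1.0, 0.8, 0.4],
                                  fs=16000.0, m=50.0, n=16)
```

The text does not say how the high border point and the filling points should interact when both are present, so this sketch simply adds both. If the cloud method is used, the cloud would be generated from the returned points.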