The technique of regularization, developed in [GR90,COM97], improves the smoothness of the spectral envelope. The idea
is to penalize too steep a slope of the spectral envelope by adding a
penalty term based on a matrix B to the matrix A, defined in
equation (3.19), where B is a square matrix of size p+1, the
diagonal of which is defined by:
Then the discrete cepstrum algorithm proceeds as in section 3.4.
The effect of regularization can be seen in figure 3.8. The disadvantage of regularization is that a steep slope is sometimes necessary to reach a single outlying peak, such as the low peak at about 3400 Hz in the figure. With regularization, the curve falls short of reaching it.
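To make the role of the penalty concrete, the following is a minimal sketch of a regularized least-squares cepstrum fit. It is not the exact formulation of equation (3.19), which is not reproduced here: the cosine model of the log envelope, the name lam for the regularization weight, and the diagonal of B growing with the square of the coefficient index are all assumptions made for illustration.

```python
import numpy as np

def discrete_cepstrum(freqs, amps, fs, p, lam=0.0, weights=None):
    """Weighted least-squares cepstrum fit with an optional smoothness penalty.

    Assumed model: log-envelope(f) = c_0 + 2 * sum_{k>=1} c_k cos(2 pi k f / fs).
    """
    freqs = np.asarray(freqs, dtype=float)
    log_a = np.log(np.asarray(amps, dtype=float))
    w = np.ones_like(freqs) if weights is None else np.asarray(weights, dtype=float)
    k = np.arange(p + 1)
    # Cosine basis evaluated at the partial frequencies
    M = 2.0 * np.cos(2.0 * np.pi * np.outer(freqs, k) / fs)
    M[:, 0] = 1.0
    A = M.T @ (w[:, None] * M)      # normal-equation matrix
    b = M.T @ (w * log_a)
    # ASSUMPTION: diagonal penalty proportional to k^2, so c_0 is not penalized
    B = np.diag(8.0 * np.pi**2 * k.astype(float) ** 2)
    return np.linalg.solve(A + lam * B, b)

def roughness(c):
    """Sum of k^2 * c_k^2, the quantity the assumed penalty discourages."""
    k = np.arange(len(c))
    return float(np.sum((k * c) ** 2))

# Hypothetical partials: one very low peak at 3600 Hz among ten harmonics
fs = 16000.0
freqs = 400.0 * np.arange(1, 11)
amps = np.array([1.0, 0.8, 0.9, 0.5, 0.6, 0.3, 0.4, 0.2, 0.02, 0.15])
c_plain = discrete_cepstrum(freqs, amps, fs, p=6)
c_reg = discrete_cepstrum(freqs, amps, fs, p=6, lam=1e-3)
```

Comparing roughness(c_plain) with roughness(c_reg) shows the trade-off described above: the regularized envelope is smoother, at the price of following an isolated low peak less closely.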
The cloud method developed by Thierry Galas and Xavier Rodet in [GR90] is a way to get a smoother spectral envelope with the discrete cepstrum algorithm. The method generates a cloud of points around each partial on the frequency-amplitude plane to give the discrete cepstrum algorithm more freedom in fitting a curve that links all the partials.
The added points x1..4 are displaced from the original point
x0 at frequency f0 and amplitude a0 by a frequency shift f and an amplitude factor a, as shown in figure 3.9 left:
Furthermore, the shape of the cloud can be used to influence the behaviour
of the spectral envelope, if additional information is known, as shown in
figure 3.9 right. For example, with a configuration as in the
figure, if it were known that a point is situated on the rising slope
of a formant, the spectral envelope could be influenced to also prefer a rising
slope. The displacement of the added points is given by
However, to avoid too strong a deviation of the spectral envelope from the original
point, weighting is introduced in the discrete cepstrum
algorithm to attenuate the influence of the added points with respect
to the original point. The original point is weighted with a factor
of 5, whereas the added points are weighted with a factor of 1, as
expressed by the thickness of the points in figure 3.9. The
weighting hi is introduced in the calculation of the error
From a more formal point of view, the cloud method in fact replaces each original partial (spectral peak) by a probability distribution. This reflects the impossibility of knowing the precise position of the spectral peaks, whereas previously perfect knowledge of the spectral peaks was assumed.
The new error criterion, assuming
si = hi = 1, is:
can be sampled, i.e. each spectral peak
is replaced by a set of peaks
yield the cloud of points described at the beginning of this section.
Formally, for a Gaussian distribution
Figure 3.10 shows the improvement of discrete cepstrum spectral envelope estimation with stochastic smoothing. The cloud method can also be combined with regularization, described in section 3.5.1, to further improve results.
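The construction of the cloud and its weights, as described in this section, can be sketched as follows. The weights of 5 for the original point and 1 for the added points are taken from the text. The layout of the four added points is an assumption, since the defining equation is not reproduced here: we place them at the corners of a box around the original point. The names make_cloud, df and a are hypothetical.

```python
import numpy as np

def make_cloud(freqs, amps, df, a, w_orig=5.0, w_added=1.0):
    """Replace each partial by itself plus four displaced, lower-weighted points.

    ASSUMPTION: the added points sit at the four corners (f0 +/- df, a0 * a^(+/-1)).
    Returns frequencies, amplitudes and weights, five entries per partial,
    with the original point first in each group.
    """
    freqs = np.asarray(freqs, dtype=float)
    amps = np.asarray(amps, dtype=float)
    f_shift = np.array([0.0, -df, -df, +df, +df])       # additive frequency shift
    a_scale = np.array([1.0, 1.0 / a, a, 1.0 / a, a])   # multiplicative amplitude factor
    weights = np.array([w_orig, w_added, w_added, w_added, w_added])
    cloud_f = (freqs[:, None] + f_shift).ravel()
    cloud_a = (amps[:, None] * a_scale).ravel()
    cloud_w = np.tile(weights, len(freqs))
    return cloud_f, cloud_a, cloud_w

# Hypothetical example: two partials, 20 Hz shift, amplitude factor 1.1
cloud_f, cloud_a, cloud_w = make_cloud([440.0, 880.0], [1.0, 0.5], df=20.0, a=1.1)
```

The three returned arrays would then be passed to a weighted discrete cepstrum estimation in place of the original partials. A different shape of the cloud, for example one favouring a rising slope, only requires changing f_shift and a_scale.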
As we have seen in section 3.4, the discrete cepstrum algorithm is of cubic complexity in p, the order of the discrete cepstrum. To keep computation times short, we must therefore try to reduce the order necessary for a good estimation of the spectral envelope. One way to achieve this is to spend precision, or resolution, judiciously: where it is most needed, and not where it is less important. For this purpose, we can exploit the properties of the human auditory system.
Due to the logarithmic frequency resolution of human hearing,
which also led to the mel frequency scale, we don't
need to be very exact with the spectral envelope in higher frequency ranges. It
suffices to represent the rough location of energy, whereas in the low
frequencies, very slight deviations in frequency and amplitude are
perceptible. Therefore we can introduce a logarithmic frequency
scaling similar to the mel scale, as suggested in [GR91b], which
is linear below a given break frequency, and logarithmic above.
The mel scale is defined by
Taking this formula directly poses problems, because the scaled frequencies can
exceed the Nyquist frequency fs / 2, which must be avoided because it invalidates the subsequent calculations.
Normalizing the range of the
mel function to within the
Nyquist frequency yields:
The effect of logarithmic frequency scaling can be seen in figure 3.11.
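The normalization step can be illustrated with a small sketch. The warping curve used here (identity below the break frequency, natural logarithm above, joined continuously) is only a stand-in and an assumption, not the mel formula of this section, which is not reproduced here. What the sketch is meant to show is the normalization itself: dividing by the value of the curve at fs/2 maps the Nyquist frequency onto itself. The function names are hypothetical.

```python
import numpy as np

def warp_unnormalized(f, f_break):
    """ASSUMED stand-in curve: linear below f_break, logarithmic above."""
    f = np.asarray(f, dtype=float)
    # np.maximum keeps the log argument >= 1 on the linear branch
    log_part = f_break * (1.0 + np.log(np.maximum(f, f_break) / f_break))
    return np.where(f <= f_break, f, log_part)

def warp(f, f_break, fs):
    """Rescale the curve so that fs/2 is mapped exactly onto fs/2."""
    nyquist = fs / 2.0
    scale = nyquist / warp_unnormalized(nyquist, f_break)
    return scale * warp_unnormalized(f, f_break)

# Hypothetical example: break frequency 1000 Hz, sampling rate 16 kHz
fs = 16000.0
f_break = 1000.0
partials = np.array([250.0, 500.0, 2000.0, 4000.0, 8000.0])
warped = warp(partials, f_break, fs)
```

With this normalization, the part of the scale below the break frequency stays linear but is stretched by a constant factor, while the range above it is compressed, so that a given order p spends more of its resolution on the low frequencies.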
As an additional advantage, spectral envelopes stored with a logarithmic frequency support take less space. Also, the complexity of synthesis can be reduced when fewer points are needed to represent a spectral envelope.
To retrieve the spectral envelope from cepstral coefficients obtained with
logarithmic frequency scaling, the frequencies fi of the bins of
the envelope (see end of section 3.3) have to be converted to
There is one pitfall in the application of logarithmic frequency scaling: performing the linear-to-logarithmic transformation before applying the cloud deteriorates the results slightly. To see why this is so, remember that the cloud algorithm (section 3.5.2) adds points with a constant linear shift around each peak frequency, which will subsequently be stretched in the linear part of the scale, or converted asymmetrically to the logarithmic scale in the rest.
There are two situations where the behaviour of the discrete cepstrum spectral envelope has to be controlled by adding artificial points to the partials used for discrete cepstrum estimation: at the borders, and between the highest partial and the upper border. If the cloud method of stochastic smoothing (section 3.5.2) is selected, these points are added before the cloud is applied (i.e. each added point will have a cloud of points around it).
The border points are added at the frequencies m and (fs/2 - m), with half the amplitude of the lowest and highest partial, respectively. This forces the spectral envelope to have a downward slope at the borders. Thus, if a partial is moved to lower frequencies by downward transposition of the partials while keeping the spectral envelope, there is no risk that it will suddenly rise in amplitude; instead, it will be faded out smoothly. Figure 3.12 shows the effect of adding border points.
However, if the frequency of the lowest partial is less than m, the low border point is not added, since this would obviously cause an unjustified dip in the spectral envelope, which would disturb the smoothness. We don't have to worry about the border condition then anyway, since then the lowest partial would be very close to the 0 Hz border, constraining the spectral envelope enough to prevent it from rising. The same holds analogously for the high border point.
The filling points fill up a possible gap between the frequency of the highest partial and fs/2 with very low amplitude peaks, spaced at fs / 2n (n being the number of points of the envelope requested). This is to avoid too much freedom for the spectral envelope. If the spectral envelope were unconstrained in a large frequency range, it would oscillate wildly.
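The two kinds of artificial points described above can be sketched as two small helper functions. This is an illustration under stated assumptions, not the actual implementation: the function names are hypothetical, the partials are assumed to be sorted by ascending frequency, the spacing fs / 2n is read as fs / (2 n), and the value of fill_amp for the "very low amplitude" is an arbitrary placeholder.

```python
import numpy as np

def add_border_points(freqs, amps, fs, m):
    """Add points at m and fs/2 - m with half the amplitude of the nearest partial.

    A border point is skipped when the lowest (highest) partial already lies
    closer to the border than m. Partials are assumed sorted by frequency.
    """
    f = [float(x) for x in freqs]
    a = [float(x) for x in amps]
    nyquist = fs / 2.0
    lo_f, lo_a = f[0], a[0]
    hi_f, hi_a = f[-1], a[-1]
    if lo_f >= m:                    # the text only excludes lo_f < m
        f.insert(0, float(m))
        a.insert(0, 0.5 * lo_a)
    if hi_f <= nyquist - m:          # analogous rule for the high border
        f.append(nyquist - m)
        a.append(0.5 * hi_a)
    return np.array(f), np.array(a)

def filling_points(f_highest, fs, n, fill_amp=1e-4):
    """Low-amplitude points between the highest partial and fs/2.

    ASSUMPTIONS: spacing fs / (2 n); fill_amp is a placeholder value.
    """
    step = fs / (2.0 * n)
    fill_f = np.arange(f_highest + step, fs / 2.0, step)
    return fill_f, np.full(fill_f.shape, fill_amp)

# Hypothetical example: three partials at 16 kHz sampling rate, m = 100 Hz
bf, ba = add_border_points([500.0, 1000.0, 3000.0], [1.0, 0.5, 0.2], fs=16000.0, m=100.0)
ff, fa = filling_points(3000.0, fs=16000.0, n=8)
```

If the cloud method is used, the points returned here would be appended to the partials before the cloud is generated, so that each of them receives its own cloud.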