Ircam - Centre Georges-Pompidou Equipe Analyse/Synthèse

Statistical Modeling of Sound Aperiodicities

Shlomo Dubnov & Xavier Rodet

to appear in ICMC97, Thessaloniki, Greece, September 1997

Abstract

Acoustic musical instruments that are considered to produce a well-defined pitch emit waveforms that are never exactly periodic. These aperiodicities presumably originate in some fundamental, not yet well understood mechanism of sound production. This effect, which for time scales shorter than 100 or 200 ms is beyond the control of the player, is expected to be typical of the particular instrument, or perhaps of the instrument family. Several methods for investigating aperiodicities in the waveform of musical sounds have recently appeared in the literature: examination of variations in the waveform between consecutive periods, Fourier transforms of sonagrams that reveal the presence of subharmonic modulations, and correlograms that correlate in time the outputs of auditory models.

In earlier work we showed that a particular aspect of these fluctuations, their coherence, is strongly related to non-linear properties of the time-series model of the signal. These properties are measured by Higher Order Statistics (HOS), or polyspectra, and were shown to be important for the characterisation of musical instruments in the sustained portion of the sound. Note that this statistical property of coherence/incoherence cannot easily be revealed by the other analysis methods. The purpose of this work is to extend that research, both theoretically and practically, by combining our HOS results with the other methods mentioned above. Specifically, our goal is to define a statistical model for fluctuations of the sound parameters in the sustained portion of the sound, one that could be incorporated into existing analysis/synthesis methods such as the additive method.

A comparison of the HOS properties of real signals with those of synthetic signals re-synthesized by an additive analysis/synthesis method shows that HOS are preserved. This supports the notion that HOS are related to phase jitter of the actual harmonic partials, and it also suggests that sinusoidal signal models are appropriate for modeling this jitter. By applying random frequency modulations (jitter) to the harmonic partials, either independently or with correlation between partials, synthetic signals with various desired HOS properties can be simulated. Gradually increasing the amplitude and bandwidth of the jitter takes the sound from a perfectly pitched tone to noise along two routes. In the coherent case, it increases the perceived random pitch fluctuations of a single sound. In the incoherent case, by contrast, it is perceived as an increasing amount of added noise, while the sense of a more or less stable pitch is maintained. The difference between the two sounds, although it cannot be observed in a long-term spectral analysis, is clearly revealed by the different decay of HOS as a function of increasing jitter parameters.
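
The jitter mechanism described above can be sketched numerically. The following Python fragment is only a minimal illustration and not the implementation used in this work: the function names (`lowpass_noise`, `jittered_harmonics`), the 1/k partial amplitudes, the band-limited Gaussian jitter and all parameter values are assumptions made for the example. A single jitter signal shared by all partials gives the coherent case; one independent jitter signal per partial gives the incoherent case.

```python
import numpy as np


def lowpass_noise(n, bandwidth_hz, sr, rng):
    """Unit-variance Gaussian noise band-limited to (0, bandwidth_hz] by FFT masking."""
    spec = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n, d=1.0 / sr)
    spec[freqs > bandwidth_hz] = 0.0
    spec[0] = 0.0  # drop DC so the mean frequency stays at its nominal value
    noise = np.fft.irfft(spec, n)
    return noise / np.std(noise)


def jittered_harmonics(f0, n_partials, dur, sr, depth, bandwidth_hz,
                       coherent, seed=0):
    """Sum of harmonic partials with random frequency jitter.

    depth        : relative frequency deviation (jitter amplitude), assumed scale
    bandwidth_hz : bandwidth of the jitter signal
    coherent     : True  -> one jitter signal shared by all partials
                   False -> an independent jitter signal for each partial
    Returns the signal and the per-partial phase tracks (n_partials, n).
    """
    rng = np.random.default_rng(seed)
    n = int(dur * sr)
    n_sources = 1 if coherent else n_partials
    jitter = np.stack([lowpass_noise(n, bandwidth_hz, sr, rng)
                       for _ in range(n_sources)])
    k = np.arange(1, n_partials + 1)[:, None]        # harmonic numbers
    inst_freq = k * f0 * (1.0 + depth * jitter)      # broadcasts to (n_partials, n)
    phase = 2.0 * np.pi * np.cumsum(inst_freq, axis=1) / sr
    signal = np.sum(np.cos(phase) / k, axis=0)       # 1/k amplitudes are an assumption
    return signal, phase


# Example: one second at 8 kHz, five partials on 220 Hz, 1 % jitter below 30 Hz.
coherent_sig, coherent_phase = jittered_harmonics(220.0, 5, 1.0, 8000, 0.01, 30.0, True)
incoherent_sig, incoherent_phase = jittered_harmonics(220.0, 5, 1.0, 8000, 0.01, 30.0, False)
```

In this sketch the phase of partial k in the coherent case is exactly k times the phase of the fundamental, whereas in the incoherent case the partials drift relative to one another.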

A detailed look at the waveform of a coherent signal shows that the effect of jitter is equivalent to a local time scaling, which stretches or contracts the shape of the original waveform. The HOS are not affected by this jitter and remain constant. In the case of independent jitters, by contrast, the waveform varies in time and the phase relations between partials are not preserved locally. This causes the HOS to decay, at a rate proportional to the bandwidth of the jitter.
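
The contrast between the two cases can be illustrated with a simple phase-coupling statistic. The sketch below is an assumption-laden stand-in for a normalized bispectrum, not the HOS estimator used in this work: it takes the relative phase of partial k1+k2 with respect to partials k1 and k2, which is the phase of the bispectrum at that pair of harmonics, and measures how constant it stays over time. The names `random_walk_phases` and `phase_coupling_index`, the white (unfiltered) frequency jitter and the parameter values are all hypothetical.

```python
import numpy as np


def random_walk_phases(n_partials, n, sr, f0, depth, coherent, seed=0):
    """Phase tracks of harmonic partials with white frequency jitter (illustrative)."""
    rng = np.random.default_rng(seed)
    k = np.arange(1, n_partials + 1)[:, None]
    # One shared jitter row (coherent) or one row per partial (independent).
    shape = (1, n) if coherent else (n_partials, n)
    jitter = rng.standard_normal(shape)
    inst_freq = k * f0 * (1.0 + depth * jitter)
    return 2.0 * np.pi * np.cumsum(inst_freq, axis=1) / sr


def phase_coupling_index(phases, k1, k2):
    """|time average of exp(i * (phi_{k1+k2} - phi_{k1} - phi_{k2}))|.

    Close to 1 when the relative phase is constant (phase relations preserved),
    and smaller when the partials drift independently.
    """
    rel = phases[k1 + k2 - 1] - phases[k1 - 1] - phases[k2 - 1]
    return float(np.abs(np.mean(np.exp(1j * rel))))


coherent = random_walk_phases(3, 8000, 8000, 220.0, 0.1, coherent=True)
independent = random_walk_phases(3, 8000, 8000, 220.0, 0.1, coherent=False)
print(phase_coupling_index(coherent, 1, 2))      # stays near 1
print(phase_coupling_index(independent, 1, 2))   # noticeably below 1
```

With shared jitter the relative phase cancels identically, so the index does not depend on the jitter depth; with independent jitter the relative phase performs a random walk and the time average shrinks.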

This observation suggests that differences and similarities between successive portions of a signal might be better represented by scalograms in the coherent case, whereas a spectral representation is better suited to the non-coherent case. The matching can also be viewed as a search for the best correlation between two consecutive segments of sound: in the non-coherent case only the spectral amplitudes are matched, whereas in the coherent case both magnitude and phase are matched over a certain range of possible scale differences. The second method, which requires scale invariance, seems closer to auditory modeling approaches.

We have studied several applications of the jitter model. Let us mention a few:
- modeling of realistic jitter for additive and source-filter ("Chant") methods.
- reducing the number of short-time analysis frames in the sustained portion of the sound, by separating the jitter from other spectral information.
- morphing between sounds according to their jitter properties.
- investigating the behaviour of jitter for different modes of playing and "expressivity" control.
Results of real and synthetic sound analyses will be detailed in the paper. Examples of sound synthesis will be demonstrated in the presentation.
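
As a rough illustration of the coherent-case matching discussed earlier (a search for the best correlation between two segments over a range of scale differences), the following Python sketch time-scales one segment by linear interpolation and keeps the scale factor that maximizes the normalized correlation with the other. It is a simplified example under stated assumptions, not the procedure used in this work: the function name `best_scale_match`, the scale grid and the test waveform are hypothetical, both segments are assumed to start at the same point of the waveform, and no search over time lag is performed.

```python
import numpy as np


def best_scale_match(seg_a, seg_b, scales):
    """Find the time-scale factor s for which seg_b read at s times the
    original rate best matches seg_a (normalized correlation).

    seg_b must be long enough that s * (len(seg_a) - 1) stays inside it.
    Returns (best_scale, best_correlation).
    """
    t = np.arange(len(seg_a))
    src_t = np.arange(len(seg_b))
    best_scale, best_corr = None, -np.inf
    for s in scales:
        # Linear interpolation of seg_b at the scaled sample positions.
        warped = np.interp(s * t, src_t, seg_b)
        denom = np.linalg.norm(seg_a) * np.linalg.norm(warped) + 1e-12
        corr = float(np.dot(seg_a, warped) / denom)
        if corr > best_corr:
            best_scale, best_corr = float(s), corr
    return best_scale, best_corr


# Example: a three-partial tone and a copy of it compressed in time by 2 %.
sr = 8000
def tone(t):
    return sum(np.cos(2 * np.pi * 220.0 * k * t + 0.3 * k) / k for k in (1, 2, 3))

reference = tone(np.arange(1200) / sr)
compressed = tone(1.02 * np.arange(1000) / sr)
scale, corr = best_scale_match(compressed, reference, np.linspace(0.95, 1.05, 11))
print(scale, corr)
```

Because the comparison is made on the waveform itself, a good match requires agreement in both magnitude and phase, which is the distinction drawn above from matching spectral amplitudes only.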