next up previous contents index
Next: 3.3 Spectral Envelopes Up: 3. Basic Concepts Previous: 3.1 Digital Signal Processing

   
3.2 Additive Analysis and Synthesis of Sound

On closer scrutiny of the Fourier spectrum of the sound of a musical instrument or the voice, a specific structure can be observed: besides a peak at the fundamental frequency f0 (the frequency heard as the pitch of the sound), there are peaks at 2 times, 3 times, 4 times, and so on, of the fundamental frequency, as can be seen in figure 2.14.


  
Figure 2.14: Spectrum of a clarinet played at 440 Hz. The grid lines spaced at 440 Hz intervals show that the sound is made up only of harmonic partials.

These peaks are in fact the frequency-domain representation of sinusoids at integer multiples of the fundamental frequency, called harmonics , or harmonic partials . Taking advantage of this, it is possible to represent a sound by a list of the amplitudes and phases of the harmonic partials, plus the residual noise: the part of the spectrum which cannot be expressed by sinusoids at integer multiples of the fundamental frequency. The residual noise is usually very low, as in figure 2.14, but it can also be much louder and contribute substantially to the character of a timbre, e.g. the breath noise in the attack of a wind instrument, as in figure 2.15, or the thump of the hammer of a piano. In general, the sharp transients in the attack phase of an instrumental sound are best modeled by noise.
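The harmonic structure described above can be made concrete with a small NumPy sketch: a synthetic signal with partials at integer multiples of 440 Hz (an illustrative stand-in for a recorded clarinet; the amplitudes are made up) shows its strongest spectral peak exactly at the fundamental.

```python
import numpy as np

# Synthetic harmonic signal: partials at 1x, 2x, 3x the 440 Hz
# fundamental, with illustrative amplitudes.
sr = 44100
t = np.arange(sr) / sr                       # one second of samples
x = sum(a * np.sin(2 * np.pi * k * 440 * t)
        for k, a in [(1, 1.0), (2, 0.4), (3, 0.2)])

# Magnitude spectrum; with a one-second signal the frequency
# resolution is exactly sr / len(x) = 1 Hz per bin.
mag = np.abs(np.fft.rfft(x))
freqs = np.fft.rfftfreq(len(x), 1 / sr)

peak = freqs[np.argmax(mag)]                 # strongest peak: the fundamental
```

The other harmonic peaks sit at bins 880 and 1320, mirroring the grid lines of figure 2.14.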


  
Figure 2.15: Magnitude spectrum of a shakuhachi flute. This spectrum clearly reveals a harmonic partial structure, but the residual noise is much louder, relative to the partials, than in figure 2.14.
\begin{figure}\centerline{\epsfbox[114 282 540 513]{pics/spectrum2.eps}} \end{figure}

Not all sounds, however, can be expressed by harmonic partials. Take for example the spectrum of the sound of a bell in figure 2.16: the partials are spaced at non-integer multiples of the fundamental frequency. Nevertheless, such sounds can still be represented by the sinusoidal-partials-plus-noise model, provided the frequency of each partial is recorded along with its amplitude and phase.


  
Figure 2.16: Spectrum of a bell sound (an inharmonic spectrum)
\begin{figure}\centerline{\epsfbox[114 282 540 513]{pics/spectrum3.eps}} \end{figure}

Most of the classical instruments and the voice have a purely harmonic spectrum. Some sounds, like bells or metal plates, have inharmonic spectra, which are mostly described as ``metallic''. Percussion sounds, finally, have only a feeble partial structure and consist mainly of noise.

Of course, the character of the sound of an instrument is not determined by a single spectrum, but by its evolution in time. In particular, the changing proportions of the amplitudes of the partials, and small fluctuations of the partials over time, allow us to recognise an instrument. Moreover, the attack phase (the first few milliseconds) of a sound contains essential information for the recognition of the instrument. For more about the acoustic correlates of the character of musical instruments see [vH54].

If we now turn to additive synthesis , we can easily generalize over harmonic and inharmonic sounds. All sounds are represented by a succession of time-frames, each consisting of a sinusoidal component with partials at frequencies fi, amplitudes ai, and phases $\phi_i$, i=1..n, and a residual noise component r(k), a time-domain discrete signal. Synthesis for a frame at time t is done by evaluating

   \begin{displaymath}
s(k) = r(k) + \sum_{i=1}^{n} a_i \cos (2\pi f_i t + \phi_i)
\end{displaymath}

for all samples s(k) of this frame.
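The frame synthesis equation above can be sketched directly in Python/NumPy; the function name and parameter layout are illustrative, not taken from any particular additive-synthesis package.

```python
import numpy as np

def synthesize_frame(freqs, amps, phases, residual, sample_rate):
    """One frame of additive synthesis: the residual noise signal
    plus a sum of sinusoidal partials, as in the equation above."""
    n_samples = len(residual)
    t = np.arange(n_samples) / sample_rate        # time of each sample
    s = residual.copy()
    for f, a, phi in zip(freqs, amps, phases):
        s += a * np.cos(2 * np.pi * f * t + phi)  # add one partial
    return s

# Harmonic example: 440 Hz fundamental plus two overtones, no residual.
sr = 44100
frame = synthesize_frame([440.0, 880.0, 1320.0], [1.0, 0.5, 0.25],
                         [0.0, 0.0, 0.0], np.zeros(1024), sr)
```

For an inharmonic sound such as the bell of figure 2.16, only the frequency list changes; the evaluation is identical.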


  
Figure 2.17: Spectrum of a shakuhachi flute and partials found by additive analysis
\begin{figure}\centerline{\epsfbox[114 282 540 513]{pics/shaku1.eps}} \end{figure}


  
Figure 2.18: Spectrum of the resynthesized sinusoidal part of a shakuhachi flute
\begin{figure}\centerline{\epsfbox[114 282 540 513]{pics/shaku2.eps}} \end{figure}


  
Figure 2.19: Spectrum of the non-sinusoidal residual noise of a shakuhachi flute
\begin{figure}\centerline{\epsfbox[114 282 540 513]{pics/shaku3.eps}} \end{figure}

For example, the additive analysis of the sound of the Japanese shakuhachi flute of figure 2.15 yields the harmonic partial peaks shown as crosses in figure 2.17. The additive re-synthesis according to equation (2.28) has the spectrum shown in figure 2.18. No non-sinusoidal components are present in this spectrum; the irregular bottom line is due to the unavoidable spectral smear of the FFT window used for displaying the spectrum, as demonstrated in section 2.1.5. Now, if we subtract the resynthesized signal from the original signal in the time domain (!), all that is left is the residual part r(k), consisting of non-sinusoidal noise, the spectrum of which is shown in figure 2.19. This signal contains the typical breath noise of shakuhachi playing. With other instruments, the residual signal reveals the clicking of the keys of a clarinet, the scratching of the bow of a violin, or the sound of the hammer of a piano hitting the string, without any trace of the vibration it triggered.
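The time-domain subtraction step can be sketched as follows; the signals are synthetic stand-ins (a single sinusoid plus white noise) rather than a real shakuhachi recording.

```python
import numpy as np

# Residual extraction by time-domain subtraction.
sr = 44100
t = np.arange(sr) / sr
partial = 0.8 * np.cos(2 * np.pi * 440 * t)   # resynthesized sinusoidal part
rng = np.random.default_rng(0)
noise = 0.05 * rng.standard_normal(sr)        # breath-like noise (stand-in)
original = partial + noise                    # the "recorded" signal

# Subtracting in the time domain leaves only the non-sinusoidal residual.
residual = original - partial
```

Note that the subtraction must happen in the time domain: only there do the resynthesized partials cancel sample by sample, which requires the analysis to have captured their phases correctly.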

It is now easy to manipulate the harmonic part, given that every single component of the sound is accessible. The two simplest but most effective manipulations are transposing the sound and changing its duration, each independently of the other. Various other manipulations are possible, such as selectively detuning the harmonics or morphing the partial structure with that of another sound.
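The independence of transposition and duration change follows directly from the parametric representation, as this sketch shows (the frame layout and function names are illustrative):

```python
def transpose(partials, factor):
    """Transpose by scaling every partial frequency; amplitudes and
    phases are untouched, so the duration is unaffected."""
    return [(f * factor, a, phi) for f, a, phi in partials]

def stretch(frames, factor):
    """Change duration by resampling the frame sequence (nearest-frame
    lookup); the per-frame frequencies, hence the pitch, stay the same."""
    n = max(1, round(len(frames) * factor))
    return [frames[min(int(i / factor), len(frames) - 1)] for i in range(n)]

# Ten identical frames of a 440 Hz sound with one overtone.
frames = [[(440.0, 1.0, 0.0), (880.0, 0.5, 0.0)]] * 10
up_a_fifth = [transpose(fr, 1.5) for fr in frames]   # 440 -> 660 Hz
twice_as_long = stretch(frames, 2.0)                 # 20 frames, same pitch
```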

Manipulating the residual part is less obvious, but can be reasonably accomplished by treating it as a random noise signal and controlling its spectral envelope over time (by filtering).
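A minimal sketch of this idea: model the residual as white noise and impose a spectral envelope on it by multiplication in the frequency domain (a crude zero-phase filter; the envelope shape here is made up for illustration).

```python
import numpy as np

rng = np.random.default_rng(1)
noise = rng.standard_normal(2048)             # white-noise residual model

spectrum = np.fft.rfft(noise)
freqs = np.fft.rfftfreq(len(noise), 1 / 44100)
envelope = np.exp(-freqs / 4000.0)            # illustrative low-pass envelope
shaped = np.fft.irfft(spectrum * envelope, n=len(noise))
```

Applying a different envelope per frame lets the shaped noise follow the evolution of the original residual over time.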

For an in-depth discussion of additive analysis and synthesis, the idea of which was first published in [RM69], see [Rod97a].


Diemo Schwarz
1998-09-07