Working Group


Search by timbre similarity



This page presents two web applications that perform search by similarity using timbre descriptors. The applications are written in PHP with MySQL and require the QuickTime Player plug-in to play the sounds (.wav and .aiff). The user selects a target file by clicking on one of the sound files and then queries for similar sounds. The result is a list of all sounds in the database, ordered by increasing perceptual distance from the target sound.
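The ranking step can be sketched as follows. This is only an illustration: the actual applications are written in PHP and MySQL, and the perceptual distance they use is not specified here, so a plain Euclidean distance over already-normalized descriptor vectors is assumed. The function name, sound names and descriptor values are hypothetical.

```python
import math

def rank_by_similarity(target, descriptors):
    """Return all sound names ordered by increasing distance to the target.

    `descriptors` maps a sound name to a tuple of descriptor values.
    Euclidean distance is an assumption standing in for the perceptual distance.
    """
    ref = descriptors[target]
    return sorted(descriptors, key=lambda name: math.dist(ref, descriptors[name]))

# Hypothetical (lat, sc, tc) vectors, assumed to be normalized beforehand
sounds = {
    "snare_01": (-2.0, 0.60, 0.10),
    "snare_02": (-1.9, 0.58, 0.12),
    "kick_01": (-1.5, 0.10, 0.20),
}
ranking = rank_by_similarity("snare_01", sounds)
```

The target itself comes first (distance zero), followed by the remaining sounds from most to least similar.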

Harmonic sounds

The URL of the search engine is: http://zappa.ircam.fr/cuidad/Harmonic.php3.

The engine operates on 351 sounds extracted from the "Studio OnLine" database (http://sol.ircam.fr), covering 16 instruments (flute, oboe, clarinet, bassoon, alto-sax, french-horn, trumpet, trombone, tuba, guitar, harp, violin, viola, cello, doublebass, accordion) played with different techniques (e.g. sul ponticello, tremolo, glissando, pizzicato). The search by similarity is carried out using 5 descriptors (lat, hsc, hsd, hss and hsv; see table below, where hss is listed as hsstd).

 

Percussive sounds

The URL of the search engine is http://zappa.ircam.fr/cuidad/Percussion.php3.

The database is a collection of 328 sounds gathered from a public FTP site (ftp.futureftp.co.uk), from the "Allen Sides Microphone Cabinet CD-ROM" and from the SQUAM database. The sounds belong to the following categories: conga, tom-tom, bongo, kick, snare, triangle, celesta, xylophone, timpani, marimba, clave, timbales, tambourine and glockenspiel. Sounds recorded with different dynamics and microphone techniques are included. The search uses 3 descriptors (lat, sc and tc; see table below).

 

List of descriptors

Set of Elements

Functionality

Log-Attack Time (lat)

The lat is defined as the logarithm (base 10) of the duration between the time the signal starts and the time it reaches its stable part.

Unit: [log sec]

Range: [-infty,determined by the length of the signal]
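As a rough illustration of the lat definition, the sketch below works on a precomputed amplitude envelope. How the start of the signal and the start of the stable part are detected is not stated here, so the two threshold fractions (`start_frac`, `stable_frac`) are assumptions, as is the function name.

```python
import math

def log_attack_time(envelope, sample_rate, start_frac=0.02, stable_frac=0.8):
    """log10 of the time between signal start and the stable part, in log sec.

    The start and stable points are taken as the first samples at which the
    envelope reaches the given fractions of its peak (assumed thresholds).
    """
    peak = max(envelope)
    if peak <= 0:
        raise ValueError("envelope has no positive values")
    start = next(i for i, v in enumerate(envelope) if v >= start_frac * peak)
    stable = next(i for i, v in enumerate(envelope) if v >= stable_frac * peak)
    duration = (stable - start) / sample_rate
    # A zero-length attack maps to the lower end of the range
    return math.log10(duration) if duration > 0 else -math.inf
```

For example, `log_attack_time([0, 0, 0.5, 1.0, 1.0], sample_rate=10)` measures an attack of one sample, i.e. 0.1 s.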

Harmonic Spectral Centroid (hsc)

The hsc is computed as the average, over the duration of the sound segment, of the instantaneous harmonic spectral centroid (ihsc) measured within a running window. The ihsc is computed as the amplitude-weighted (linear scale) mean of the frequencies of the harmonic peaks of the spectrum.

Unit: [Hz]

Range: [0,sr/2]
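A minimal sketch of the hsc computation, assuming the harmonic peaks of each analysis window have already been extracted as lists of frequencies (Hz) and linear amplitudes. The function names and the frame representation are illustrative assumptions.

```python
def instantaneous_hsc(freqs, amps):
    """Amplitude-weighted mean frequency of the harmonic peaks of one frame."""
    total = sum(amps)
    if total == 0:
        raise ValueError("frame has no energy in its harmonic peaks")
    return sum(f * a for f, a in zip(freqs, amps)) / total

def harmonic_spectral_centroid(frames):
    """Average of the instantaneous centroid over all frames of the segment.

    `frames` is a list of (freqs, amps) pairs, one per running window.
    """
    values = [instantaneous_hsc(freqs, amps) for freqs, amps in frames]
    return sum(values) / len(values)
```

With two frames whose instantaneous centroids are 200 Hz and 125 Hz, the resulting hsc is 162.5 Hz.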

Harmonic Spectral Deviation (hsd)

The hsd is computed as the average over the sound segment duration of the instantaneous hsd (ihsd) within a running window. The ihsd is computed as the spectral deviation of amplitude (logarithmic scale) components from a global spectral envelope.

Unit: [-]

Range: [0,1]

Harmonic Spectral Std (hsstd)

The hsstd is computed as the average over the sound segment duration of the instantaneous hsstd (ihsstd) within a running window.

The ihsstd is computed as the amplitude (linear scale) weighted standard deviation of the harmonic peaks of the spectrum, normalized by the ihsc.

Unit: [-]

Range: [0,sr/2]
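The instantaneous value can be sketched as below, following the wording of the definition literally: an amplitude-weighted standard deviation of the harmonic peak frequencies, divided by the amplitude-weighted centroid. The function name and input layout are assumptions; the hsstd itself would be the mean of this value over all frames.

```python
import math

def instantaneous_hsstd(freqs, amps):
    """Amplitude-weighted std of harmonic peak frequencies, normalized by the centroid."""
    total = sum(amps)
    if total == 0:
        raise ValueError("frame has no energy in its harmonic peaks")
    centroid = sum(f * a for f, a in zip(freqs, amps)) / total
    variance = sum(a * (f - centroid) ** 2 for f, a in zip(freqs, amps)) / total
    return math.sqrt(variance) / centroid
```

For two equal-amplitude peaks at 100 Hz and 300 Hz, the centroid is 200 Hz, the standard deviation is 100 Hz, and the normalized result is 0.5.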

Harmonic Spectral Variation (hsv)

The hsv is defined as the mean over the sound segment duration of the instantaneous hsv (ihsv).

The ihsv is defined as the normalized correlation between the amplitude (linear scale) of the harmonic peaks of two adjacent frames.

Unit: [-]

Range: [0,1]
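The following sketch implements the definition as worded above: a normalized correlation between the harmonic amplitudes of adjacent frames, averaged over the segment. It assumes every frame lists the amplitudes of the same harmonics in the same order; the function names are illustrative.

```python
import math

def normalized_correlation(a, b):
    """Normalized correlation of two amplitude vectors (1.0 means same shape)."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if den == 0:
        raise ValueError("one of the frames has zero amplitude")
    return num / den

def harmonic_spectral_variation(amp_frames):
    """Mean of the instantaneous value over all pairs of adjacent frames."""
    pairs = list(zip(amp_frames, amp_frames[1:]))
    return sum(normalized_correlation(a, b) for a, b in pairs) / len(pairs)
```

Because linear amplitudes are non-negative, each correlation lies in [0,1], which matches the stated range.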

Spectral Centroid (sc)

The sc is defined as the mean, over the duration of the sound segment, of the instantaneous sc (isc).

The isc is computed as the power-weighted mean of the frequencies of the bins in the power spectrum.

Unit: [Hz]

Range: [0,sr/2]
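A short sketch of the instantaneous spectral centroid for one frame. It assumes the power spectrum is given as a one-sided list of bins spanning 0 Hz to sr/2 inclusive; that layout and the function name are assumptions. The sc would be the mean of this value over all frames.

```python
def instantaneous_sc(power, sample_rate):
    """Power-weighted mean of the bin frequencies of a one-sided power spectrum."""
    total = sum(power)
    if total == 0:
        raise ValueError("frame has no energy")
    # Bin k lies at k * (sr/2) / (N-1) Hz for N bins from 0 Hz to sr/2
    bin_width = (sample_rate / 2) / (len(power) - 1)
    return sum(k * bin_width * p for k, p in enumerate(power)) / total
```

For example, a 5-bin spectrum at 8000 Hz with all power in the middle bin gives a centroid of 2000 Hz.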

Temporal Centroid (tc)

The tc is defined as the time average of the signal weighted by its energy envelope.

Unit: [sec]

Range: [0,determined by the length of the signal]
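The temporal centroid can be sketched as an energy-weighted mean of the sample times. The envelope is assumed to be precomputed and sampled at a known rate; the function name is illustrative.

```python
def temporal_centroid(energy_envelope, sample_rate):
    """Energy-weighted mean time of the envelope, in seconds."""
    total = sum(energy_envelope)
    if total == 0:
        raise ValueError("envelope has no energy")
    weighted = sum((n / sample_rate) * e for n, e in enumerate(energy_envelope))
    return weighted / total
```

Since sample times and energies are non-negative, the result always falls between 0 and the duration of the signal.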

Confidence

The confidence measures the quality of the estimation of each descriptor, and therefore also the reliability of the results obtained with the descriptors.

Unit: [-]

Range: [0,1]