Ircam - Centre Georges-Pompidou Equipe Analyse-Synthese

Model of high level structuring and intelligent content-based retrieval in audio databases


X. Rodet and S. Dubnov

For short-time description of the sound we use a set of spectral envelopes, much as in speech processing, which allows up to 90% data reduction in the sound representation. Moreover, a vector quantisation (VQ) procedure further reduces the set of envelopes by optimally representing the complete dataset with just a few typical ones.

In order to capture the transitional spectral characteristics, we use the cepstral derivative as an additional feature. A first-order difference between neighbouring cepstra is used as an estimate of the derivative at a given time instant. In order to capture the information present in the higher cepstral coefficients as well, additional parameters are under consideration. Note that the higher cepstral coefficients correspond to the excitation signal (also called the residual). Variations in the fundamental frequency are important for characterisation of the transitional spectra and should be considered in correlation with the cepstral derivative. HOS parameters describe the residual in terms of non-linear properties of the sound and are related to phase coupling in harmonic signals and texture properties in noise signals. Cepstral estimation of HOS will be a subject of our future research.

Once the raw sound signal is converted into a symbolic representation by the VQ procedure, we apply the notion of entropy to compare sequences in terms of the similarity between their statistical sources.
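The chain described above (low-order cepstra, first-order cepstral difference, VQ, entropy-based comparison) can be illustrated with a minimal sketch. It is not the authors' implementation. All function names, the frame length, hop size, number of coefficients and codebook size are hypothetical choices. Plain k-means stands in for the VQ procedure, and the Jensen-Shannon divergence between symbol histograms stands in for the entropy-based comparison, whose exact form the text does not specify.

```python
import numpy as np

def frame_cepstra(signal, frame_len=512, hop=256, n_coeffs=20):
    """Low-order real cepstrum of each frame (a compact spectral-envelope description)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    cepstra = np.empty((n_frames, n_coeffs))
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len] * window
        log_mag = np.log(np.abs(np.fft.rfft(frame)) + 1e-10)  # small offset avoids log(0)
        cepstra[i] = np.fft.irfft(log_mag)[:n_coeffs]         # keep the low quefrencies only
    return cepstra

def delta(cepstra):
    """First-order difference between neighbouring cepstra (derivative estimate)."""
    return np.diff(cepstra, axis=0)

def describe(signal):
    """Static cepstra plus their first-order difference, one row per frame."""
    c = frame_cepstra(signal)
    return np.hstack([c[1:], delta(c)])

def nearest_code(features, codebook):
    """Index of the closest codebook vector for every feature row."""
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def train_codebook(features, n_codes=8, n_iter=20, seed=0):
    """Plain k-means, used here as an assumed stand-in for the VQ procedure."""
    rng = np.random.default_rng(seed)
    codebook = features[rng.choice(len(features), n_codes, replace=False)].copy()
    for _ in range(n_iter):
        labels = nearest_code(features, codebook)
        for k in range(n_codes):
            members = features[labels == k]
            if len(members):                  # leave empty cells unchanged
                codebook[k] = members.mean(axis=0)
    return codebook

def symbol_distribution(labels, n_codes):
    """Relative frequency of each VQ symbol in a sequence."""
    counts = np.bincount(labels, minlength=n_codes).astype(float)
    return counts / counts.sum()

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint)."""
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    t = np.arange(8000) / 8000.0
    tone = np.sin(2 * np.pi * 220 * t) + 0.01 * rng.standard_normal(8000)
    noise = rng.standard_normal(8000)
    fa, fb = describe(tone), describe(noise)
    codebook = train_codebook(np.vstack([fa, fb]))   # one codebook shared by both sounds
    pa = symbol_distribution(nearest_code(fa, codebook), len(codebook))
    pb = symbol_distribution(nearest_code(fb, codebook), len(codebook))
    print("JS divergence between tone and noise:", js_divergence(pa, pb))
```

Both sounds are quantised with one shared codebook so that their symbol sequences use the same alphabet. Comparing symbol histograms ignores the order of the symbols, so a comparison of statistical sources that is sensitive to temporal structure would need a richer model than this sketch provides.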










Page updated Monday 2 March 1998
Xavier Rodet
IRCAM
rod@ircam.fr