8.5 The Spectral Envelope File Format

The file representation of spectral envelopes is based on SDIF (Sound Description Interchange Format) [Vir98]. This section gives a brief overview of SDIF, followed by a presentation of the types that will be added to the format for spectral envelopes. The final definition is subject to change because it is still being discussed with CNMAT, IRCAM's partner in the SDIF project.

An SDIF file is organised in chunks. It starts with an opening chunk containing the header, followed by optional chunks that hold file-global, user-defined data as field-name/value pairs; these are called Name-Value Tables (NVTs). The body of the file is a contiguous sequence of time-tagged frames (which are themselves chunks), sorted in ascending temporal order, with multiple kinds of frames allowed in a single file or stream. A library of standard frame types, part of the SDIF standard, defines formats for storing common sound representations. The data in a frame are stored in matrices (or vectors) of floating point numbers, with each column corresponding to a parameter such as frequency or amplitude and each row representing an object such as a filter, sinusoid, or noise band.
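The frame and matrix organisation described above can be pictured with a small in-memory model. This is only a sketch of the logical structure, not of the binary SDIF layout: the class names `Matrix` and `Frame`, the helper `is_time_ordered`, and the sample values are assumptions made for illustration. The 1ENV signature and its amplitude column are the ones defined later in this section.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Matrix:
    """Logical view of a matrix: named columns, one row per object."""
    signature: str                     # 4-character matrix type, e.g. "1ENV"
    columns: List[str]                 # one name per column (parameter)
    rows: List[List[float]] = field(default_factory=list)

    def add_row(self, values: List[float]) -> None:
        # Every row must supply exactly one value per column.
        if len(values) != len(self.columns):
            raise ValueError("row length does not match number of columns")
        self.rows.append([float(v) for v in values])


@dataclass
class Frame:
    """Logical view of a time-tagged frame holding one or more matrices."""
    signature: str                     # 4-character frame type
    time: float                        # time tag in seconds (illustrative unit)
    matrices: List[Matrix]


def is_time_ordered(frames: List[Frame]) -> bool:
    """Check that frames appear in ascending temporal order."""
    return all(a.time <= b.time for a, b in zip(frames, frames[1:]))


def make_envelope_frame(time: float, amplitudes: List[float]) -> Frame:
    # One column (amplitude), one row per bin of the envelope.
    env = Matrix("1ENV", ["amplitude"])
    for a in amplitudes:
        env.add_row([a])
    return Frame("1ENV", time, [env])


# Illustrative values only.
frames = [
    make_envelope_frame(0.00, [0.1, 0.5, 0.2]),
    make_envelope_frame(0.01, [0.2, 0.4, 0.1]),
]
```

A reader or writer built on such a model could call `is_time_ordered(frames)` before writing, since the body of an SDIF file must be sorted by time.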

SDIF allows the definition of new frame and matrix types, even within a file. To this end, a small data-definition language has been defined. The new types necessary for spectral envelopes are given below in this language, one set for each of the representations.

Spectral (Sampled) Envelopes

The following statements define the matrix and frame types for the sampled spectral envelope representation (see section 4.3):

        1MTD 1ENV { amplitude }
        1MTD 1ENI { SamplingRate, FrequencyScale, FrequencyScaleParameter }
        1FTD 1ENV { 
                    1ENI envelope-info;
                    1ENV envelope; 
                  }

The code 1MTD introduces a matrix type definition; two matrix types are defined here. The first, 1ENV, holds one column and n rows of amplitude values v_i for the bins of the spectral envelope at frequency (i / n) * (f_s / 2) (for the linear case), where f_s is the sampling rate. The second, 1ENI (where ENI stands for envelope information), holds the sampling rate, the frequency scale (linear or logarithmic), and the break point parameter for the case of a logarithmic frequency scale.
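As a worked example of the linear case, the bin frequencies can be computed directly from the number of rows n and the sampling rate f_s. The function name `bin_frequencies` is hypothetical, and the index range i = 0 .. n-1 is an assumption, since the text does not state where i starts.

```python
from typing import List


def bin_frequencies(n: int, sampling_rate: float) -> List[float]:
    """Frequency in Hz of each envelope bin on a linear scale.

    Implements f_i = (i / n) * (f_s / 2).
    Assumption: i runs from 0 to n - 1.
    """
    nyquist = sampling_rate / 2.0
    return [i / n * nyquist for i in range(n)]


# Four bins at a sampling rate of 44100 Hz.
freqs = bin_frequencies(4, 44100.0)
# freqs is [0.0, 5512.5, 11025.0, 16537.5]
```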

1FTD defines a frame type, also called 1ENV, which contains one matrix of type 1ENI and one of type 1ENV. Frame and matrix types have distinct name spaces and can therefore bear the same name. The leading 1 is a version indicator. When a frame or matrix type is extended with more data items, the version indicator will be incremented.

The values in 1ENI are also given in the name-value table at the beginning of the file, but in order to keep the file suitable for streaming (where a reader might have missed the header), they are repeated with each frame.

Filter Coefficients

The following statements define the matrix and frame types for the LPC and cepstrum filter coefficients (see section 4.2):

        1MTD 1ARA { a }
        1FTD 1ARA {
                    1ENI envelope-info;
                    1ARA AR-coefs;
                  }

        1MTD 1ARK { k }
        1FTD 1ARK {
                    1ENI envelope-info;
                    1ARK AR-coefs;
                  }

        1MTD 1CEC { c }
        1FTD 1CEC {
                    1ENI envelope-info;
                    1CEC cepstral-coefs;
                  }

The number n of rows of each coefficient matrix 1ARA, 1ARK, or 1CEC is the order of the LPC or cepstrum analysis.

Alternative Definition

At the moment, the method described above has been adopted. Nevertheless, it has the disadvantage that defining the 4 frame/matrix types above clutters the SDIF name space with structurally identical and semantically very similar types. Alternatively, one could imagine a single frame type with an info matrix and a data matrix, the info matrix giving the type of the data and all parameters necessary to interpret them:

        1MTD 1FIF { CoefficientType, SamplingRate, 
                    FrequencyScale,  FrequencyScaleParameter }
        1MTD 1FCF { coefficients }
        1FTD 1FCF {
                    1FIF filter-info;
                    1FCF coefficients; 
                  }

The disadvantage of this method is that the actual type of the data is hidden in a floating point field, even for something as unambiguous as LPC coefficients. What is more, this method duplicates the typing mechanism of SDIF while losing the clarity of the 4-letter type signatures.
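The drawback can be made concrete with a sketch of what a reader of the alternative format would have to do. The numeric codes in `COEFFICIENT_TYPES` and the function `decode_filter_info` are purely hypothetical; the definition above does not assign any values to CoefficientType.

```python
from typing import Sequence, Tuple

# Hypothetical numeric codes for the CoefficientType column, mapped to the
# signatures of the adopted definition. No such codes are defined in the text.
COEFFICIENT_TYPES = {0: "1ARA", 1: "1ARK", 2: "1CEC"}


def decode_filter_info(info_row: Sequence[float]) -> Tuple[str, float]:
    """Interpret one 1FIF row.

    Column order follows the 1FIF definition: CoefficientType,
    SamplingRate, FrequencyScale, FrequencyScaleParameter.
    """
    type_code, sampling_rate, _scale, _scale_parameter = info_row
    code = int(round(type_code))
    # The type arrives as a float, so the reader has to validate it itself.
    if code != type_code or code not in COEFFICIENT_TYPES:
        raise ValueError(f"unknown coefficient type code: {type_code!r}")
    return COEFFICIENT_TYPES[code], sampling_rate


kind, rate = decode_filter_info((2.0, 44100.0, 0.0, 0.0))
```

With the adopted definition, the same information is available from the frame signature alone, without decoding and validating a floating point field.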

Formant Description

For the representation of a spectral envelope as formants (see section 4.5), we need two types: precise formants, and fuzzy regions in which a formant is suspected. The precise formants would be defined by:

        1MTD 1FRM { frequency, bandwidth, amplitude, label }
        1FTD 1FRM { 1FRM formants; }

The fuzzy formants would be a combination of an envelope 1ENV and a formant region 1FRR:

        1MTD 1FRR { lowerfrequency, upperfrequency, centerfrequency,
                    confidence, label} 
        1FTD 1FRR { 
                    1ENI envelope-info;
                    1ENV envelope;
                    1FRR formant-regions; 
                  }

If the center frequency is not known, the column could be missing or a value of $-\infty$ could be given. The labels can be used to identify and track formants, which is necessary for interpolation. The confidence parameter indicates how certain it is that there is a formant in that region.
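The handling of an unknown center frequency and of labels can be sketched as follows, assuming the $-\infty$ convention rather than a missing column. The names `FormantRegion`, `known_center`, and `regions_with_label` are hypothetical, the field order follows the 1FRR definition, and the numbers are illustrative only.

```python
import math
from typing import List, NamedTuple, Optional


class FormantRegion(NamedTuple):
    """One row of a 1FRR matrix, in the column order of its definition."""
    lower_frequency: float
    upper_frequency: float
    center_frequency: float
    confidence: float
    label: float               # all matrix entries are floating point numbers


UNKNOWN_CENTER = -math.inf     # marks a center frequency that is not known


def known_center(region: FormantRegion) -> Optional[float]:
    """Return the center frequency, or None if it is marked as unknown."""
    if region.center_frequency == UNKNOWN_CENTER:
        return None
    return region.center_frequency


def regions_with_label(regions: List[FormantRegion],
                       label: float) -> List[FormantRegion]:
    """Select the regions carrying a given label, e.g. to track a formant."""
    return [r for r in regions if r.label == label]


# Illustrative values only.
frame_regions = [
    FormantRegion(300.0, 900.0, 600.0, 0.9, 1.0),
    FormantRegion(1000.0, 2000.0, UNKNOWN_CENTER, 0.4, 2.0),
]
```

Collecting the regions with the same label from successive frames gives the trajectory of one formant, which is what interpolation between frames requires.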

Diemo Schwarz
1998-09-07