11.1 Usage

This user documentation of the spectral envelope and signal viewing application VIEWENV is structured as follows. Section 2.1 explains how to start the program. Section 2.2 introduces the basic entities the program handles, files and curves, and explains how to select what to display and how to navigate in the displayed data. Section 2.3 explains several other functions associated with the display window. Section 2.4 explains the control window, where the parameters for the various spectral envelope algorithms are specified, and describes each parameter in detail. Section 2.5 explains the manipulation window, and section 2.6, finally, explains the possibilities for interactive evaluation of the spectral envelope estimation methods.

   
11.1.1 Running VIEWENV

To start VIEWENV, you have to enter its directory (src/viewenv) and start MATLAB by typing matlab5. This will call the script file startup.m, which sets the correct search paths and starts the program by calling the function run. This function can also be called from the MATLAB prompt to reinitialize the program.

This should bring up several windows: the display window (figure 2.1) for display of data and general settings, the control window (figure 2.2) for loading data and setting parameters, and the manipulation window (figure 2.6) for manipulation of spectral envelopes.

If you're using MATLAB 5.0 or 5.1, you'll have to live with some bugs which are fixed in version 5.2. First, when more characters are entered in a text edit field than fit in the width of the field, the text becomes invisible. However, it is still there and can be used (just not seen), and when characters are deleted, it becomes visible again. Second, VIEWENV is programmed using the object-oriented programming features of MATLAB 5. In the early versions, object and class handling was not implemented correctly, which leads to strange errors upon user actions. These errors go away when the action is tried a second time, so just try again.

pics/ve-display.gif

pics/ve-control.gif

  
11.1.2 Files and Curves

VIEWENV can load files containing various types of data and display them directly. It can also compute data from loaded files and display the result. Everything that can be displayed is called a curve. Every file, together with its derived curves, is grouped into a rectangular frame in the control window (figure 2.2, see also figure 2.3 for a graphical presentation of the derivation relation). Every curve has a title which is displayed in bold in the control window. The different curves will be described later in section 2.4.

   
11.1.2.1 File names

File names can be specified in three ways:

As a complete file path

The path (directory + filename) can be absolute or relative to the current directory.

As an extension to the base name

If the file name contains a "%" character, this will be replaced by the string entered in the Base name edittext in the display window. That way, the various files that are usually related by being generated from a single source can be accessed rapidly. (E.g. from some file sound.sf you'll have sound.format, sound.sdif, sound.synt.sf, etc.)

As a reference to an already loaded file

If the file name is of the form $n, where n is between 1 and 3, this file will share the data of file n of the same type, i.e. from the frames in the control window to the left or right of the file's own frame. For example, if the name of Sound file 2 is specified as $1, pressing Load (of sound file 2) will tell it to get its data from sound file 1, whatever is loaded there, while the parameters stay independent. This makes it possible to compare different parameter settings for the algorithms on the same data set. See figure 2.3 for a graphical presentation of the reference relation.
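
The three ways of specifying a file name can be summarised in a small sketch. This is an illustration in Python and not code taken from VIEWENV; the function name resolve_file_name and its return values are hypothetical, and replacing every "%" (rather than only the first) is an assumption.

```python
import re

def resolve_file_name(spec, base_name):
    """Interpret a file-name entry as described in the text.

    Returns ("reference", n) when the entry has the form $n with n in 1..3,
    otherwise ("path", name) with any "%" replaced by the base name.
    Hypothetical helper, for illustration only.
    """
    match = re.fullmatch(r"\$([1-3])", spec)
    if match:
        # Share the data of file n of the same type.
        return ("reference", int(match.group(1)))
    # Assumption: every "%" in the entry is replaced by the base name.
    return ("path", spec.replace("%", base_name))


# With the base name "sound", "%.format" refers to sound.format.
print(resolve_file_name("%.format", "sound"))
print(resolve_file_name("$1", "sound"))
```

A complete path without "%" passes through unchanged, so absolute and relative paths need no special handling in this sketch.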

   
11.1.2.2 Loading files

Files can be loaded in two ways: individually, by clicking the Load button underneath the file name edittext in the control window, or all files in a group called the load set at once. All files whose load set box is checked will be loaded when the Load set button next to the Base name in the display window is pressed. This combines nicely with the "extension to the base name" way of specifying file names, when all related files with the same base name are in the load set. Then, after changing the base name, the new files can be loaded with one mouse click.

Figure 2.3: The derivation and reference relations in the control window

pics/ve-relations.eps

   
11.1.2.3 Choosing the Display

In the lower left corner of the display window (figure 2.1), there are 7 checkboxes, each with a pop-up menu. These will be referred to as the display pop-ups. Initially, when no files are loaded, each of the 7 pop-ups has only one entry, nothing, so no curve will be displayed. Every curve that is present (be it data loaded from a file or calculated data) will be listed in each of the 7 pop-up menus. Selecting a curve in one of the pop-ups displays it in the display window. Thus, up to 7 curves can be viewed simultaneously.

The 7 checkboxes are similar to the mute function on audio mixing desks: When switched off, the curve selected in the pop-up will not be displayed. This serves for quick switching off of a curve, when it obscures some detail in the other curves displayed, without having to take the somewhat longer way of selecting nothing in the display pop-up. Also, different combinations of curves can be selected and configured quickly.

Consistent Colour Coding

An important point is the colour that is associated in a fixed and immutable way with each slot. Thus, choosing a slot for a curve also chooses the colour with which the curve will be displayed in the axes of the display window. This colour coding has been kept consistent across all the windows of the program. When a curve is selected in a pop-up, and thus the colour chosen, the heading of the curve in the control window (figure 2.2) is displayed in that colour as well. In addition, the Spectrogram button in the display window and the checkbox in the manipulation window (figure 2.6) will change to that colour, to indicate the curve being affected by their actions (see below).

   
11.1.2.4 Navigation

The navigation area in figure 2.1 shows the time position in the data we're looking at, and offers controls to change it. The position is displayed in three text fields, which give it in seconds, samples, and frames. (Samples are only valid if there is a sound file loaded.) All three fields can be edited to enter a new position. Below the edit fields, there is a slider and four buttons to change the current frame. The slider can be dragged or set to a position by using the middle mouse button. Clicking on the slider arrows increments/decrements the frame by one. Clicking next to the slider handle increments/decrements by 10. The four buttons below serve the functions "go to first frame", "go to previous frame", "go to next frame", and "go to last frame".

How is the connection drawn between frames, samples and time positions? All loaded data have a time base (because time tags are stored with the file, or because it is clear how to convert position into time and vice versa), but only one data set can determine the relation between time and frame number, i.e. what time position to jump to when a frame number is entered. This data set is called the time master. Whenever it changes, this is printed on the terminal. To determine which data set is the time master, priorities have been assigned. The data set with the highest priority will always be the time master. The lowest priority is assigned to the sound file and its derived curves. Then comes the envelope file, and the format file has the highest priority. Within the at most three data sets of one type, the one with the lowest number has the highest priority.

When a time is entered, the frame of the time master closest to that time is selected.
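
The priority rule and the closest-frame lookup can be illustrated with a short Python sketch. It is not VIEWENV code: the names TYPE_PRIORITY, pick_time_master and closest_frame are hypothetical, data sets are represented simply as (type, number) pairs, and counting frames from 1 is an assumption.

```python
# File types ordered from lowest to highest priority, as described above.
TYPE_PRIORITY = {"sound": 0, "envelope": 1, "format": 2}

def pick_time_master(loaded):
    """Choose the time master among loaded data sets.

    `loaded` is a list of (file_type, number) pairs with number in 1..3.
    The highest type priority wins; within one type, the lowest number wins.
    """
    return max(loaded, key=lambda d: (TYPE_PRIORITY[d[0]], -d[1]))

def closest_frame(frame_times, t):
    """Return the frame number (assumed to start at 1) closest to time t.

    `frame_times` holds the time tag in seconds of each frame of the
    time master. On a tie, the earlier frame is returned.
    """
    index = min(range(len(frame_times)), key=lambda i: abs(frame_times[i] - t))
    return index + 1


master = pick_time_master([("sound", 1), ("envelope", 2), ("envelope", 3)])
frame = closest_frame([0.0, 0.01, 0.02, 0.03], 0.017)
print(master, frame)
```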

  
11.1.3 Other Functions in the Display Window

The Interaction Area

The interaction area above the navigation area in the display window (figure 2.1) offers miscellaneous functions. It contains the edit field for the base name (see 2.2.1) and the Load set button (see 2.2.2), both of which have already been explained.

Print Preview

The Export button writes the content of the display axes to a black-and-white PostScript file export.eps that can be printed or included in paper documents such as the spectral envelope report. All curves lose their colours in the PostScript file. Because this is not convenient when several curves are displayed, the Preview checkbox switches to the preview mode, which generates black-and-white curves with different line styles (dashed, dotted, etc.) to distinguish them. As with the colour, each display pop-up has a fixed line style associated with it, so different combinations of line styles can be tried out. Also, in preview mode a legend appears, which shows the name of each displayed curve along with a short example of the line style or symbol of that curve.

Position and Formant Measurement

Below the Preview and Export elements, there is a checkbox for position measurement with the mouse. When it is checked, clicking in the display axes shows the position in the frequency-amplitude plane in Hz and dB. Moreover, when clicking and dragging, a special tool for manual formant measurement in spectral envelopes is available, to retrieve the frequency, amplitude and bandwidth of a formant. It measures and displays the distance of the mouse from the point first clicked on (marked with a cross) in frequency and amplitude. For frequency, two symmetric vertical lines will appear at equal distance from the starting point, as shown in figure 2.4.

pics/ve-formant.gif

Formant measurement proceeds as follows: The starting point is chosen to be the peak of a formant apparent in the spectral envelope displayed. After clicking and holding down the left mouse button, the cursor is moved to an amplitude distance of -3 dB. Then, the cursor is moved horizontally until the frequency lines cross the spectral envelope. The frequency distance displayed is the bandwidth of the formant. Moreover, because of the two symmetric frequency cursors, the symmetry of the formant can be checked.
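
As a numerical counterpart to this manual procedure, the -3 dB bandwidth of a peak can be estimated from a sampled envelope. The Python sketch below is an illustration only and not part of VIEWENV: the function names are hypothetical, the envelope is assumed to be given as frequencies in Hz with amplitudes in dB, and the crossing points are found by linear interpolation between samples.

```python
def crossing(f0, a0, f1, a1, level):
    """Frequency at which the straight segment (f0, a0)-(f1, a1) reaches level."""
    return f0 + (level - a0) * (f1 - f0) / (a1 - a0)

def formant_bandwidth(freqs, amps_db, peak, drop_db=3.0):
    """Return (left, right, bandwidth) in Hz for the peak at index `peak`.

    left and right are the frequencies on either side of the peak where
    the envelope has fallen drop_db below the peak amplitude.
    """
    level = amps_db[peak] - drop_db
    i = peak
    while i > 0 and amps_db[i - 1] > level:       # walk down the left flank
        i -= 1
    j = peak
    while j < len(freqs) - 1 and amps_db[j + 1] > level:  # and the right flank
        j += 1
    if i == 0 or j == len(freqs) - 1:
        raise ValueError("envelope does not drop far enough on both sides")
    left = crossing(freqs[i - 1], amps_db[i - 1], freqs[i], amps_db[i], level)
    right = crossing(freqs[j], amps_db[j], freqs[j + 1], amps_db[j + 1], level)
    return left, right, right - left


freqs = [400.0, 450.0, 500.0, 550.0, 600.0]
amps_db = [-12.0, -6.0, 0.0, -6.0, -12.0]
left, right, bandwidth = formant_bandwidth(freqs, amps_db, peak=2)
print(left, right, bandwidth)
```

Comparing freqs[peak] - left with right - freqs[peak] gives the same symmetry check that the two vertical lines provide visually.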

Zooming

When the mouse position checkbox is switched off (which is the default), zoom mode is selected. Clicking the left mouse button in the display axes zooms in by 50%. The right mouse button zooms out. By dragging with the left mouse button, a zoom rectangle can be opened, and double-clicking the left mouse button returns to the normal full view.

Spectrogram

Finally, somewhat apart from the interaction area, the Spectrogram button at the very bottom left selects a spectrogram view (of the time-frequency plane, where intensity is coded by colour) of the curve that was manipulated last in the display pop-ups.

     
11.1.4 Parameters and the Control Window

The control window (figure 2.2) contains three identical columns of parameters for the various algorithms, grouped in boxes by the file they are derived from. The values of almost all numerical parameters are entered and displayed either directly in the edittext next to their name, or graphically by using the slider below. As usual, the arrows allow an increment or decrement of one ``step'' (whatever a step is for that parameter), clicking next to the slider handle changes the value by a bigger amount, and the slider handle can be dragged directly to the desired position.

The settings made can be recorded for later use by saving the window. To do this, select the Save As entry from the File menu and enter or select control.m as the file name, overwriting the previous settings.

The curves and their respective parameters are explained in the following sections.

11.1.4.1 Sound File

Displays the Fourier spectrum of a sound file. Besides the standard file parameters, the length of the window from which the spectrum will be computed can be specified both in seconds and in number of samples.

LPC

Displays the spectral envelope computed by linear prediction (see section 3.2). The order parameter specifies the number of poles to use to approximate the spectrum. Higher orders yield a closer adaptation to the spectrum, but a less smooth spectral envelope.

Cepstrum

Displays the spectral envelope computed by the continuous cepstrum method (see section 3.3). The order parameter specifies the cutoff frequency of the smoothing filter. A higher order means that more high frequency components, i.e. rapid changes, are left in the spectral envelope.

11.1.4.2 Envelope File

Displays the spectral envelope taken directly from a file. Only the standard file parameters are present. The two possible formats of an envelope file are:

11.1.4.3 Format File

The format file itself contains the harmonic partials of a sound, as generated by the ADDITIVE program (see section 2.2) or HMM. Both the ASCII format (extension .format) and the binary format (extension .fmt) are recognized, as well as files compressed with gzip (extensions .format.gz and .fmt.gz, respectively). Each partial is displayed as a little cross at its frequency and amplitude.
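
The four recognized extensions can be told apart as in the following Python sketch. It only illustrates the naming convention described above; the function names are hypothetical, matching extensions case-sensitively is an assumption, and nothing is implied about the content of the two file formats.

```python
import gzip

def format_file_kind(path):
    """Return (encoding, compressed) for a format file name.

    encoding is "ascii" for .format and "binary" for .fmt;
    compressed is True when the name additionally ends in .gz.
    """
    compressed = path.endswith(".gz")
    stem = path[:-len(".gz")] if compressed else path
    if stem.endswith(".format"):
        return ("ascii", compressed)
    if stem.endswith(".fmt"):
        return ("binary", compressed)
    raise ValueError("not a recognized format file name: " + path)

def open_format_file(path):
    """Open a format file for reading, decompressing gzip files on the fly."""
    encoding, compressed = format_file_kind(path)
    opener = gzip.open if compressed else open
    return opener(path, "rt" if encoding == "ascii" else "rb")


print(format_file_kind("sound.format"))
print(format_file_kind("sound.fmt.gz"))
```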

Discrete Cepstrum

Displays the spectral envelope computed by the discrete cepstrum method (see section 3.4) from the partials of the format file, which are displayed, too. The parameters are:

Algorithm
Choice between three groups of algorithms: Galas and Galas cloud (algorithms developed by Thierry Galas); Regularized, Reg.+Border and Reg.+-Border (regularized discrete cepstrum); and Cloud, Cloud+Border and Cloud+-Border (regularized discrete cepstrum with statistical cloud smoothing). The +Border versions add points at the low and high borders of the frequency range, at half the amplitude of the highest/lowest partial, to force the envelope to go down. The +-Border versions add these points only when there is enough space. For more details, see the description of the algorithms in section 3.5.4.

Frequency scale
Choice between
Linear:
The frequencies of the partials are used as is.

Log freq:
A logarithmic scale is applied to the frequencies of the partials higher than the break point, to increase the resolution of low frequency details at the expense of high frequency details, which are perceptually less important.

Log norm:
Use logarithmic scale as above, which is normalized (scaled back) to cover the range of 0 to fs/2 in order to avoid range errors.
Log corr:
Apply normalized logarithmic scale after adding points in the Cloud algorithm. For the other algorithms, this is the same as Log norm.

Break point
If a logarithmic frequency scale is selected, the first value specifies the break point between the linear part and the logarithmic part of the scale. The second value is the output frequency at the break frequency. For the normalized and corrected logarithmic scale, the second parameter has no influence (see section 3.5.3 for more details).

Mouse input
If checked, the Break point parameter can be entered interactively using the mouse. As can be seen in figure 2.5, a crosshair (the two intersecting dashed lines at the break input/output frequencies) is drawn, which can be moved by clicking the mouse or by moving the mouse while a button is pressed (dragging). Also, the resulting partial frequencies are plotted as an input/output relationship (for this plot, the y-axis is to be interpreted as ranging from 0 to fs/2 Hz), along with a grey dotted identity line for reference.

The mouse input should always be active for only one of the three discrete cepstrum curves.

Order
The order parameter controls the accuracy with which the given partials are approximated by the spectral envelope. A high value will lead to a curve hitting the partials exactly, but is more demanding computationally.

Regularization
The regularization factor is a constraint on the shape of the spectral envelope. It introduces an additional penalty on irregularities caused by too steep a slope of the curve (see section 3.5.1 for more details).

The values that can be entered range from 0 (no constraint) to 1 (heavy constraint), with intermediate values of the form $d \cdot 10^n$, where d = 0..9 and n = -5..-1. Clicking the slider arrows results in an increment/decrement of d, with a jump to the next higher/lower power of ten (e.g. 0.0009 will increment to 0.001, then 0.002). Clicking next to the slider handle multiplies/divides by ten. This way, a wide range of regularization factors with sufficient precision can be entered conveniently.

Show deviation
When this checkbox is on, the absolute deviation of the spectral envelope from the curve obtained by linear interpolation is displayed in dB. This is used for the evaluation of spectral envelope estimation as described in section 3.6 and shown in figure 3.13.
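
The stepping behaviour of the Regularization slider described above can be sketched in Python. This is an illustration, not VIEWENV code. The value is held as a digit d and an exponent n so that no rounding errors accumulate. The function names, the step from 0 to the smallest non-zero value, and the clamping at both ends of the range are assumptions.

```python
N_MIN, N_MAX = -5, 0   # n = 0 is used only for the maximum value 1

def value(d, n):
    """Regularization factor d * 10^n as a float."""
    return float(f"{d}e{n}")

def arrow_up(d, n):
    """Slider arrow: increment d, carrying into the next power of ten."""
    if d == 0:
        return (1, N_MIN)            # assumption: 0 steps to 0.00001
    if n == N_MAX:
        return (1, N_MAX)            # already at the maximum 1
    return (1, n + 1) if d == 9 else (d + 1, n)

def arrow_down(d, n):
    """Slider arrow: decrement d, borrowing from the next lower power of ten."""
    if d <= 1:
        return (0, N_MIN) if d == 0 or n == N_MIN else (9, n - 1)
    return (d - 1, n)

def page_up(d, n):
    """Click next to the handle: multiply by ten, clamped at 1."""
    if d == 0:
        return (0, N_MIN)
    if n + 1 > N_MAX or (n + 1 == N_MAX and d > 1):
        return (1, N_MAX)
    return (d, n + 1)

def page_down(d, n):
    """Click next to the handle: divide by ten, clamped at the smallest exponent."""
    if d == 0 or n == N_MIN:
        return (d, N_MIN) if d else (0, N_MIN)
    return (d, n - 1)


state = (9, -4)                      # 0.0009
state = arrow_up(*state)             # 0.001
print(value(*state))
state = arrow_up(*state)             # 0.002
print(value(*state))
```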

pics/ve-logscale.gif

   
11.1.5 The Manipulation Window

The manipulation window (figure 2.6) so far serves only one manipulation, skewing, which is described in section 5.3. The four parameters of skewing can be entered numerically in the edit fields, or by sliders. With the Skew Range checkbox, skewing can be switched on or off. The curve to be skewed is always the last one handled in the display pop-ups, which is indicated by the colour of the Skew Range checkbox. Figure 5.8 shows an example of the skewing manipulation.

pics/ve-manip.gif

   
11.1.6 The Evaluation Window

The evaluation window serves a task that is rather specific to the spectral envelope project, and the user interface is therefore not very elaborate. The task is the evaluation of a large corpus of audio data, described in section 3.6. After the corpus has been analysed, the result file giving the maximum deviations for each frame can be loaded and displayed, as can be seen in figure 2.7. The upper line shows, for each frame, the maximum absolute deviation between the estimated spectral envelope and the linear interpolation. The lower line shows the amplitude at which the maximum deviation occurred. The text display below the axes shows the frame number at which the cursor is positioned (frame), the maximum deviation for that frame (maxdiff), and the amplitude value (env) and frequency (freq) at which the maximum difference occurred. By means of a frequency cursor, and by zooming into the interesting parts (toggle zoom mode/cursor mode by pressing 'z'), the peaks in the deviation curve can be examined (see figure 2.8). If a peak is significant (i.e. it is not at a very low amplitude or at the borders of the frequency range), we can jump directly to the display of the frame producing that error in the display window by pressing 'd'. Pressing 'f' filters out the insignificant peaks beforehand.
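
The three values shown per frame can be computed as in the following Python sketch. It is an illustration only: the function names are hypothetical, the envelope and the linear interpolation are assumed to be sampled in dB at the same frequencies, env is assumed to be the envelope amplitude at the point of maximum deviation, and the significance thresholds are invented for the example, since the text does not give the actual values.

```python
def max_deviation(freqs, env_db, interp_db):
    """Return (maxdiff, env, freq) for one frame.

    maxdiff is the largest absolute difference in dB between the estimated
    envelope and the linear interpolation; env and freq are the envelope
    amplitude and the frequency at which it occurs.
    """
    k = max(range(len(freqs)), key=lambda i: abs(env_db[i] - interp_db[i]))
    return abs(env_db[k] - interp_db[k]), env_db[k], freqs[k]

def is_significant(env, freq, fs, min_amp_db=-60.0, border_hz=100.0):
    """Reject peaks at very low amplitude or at the borders of 0..fs/2.

    Both thresholds are hypothetical defaults chosen for illustration.
    """
    return env > min_amp_db and border_hz < freq < fs / 2 - border_hz


freqs = [100.0, 200.0, 300.0]
env_db = [-10.0, -20.0, -35.0]
interp_db = [-11.0, -26.0, -33.0]
maxdiff, env, freq = max_deviation(freqs, env_db, interp_db)
print(maxdiff, env, freq, is_significant(env, freq, fs=16000.0))
```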

pics/ve-check.gif

pics/ve-checkzoom.gif


Diemo Schwarz
1998-09-07