This user documentation of the spectral envelope and signal viewing application VIEWENV is structured as follows: Section 2.1 explains how to start the program. Section 2.2 introduces the basic entities the program handles, files and curves, and explains how to select what to display and how to navigate in the displayed data. Section 2.3 explains several other functions associated with the display window. Section 2.4 explains the control window, where the parameters for the various spectral envelope algorithms are specified; each parameter is described in detail. Section 2.5 explains the manipulation window. Finally, section 2.6 explains the possibilities for interactive evaluation of the spectral envelope estimation methods.
To start VIEWENV, you have to enter its directory (src/viewenv) and start MATLAB by typing matlab5. This will call the script file startup.m, which sets the correct search paths and starts the program by calling the function run. This function can also be called from the MATLAB prompt to reinitialize the program.
This should bring up several windows: the display window (figure 2.1) for display of data and general settings, the control window (figure 2.2) for loading data and setting parameters, and the manipulation window (figure 2.6) for manipulation of spectral envelopes.
If you're using MATLAB 5.0 or 5.1, you'll have to live with some bugs which are fixed in version 5.2. First, when more characters are entered in a text edit field than fit in its width, the text becomes invisible. It is still there and can be used (just not seen), and it becomes visible again when characters are deleted. Second, VIEWENV is programmed using the object-oriented programming features of MATLAB 5. In the early versions, object and class handling was not implemented correctly, which leads to strange errors upon user actions. These errors go away when the action is tried a second time, so just try again.
VIEWENV can load files containing various types of data and display them directly, and it can also compute data from loaded files and display the result. Everything that can be displayed is called a curve. Every file, together with its derived curves, is grouped into a rectangular frame in the control window (figure 2.2; see also figure 2.3 for a graphical presentation of the derivation relation). Every curve has a title, which is displayed in bold in the control window. The different curves are described in section 2.4.
File names can be specified in three ways:
Files can be loaded in two ways: individually, by clicking the Load button underneath the file name edittext in the control window, or all files in a group called the load set at once. All files whose box in load set is checked will be loaded when the Load set button next to the Base name in the display window is pressed. This combines nicely with the ``extension to the base name'' way of specifying file names, when all related files with the same base name are in the load set. Then, after changing the base name, the new files can be loaded with one mouse click.
In the lower left corner of the display window (figure 2.1), 7 checkboxes with a pop-up menu can be seen. These will be referred to as the display pop-ups. Initially, when no files are loaded, all the 7 pop-ups will only have one entry: nothing, so no curve will be displayed. Every curve that is present (be it data loaded from a file or calculated data) will be listed in each of the 7 pop-up menus. By selecting a desired curve in one of the pop-ups, it will be displayed in the display window. Thus, up to 7 curves can be viewed simultaneously.
The 7 checkboxes are similar to the mute function on audio mixing desks: When switched off, the curve selected in the pop-up will not be displayed. This serves for quick switching off of a curve, when it obscures some detail in the other curves displayed, without having to take the somewhat longer way of selecting nothing in the display pop-up. Also, different combinations of curves can be selected and configured quickly.
An important point is the colour that is associated in a fixed and immutable way with each slot. Choosing a slot for a curve therefore also chooses the colour with which the curve will be displayed in the axes of the display window. This colour coding has been kept consistent across all the windows of the program. When a curve is selected in a pop-up, and thus the colour chosen, the heading of the curve in the control window (figure 2.2) is displayed in that colour as well. In addition, the Spectrogram button in the display window and the checkbox in the manipulation window (figure 2.6) change to that colour, to indicate the curve affected by their actions (see below).
The navigation area in figure 2.1 shows the time position in the data we're looking at, and offers controls to change it. The position is displayed in three text fields, which give it in seconds, samples, and frames. (Samples are only valid if a sound file is loaded.) All three fields can be edited to enter a new position. Below the edit fields, there is a slider and four buttons to change the current frame. The slider can be dragged, or set to a position by using the middle mouse button. Clicking on the slider arrows increments/decrements the frame by one. Clicking next to the slider handle increments/decrements by 10. The four buttons below serve the functions ``go to first frame'', ``go to previous frame'', ``go to next frame'', and ``go to last frame''.
How is the connection drawn between frames, samples and time positions? All loaded data have a time base (because time tags are stored with the file, or because it is clear how to convert position into time and vice versa), but only one data set can determine the relation between time and frame number, i.e. what time position to jump to when a frame number is entered. This data set is called the time master. Whenever it changes, this is printed on the terminal. To determine which data set is the time master, priorities have been assigned. The data set with the highest priority will always be the time master. The lowest priority is assigned to the sound file and its derived curves. Then comes the envelope file, and the format file has the highest priority. Among the up to three data sets of one type, the one with the lowest number has the highest priority.
When a time is entered, the frame of the time master closest to that time is selected.
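The priority rule and the nearest-frame lookup described above can be sketched as follows. This is a minimal illustration in Python, not VIEWENV's MATLAB code: the names `TYPE_PRIORITY`, `pick_time_master`, and `nearest_frame` are hypothetical, frames are indexed from 0 here, and the behaviour for a time exactly halfway between two frames (the earlier frame wins) is an assumption.

```python
from bisect import bisect_left

# Hypothetical priority ranks taken from the text: a higher value wins.
TYPE_PRIORITY = {"sound": 0, "envelope": 1, "format": 2}

def pick_time_master(datasets):
    """datasets: list of (type_name, number) pairs for the loaded data sets."""
    # Highest type priority first; within one type, the lowest number wins.
    return max(datasets, key=lambda d: (TYPE_PRIORITY[d[0]], -d[1]))

def nearest_frame(frame_times, t):
    """Return the 0-based index of the frame whose time tag is closest to t.

    frame_times must be non-empty and sorted in ascending order.
    """
    i = bisect_left(frame_times, t)
    if i == 0:
        return 0
    if i == len(frame_times):
        return len(frame_times) - 1
    # Pick the closer neighbour; on a tie, keep the earlier frame (assumption).
    return i - 1 if t - frame_times[i - 1] <= frame_times[i] - t else i
```

For example, with a sound file and two envelope files loaded, `pick_time_master([("sound", 1), ("envelope", 2), ("envelope", 1)])` selects envelope file number 1.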
Formant measurement proceeds as follows: The starting point is chosen to be the peak of a formant apparent in the displayed spectral envelope. After clicking and holding down the left mouse button, the cursor is moved down to an amplitude distance of -3 dB. Then, the cursor is moved horizontally until the frequency lines cross the spectral envelope. The frequency distance displayed is the bandwidth of the formant. Moreover, because the two frequency cursors are symmetric, the symmetry of the formant can be checked.
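The same measurement can be approximated numerically. The sketch below is not part of VIEWENV; it assumes the envelope is given as sampled frequencies and amplitudes in dB, and that the peak index is already known. The function names `crossing` and `formant_bandwidth` are hypothetical. Unlike the symmetric cursors, it locates the lower and upper -3 dB points independently and reports their asymmetry.

```python
def crossing(freqs, amps_db, start, step, level):
    """Walk from index `start` in direction `step` (+1 or -1) and return the
    linearly interpolated frequency at which the envelope falls to `level`,
    or None if it never does within the sampled range."""
    i = start
    while 0 <= i + step < len(freqs):
        a0, a1 = amps_db[i], amps_db[i + step]
        if a1 <= level:
            f0, f1 = freqs[i], freqs[i + step]
            # a0 is still above level here, so a1 - a0 is never zero.
            return f0 + (level - a0) * (f1 - f0) / (a1 - a0)
        i += step
    return None

def formant_bandwidth(freqs, amps_db, peak, drop_db=3.0):
    """Return (bandwidth, asymmetry) at drop_db below the peak, or None."""
    level = amps_db[peak] - drop_db
    lo = crossing(freqs, amps_db, peak, -1, level)
    hi = crossing(freqs, amps_db, peak, +1, level)
    if lo is None or hi is None:
        return None
    # Asymmetry is zero when both -3 dB points are equally far from the peak.
    return hi - lo, (hi - freqs[peak]) - (freqs[peak] - lo)
```

A non-zero asymmetry value corresponds to the case where only one of the two symmetric cursors would touch the envelope.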
The control window (figure 2.2) contains three identical columns of parameters for the various algorithms, grouped in boxes by the file they are derived from. The values of almost all numerical parameters are entered and displayed either directly in the edittext next to their name, or graphically by using the slider below. As usual, the arrows allow an increment or decrement of one ``step'' (whatever a step is for that parameter), clicking next to the slider handle changes the value by a bigger amount, and the slider handle can be dragged directly to the desired position.
The settings made can be recorded for later use by saving the window. To do this, select the Save As entry from the File menu and enter or select control.m as the file name, overwriting the previous settings.
The curves and their respective parameters are explained in the following sections.
Displays the Fourier spectrum of a sound file. Besides the standard file parameters, the length of the window from which the spectrum will be computed can be specified both in seconds and in number of samples.
Displays the spectral envelope taken directly from a file. Only the standard file parameters are present. The two possible formats of an envelope file are:
The format file itself contains the harmonic partials of a sound, as generated by the ADDITIVE program (see section 2.2) or HMM. Both the ASCII format (extension .format) and the binary format (extension .fmt) are recognized, as well as files compressed with gzip (extensions .format.gz and .fmt.gz, respectively). Each partial is displayed as a little cross at its frequency and amplitude.
The mouse input should always be active for only one of the three discrete cepstrum curves.
The values that can be entered range from 0 (no constraint) to 1 (heavy constraint), with intermediate values of the form d · 10^n, where d = 0..9 and n = -5..-1. Clicking the slider arrows results in an increment/decrement of d, with a jump to the next higher/lower power of ten (e.g. 0.0009 will increment to 0.001, then 0.002). Clicking next to the slider handle multiplies/divides by ten. This way, a wide range of regularization factors with sufficient precision can be entered conveniently.
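The arrow stepping can be sketched as follows, in Python rather than VIEWENV's MATLAB. The value is kept as a digit/exponent pair (d, n) so that stepping stays exact. The function names are hypothetical, and two details are assumptions not stated in the text: stepping up from 0 goes to the smallest non-zero value 1 · 10^-5, and the maximum value 1 is represented as the pair (1, 0).

```python
from fractions import Fraction

N_MIN = -5  # smallest exponent stated in the text

def step_up(d, n):
    """One click on the 'increase' arrow, for the value d * 10**n."""
    if d == 0:
        return (1, N_MIN)   # assumption: 0 steps up to 1e-5
    if n == 0:
        return (1, 0)       # already at the maximum value 1
    if d < 9:
        return (d + 1, n)
    return (1, n + 1)       # 9 * 10**n -> 1 * 10**(n + 1)

def step_down(d, n):
    """One click on the 'decrease' arrow, for the value d * 10**n."""
    if d == 0:
        return (0, N_MIN)   # already at the minimum value 0
    if d > 1:
        return (d - 1, n)
    if n > N_MIN:
        return (9, n - 1)   # 1 * 10**n -> 9 * 10**(n - 1)
    return (0, N_MIN)       # below 1e-5 comes 0 (no constraint)

def value(d, n):
    """Convert the digit/exponent pair to the regularization factor."""
    return float(Fraction(d) * Fraction(10) ** n)
```

Starting from 0.0009, i.e. the pair (9, -4), two calls to `step_up` give 0.001 and then 0.002, matching the example in the text.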
The manipulation window (figure 2.6) so far serves only one manipulation, the skewing, which is described in section 5.3. The four parameters of skewing can be entered numerically in the edit fields, or by sliders. With the Skew Range checkbox, skewing can be switched on or off. The curve to be skewed is always the last one handled in the display pop-ups, which is indicated by the colour of the Skew Range checkbox. Figure 5.8 shows an example of the skewing manipulation.
The evaluation window serves a task that is rather specific to the spectral envelope project, and the user interface is therefore not very elaborate. The task is the evaluation of a large corpus of audio data, described in section 3.6. After the corpus has been analysed, the result file giving the maximum deviations for each frame can be loaded and displayed, as can be seen in figure 2.7. The upper line shows, for each frame, the maximum absolute deviation between the estimated spectral envelope and the linear interpolation. The lower line shows the amplitude at which the maximum deviation occurred. The text display below the axes shows the frame number at which the cursor is positioned (frame), the maximum deviation for that frame (maxdiff), and the amplitude value (env) and frequency (freq) at which the maximum difference occurred. By means of a frequency cursor, and by zooming into the interesting parts (toggle zoom mode/cursor mode by pressing 'z'), the peaks in the deviation curve can be examined (see figure 2.8). If a peak is significant (i.e. it is not at a very low amplitude or at the borders of the frequency range), we can jump directly to the display of the frame producing that error in the display window by pressing 'd'. By pressing 'f', the insignificant peaks are filtered out beforehand.
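The significance test applied to the peaks can be illustrated with a short Python sketch. It is not VIEWENV's implementation: the text gives no thresholds, so `min_env`, `freq_lo`, `freq_hi`, and `border` are hypothetical parameters with placeholder defaults, and each frame is assumed to be a record carrying the four fields shown in the text display (frame, maxdiff, env, freq).

```python
def is_significant(rec, min_env, freq_lo, freq_hi, border):
    """A peak counts as significant if it is not at a very low amplitude
    and not within `border` of either end of the frequency range."""
    return (rec["env"] >= min_env
            and freq_lo + border <= rec["freq"] <= freq_hi - border)

def filter_peaks(records, min_env=-60.0, freq_lo=0.0, freq_hi=8000.0,
                 border=100.0):
    """Keep only the records whose maximum deviation is significant.
    All default thresholds are placeholders, not values from VIEWENV."""
    return [r for r in records
            if is_significant(r, min_env, freq_lo, freq_hi, border)]

# Made-up example records, for illustration only.
records = [
    {"frame": 1, "maxdiff": 12.0, "env": -20.0, "freq": 1500.0},
    {"frame": 2, "maxdiff": 30.0, "env": -90.0, "freq": 1500.0},  # too quiet
    {"frame": 3, "maxdiff": 25.0, "env": -20.0, "freq": 7990.0},  # at border
]
kept = [r["frame"] for r in filter_peaks(records)]
```

With these placeholder thresholds, only frame 1 remains after filtering.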