
4   Audio

4.1   State of Affairs

The audio score following object suiviaudio is based on a Hidden Markov Model (HMM), as described in [OD01]. To distinguish rests from notes, it uses the audio features log-energy and delta log-energy. To match the played notes to the expected notes, it uses the energy in harmonic bands placed according to the expected note pitch, together with its delta, as described in [OS01]. The energy in harmonic bands is also called PSM, for peak structure match.
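As an illustration of the three features named above, a minimal Python sketch follows. It is not the suiviaudio implementation: the function names, the natural logarithm, the number of harmonics (8), the relative band width (±3 %) and the normalisation of the PSM by the total spectral energy are all assumptions made for this example. A naive DFT stands in for the FFT so that the block runs without external libraries.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """One-sided |DFT| (bins 0..N/2) via a naive O(N^2) DFT;
    a real implementation would use an FFT."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

def log_energy(frame, eps=1e-12):
    """Natural log of the mean squared amplitude of one frame (eps avoids log(0))."""
    return math.log(sum(x * x for x in frame) / len(frame) + eps)

def delta(values):
    """First-order difference across consecutive frames; the first frame gets 0."""
    return [0.0] + [b - a for a, b in zip(values, values[1:])]

def psm(mag, sample_rate, n_fft, f0, n_harmonics=8, rel_width=0.03):
    """Hypothetical PSM: share of spectral energy lying in narrow bands
    around the first n_harmonics multiples of the expected pitch f0 (Hz)."""
    bin_hz = sample_rate / n_fft
    bins = set()  # a set, so bins shared by neighbouring bands are counted once
    for k in range(1, n_harmonics + 1):
        lo = int(k * f0 * (1 - rel_width) / bin_hz)
        hi = math.ceil(k * f0 * (1 + rel_width) / bin_hz)
        bins.update(range(lo, min(hi, len(mag) - 1) + 1))  # clip at Nyquist
    total = sum(m * m for m in mag)
    return sum(mag[b] ** 2 for b in bins) / total if total > 0 else 0.0
```

For a pure 400 Hz tone sampled at 8 kHz, psm(mag, 8000, 400, 400.0) is close to 1, while the same frame scored against an expected pitch of 630 Hz gives a value close to 0; a rest would show up instead as a drop in log_energy.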

As described in section 2.1, score following works very well, but only monophonic scores could be parsed, and no ghost states were implemented, because not enough examples were available to determine their transition probabilities (i.e. all performances were assumed to be free of errors).

4.2   What has been done

4.2.1   Cepstral Difference

The cepstral difference, or cepstral flux, cpd is defined as
$$ cpd = \sum_{i=1}^{R} \left( c_i - c'_i \right)^2 \qquad (1) $$
where R is the order of the cepstrum (here 12), and c and c' are the vectors of the current and the previous cepstral coefficients, respectively, each calculated from a window of the signal S by
$$ c = \mathrm{IFFT}\left( \log \left| \mathrm{FFT}(S) \right| \right) \qquad (2) $$
The term |FFT(S)| is already calculated for the PSM, so computing the cepstrum adds only a logarithm and an inverse FFT per frame.
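Equations (1) and (2) can be sketched in Python as follows. The sketch assumes that c_1 ... c_R are the coefficients following c_0 in the output of the inverse transform, with R = 12, that the frame passed in has already been windowed, and it adds a small eps (an assumption, not from the text) to avoid log(0). As in the text, the cepstrum is computed from the magnitude spectrum, which would be shared with the PSM computation. A naive DFT again stands in for the FFT, and all function names are hypothetical.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Full |DFT| (all N bins) via a naive O(N^2) DFT; stands in for the
    |FFT(S)| that is already computed for the PSM."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n)]

def cepstrum(mag, order=12, eps=1e-12):
    """Real cepstrum c = IDFT(log|X|), keeping c_1..c_order as summed in eq. (1)."""
    n = len(mag)
    log_mag = [math.log(m + eps) for m in mag]  # eps guards against log(0)
    # Inverse DFT; only the real part is kept, since log|X| of a real frame is
    # even-symmetric and its inverse transform is real up to rounding error.
    return [sum(log_mag[k] * cmath.exp(2j * math.pi * k * q / n) for k in range(n)).real / n
            for q in range(1, order + 1)]

def cepstral_flux(c, c_prev):
    """cpd = sum over i = 1..R of (c_i - c'_i)^2, as in eq. (1)."""
    return sum((a - b) ** 2 for a, b in zip(c, c_prev))
```

Because the sum in (1) starts at i = 1, the coefficient c_0, which is the mean of log |FFT(S)|, is left out under this reading. A change of gain g only adds log g to every log-magnitude bin and therefore only shifts c_0, so cpd ignores overall loudness and responds to changes in spectral shape.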

4.3   What is to be done

