
3   Software Development

This section covers the software engineering and programming topics common to the audio and MIDI score following objects (sections 4 and 6) and the tools (section 7).

3.1   State of Affairs

The two objects suiviaudio and suivimidi were implemented separately for jMax-2.5.3. Many bugs remained, especially in the score parsing and in the assignment of cues to notes during model generation. These led to occasional crashes, hangs of the follower, and the last cue never being output, even when it was recognised.

3.2   What has been done

The aforementioned bugs in the score parsing have been found. To handle fully polyphonic scores, the parser had to be largely rewritten. The Hidden Markov Model is built from the parsed score with one high-level state per change of polyphony (see [Mat02]). A quantisation is applied to fuse note starts and ends that lie close together, e.g. in chords.
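
As a rough illustration of this segmentation, the following C sketch cuts a list of notes into spans of constant polyphony, fusing note starts and ends that lie within a tolerance of each other. It is not the actual parser: the types note_t and segment_t, the function build_segments, the millisecond time unit and the MAX_POLYPHONY limit are all assumptions made for this example, and the HMM states themselves are not built here.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_POLYPHONY 16   /* hypothetical limit, chosen for this sketch */

/* Hypothetical note representation; times are assumed to be in ms. */
typedef struct { double onset, offset; int pitch; } note_t;

/* One span during which the set of sounding notes does not change. */
typedef struct {
    double start, end;
    int    num_pitches;
    int    pitches[MAX_POLYPHONY];
} segment_t;

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Quantise t to the largest fused boundary that is not later than t. */
static double snap(double t, const double *bounds, int n)
{
    int i = n - 1;
    while (i > 0 && bounds[i] > t)
        i--;
    return bounds[i];
}

/* Writes at most max_segs segments to segs and returns their number,
   or -1 if memory allocation fails. */
int build_segments(const note_t *notes, int num_notes, double tol,
                   segment_t *segs, int max_segs)
{
    assert(notes != NULL && segs != NULL && num_notes > 0 && tol >= 0.0);

    int     num_times = 2 * num_notes;
    double *times = malloc((size_t)num_times * sizeof *times);
    if (times == NULL)
        return -1;

    /* Collect and sort all note starts and ends. */
    for (int i = 0; i < num_notes; i++) {
        times[2 * i]     = notes[i].onset;
        times[2 * i + 1] = notes[i].offset;
    }
    qsort(times, (size_t)num_times, sizeof *times, cmp_double);

    /* Fuse in place: a time within tol of the current boundary is
       absorbed by it, otherwise it opens a new boundary. */
    int num_bounds = 1;
    for (int i = 1; i < num_times; i++)
        if (times[i] - times[num_bounds - 1] > tol)
            times[num_bounds++] = times[i];

    /* One segment per pair of consecutive boundaries. */
    int num_segs = 0;
    for (int b = 0; b + 1 < num_bounds && num_segs < max_segs; b++) {
        segment_t *s = &segs[num_segs++];
        s->start = times[b];
        s->end   = times[b + 1];
        s->num_pitches = 0;
        for (int i = 0; i < num_notes; i++) {
            double on  = snap(notes[i].onset,  times, num_bounds);
            double off = snap(notes[i].offset, times, num_bounds);
            /* The note sounds throughout this segment. */
            if (on <= s->start && off >= s->end
                && s->num_pitches < MAX_POLYPHONY)
                s->pitches[s->num_pitches++] = notes[i].pitch;
        }
    }
    free(times);
    return num_segs;
}
```

For a three-note chord with onsets at 0, 5 and 8 ms and ends at 500, 503 and 498 ms, followed by a single note from 500 to 1000 ms, a tolerance of 20 ms yields two segments: one with three pitches and one with one. A note shorter than the tolerance collapses onto a single boundary and appears in no segment, which a real parser would have to treat separately.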

The software architecture has been reorganised (see also section 8.4) to factor out the parts shared by suiviaudio and suivimidi. Besides some auxiliary routines, these are the code that actually builds and calculates the Hidden Markov Model: the score parsing and the decoding. This way, both objects profit from the extension to polyphonic scores. Only the handling of the input and the calculation of the observation likelihoods remain specific to each type of follower.

3.3   What is to be done

Unfortunately, the new score parsing algorithm is on the order of five times more complicated than it needs to be. This artificial complexity entails some hard-to-find new bugs that appear only with highly polyphonic and very long scores (as found in Pluton). They could be worked around in the current release, but the score parsing should be rewritten the next time it has to be touched anyway, namely for the port to Max/MSP.

The factoring out of common parts introduces a sort of pseudo-inheritance into the C code and structures, using pointers to functions. However, this C++-like architecture is not yet fully carried out and has to be made clearer (through discipline and coding conventions, in place of the language support given by a true object-oriented language).
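
The general pattern can be sketched in C as follows. This is a minimal illustration of pseudo-inheritance through function pointers, not the actual suiviaudio/suivimidi code: all type, field and function names are hypothetical, the likelihood formulas are toy placeholders, and most_likely_state is a plain argmax standing in for the shared HMM decoding.

```c
#include <assert.h>
#include <stddef.h>

/* "Base class": data and behaviour shared by both followers. */
typedef struct follower follower_t;
struct follower {
    const int *state_pitch;   /* expected pitch per state (toy score) */
    int        num_states;
    /* "Virtual method": likelihood of the current input given a state. */
    double (*observation_likelihood)(const follower_t *self, int state);
};

/* "Derived classes": the base must be the first member, so that a
   pointer to the derived struct can be converted to a pointer to the
   base and back. */
typedef struct { follower_t base; int    last_pitch;     } midi_follower_t;
typedef struct { follower_t base; double detected_pitch; } audio_follower_t;

/* Toy MIDI likelihood: high if the played pitch matches the state. */
static double midi_likelihood(const follower_t *self, int state)
{
    const midi_follower_t *m = (const midi_follower_t *)self;
    return m->last_pitch == self->state_pitch[state] ? 0.9 : 0.1;
}

/* Toy audio likelihood: decreases with the distance between the
   detected pitch and the pitch expected in the state. */
static double audio_likelihood(const follower_t *self, int state)
{
    const audio_follower_t *a = (const audio_follower_t *)self;
    double d = a->detected_pitch - (double)self->state_pitch[state];
    if (d < 0.0)
        d = -d;
    return 1.0 / (1.0 + d);
}

void midi_follower_init(midi_follower_t *m, const int *pitches, int n)
{
    m->base.state_pitch = pitches;
    m->base.num_states  = n;
    m->base.observation_likelihood = midi_likelihood;
    m->last_pitch = -1;
}

void audio_follower_init(audio_follower_t *a, const int *pitches, int n)
{
    a->base.state_pitch = pitches;
    a->base.num_states  = n;
    a->base.observation_likelihood = audio_likelihood;
    a->detected_pitch = 0.0;
}

/* Shared code: sees only the base type and dispatches through the
   function pointer. Returns the first state of maximal likelihood. */
int most_likely_state(const follower_t *f)
{
    assert(f != NULL && f->observation_likelihood != NULL
           && f->num_states > 0);
    int    best   = 0;
    double best_l = f->observation_likelihood(f, 0);
    for (int s = 1; s < f->num_states; s++) {
        double l = f->observation_likelihood(f, s);
        if (l > best_l) {
            best_l = l;
            best   = s;
        }
    }
    return best;
}
```

Shared code is called with a pointer to the embedded base, e.g. most_likely_state(&m.base) for a midi_follower_t m. Nothing in the language enforces that the base comes first or that the function pointer is set, which is why conventions such as a dedicated init function per follower type are needed.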

A sketch of a new and more modular software architecture is given in section 8.4. It will greatly facilitate the extension to the new type of spoken voice following and the port to other run-time systems.
