3 Software Development
This section covers the software engineering and programming topics common
to the audio and MIDI score following objects (sections 4 and 6) and to the
tools (section 7).
3.1 State of Affairs
The two objects suiviaudio and suivimidi were implemented separately for
jMax-2.5.3. Many bugs remained, especially in the score parsing and in the
assignment of cues to notes during model generation. They led to occasional
crashes, to the follower hanging, and to the last cue never being output, even
when it had been recognised.
3.2 What has been done
The aforementioned bugs in the score parsing have been found. To handle fully
polyphonic scores, the parser had to be largely rewritten. The Hidden Markov
Model is built from the parsed score with one high-level state per change of
polyphony (see [Mat02]). A quantisation is applied so that note starts and
ends lying close together, e.g. in chords, are fused into a single event.
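To make the model construction more concrete, the following C sketch shows one way to derive one state per change of polyphony after quantising note boundaries. It is not the jMax code: the `Note` and `State` types, `build_states`, and the 30 ms tolerance `QUANT_MS` are hypothetical, and boundaries are fused greedily onto the first time of each cluster.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NOTES  64
#define MAX_STATES 128
#define QUANT_MS   30.0   /* hypothetical fusion tolerance in ms */

typedef struct { int pitch; double on, off; } Note;   /* times in ms */
typedef struct { double start; int n; int pitches[MAX_NOTES]; } State;

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Map t to the start of its quantisation cluster: the largest rep <= t. */
static double snap(double t, const double *rep, int nrep)
{
    double r = rep[0];
    for (int i = 0; i < nrep && rep[i] <= t; i++)
        r = rep[i];
    return r;
}

/* Build one state per change of the set of sounding pitches.
   Returns the number of states written to states[]. */
int build_states(const Note *notes, int nnotes, State *states)
{
    double times[2 * MAX_NOTES], rep[2 * MAX_NOTES];
    int nt = 0, nrep = 0, ns = 0;

    assert(nnotes > 0 && nnotes <= MAX_NOTES);
    for (int i = 0; i < nnotes; i++) {
        times[nt++] = notes[i].on;
        times[nt++] = notes[i].off;
    }
    qsort(times, (size_t)nt, sizeof times[0], cmp_double);

    /* Fuse boundaries lying within QUANT_MS of a cluster's first time. */
    for (int i = 0; i < nt; i++)
        if (nrep == 0 || times[i] - rep[nrep - 1] > QUANT_MS)
            rep[nrep++] = times[i];

    /* Each quantised boundary opens a segment; the last one only closes one. */
    for (int b = 0; b + 1 < nrep && ns < MAX_STATES; b++) {
        State s = { rep[b], 0, {0} };
        for (int i = 0; i < nnotes; i++) {
            double on  = snap(notes[i].on,  rep, nrep);
            double off = snap(notes[i].off, rep, nrep);
            if (on <= rep[b] && off > rep[b])
                s.pitches[s.n++] = notes[i].pitch;
        }
        /* Create a new state only if the polyphony actually changed. */
        if (ns == 0 || s.n != states[ns - 1].n ||
            memcmp(s.pitches, states[ns - 1].pitches,
                   (size_t)s.n * sizeof(int)) != 0)
            states[ns++] = s;
    }
    return ns;
}
```

For a C-E-G chord with onsets at 0, 12 and 20 ms and offsets between 500 and 510 ms, followed by a single note from 500 to 1000 ms, this yields two states: the chord at 0 ms and the single note at 500 ms. A note that starts and ends within one tolerance window vanishes, which is one trade-off such a quantisation has to manage.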
The software architecture has been reorganised (see also section 8.4) to
factor out the parts common to suiviaudio and suivimidi. Besides some
auxiliary routines, these comprise the code that builds and evaluates the
Hidden Markov Model, namely the score parsing and the decoding. In this way,
both objects benefit from the extension to polyphonic scores. Only the input
handling and the calculation of the observation likelihoods remain specific to
each type of follower.
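The split between shared decoding and follower-specific observation likelihoods can be illustrated with a callback. This sketch assumes a simple left-to-right topology with a hypothetical self-transition probability `p_stay` and a forward (filtering) update; the text does not state which decoding algorithm the objects use, and `hmm_forward_step` and `midi_likelihood` are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>

/* Follower-specific part: likelihood of observation obs in state s. */
typedef double (*obs_likelihood_fn)(const void *obs, int s, void *ctx);

/* Shared part: one forward step on a left-to-right HMM, where each state
   either stays (p_stay) or advances to the next one (1 - p_stay).
   alpha[] is updated in place; returns the most likely current state. */
int hmm_forward_step(double *alpha, int nstates, double p_stay,
                     obs_likelihood_fn lik, const void *obs, void *ctx)
{
    double sum = 0.0;
    int best = 0;

    assert(nstates > 0);
    /* Iterate backwards so alpha[s - 1] still holds its previous value. */
    for (int s = nstates - 1; s >= 0; s--) {
        double prior = alpha[s] * p_stay;
        if (s > 0)
            prior += alpha[s - 1] * (1.0 - p_stay);
        alpha[s] = prior * lik(obs, s, ctx);
        sum += alpha[s];
    }
    for (int s = 0; s < nstates; s++) {
        if (sum > 0.0)
            alpha[s] /= sum;          /* normalise to avoid underflow */
        if (alpha[s] > alpha[best])
            best = s;
    }
    return best;
}

/* Hypothetical MIDI-style likelihood: high if the played pitch matches
   the pitch expected in state s (ctx points to the expected pitches). */
double midi_likelihood(const void *obs, int s, void *ctx)
{
    const int *expected = ctx;
    int played = *(const int *)obs;
    return played == expected[s] ? 0.9 : 0.05;
}
```

An audio follower would pass a different callback and context (e.g. spectral features) to the same `hmm_forward_step`, which is the point of keeping the decoding generic.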
3.3 What is to be done
Unfortunately, the new score parsing algorithm is roughly five times more
complicated than necessary. This artificial complexity has introduced new,
hard-to-find bugs that appeared only with highly polyphonic and very long
scores (those of Pluton). They could be worked around in the current release,
but the score parsing must be rewritten as soon as it has to be modified
anyway, namely for the port to Max/MSP.
Factoring out the common parts introduces a kind of pseudo-inheritance into
the C code and data structures, using pointers to functions. However, this
C++-like architecture has not yet been carried through consistently and needs
to be made clearer, relying on discipline and coding conventions in place of
the language support that a true object-oriented language would provide.
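A minimal sketch of such convention-based pseudo-inheritance, assuming the common C idiom of placing the "base" struct as the first member of each "derived" struct and dispatching through a table of function pointers. All type and function names here are hypothetical. The idiom relies on the C guarantee that a pointer to a structure, suitably converted, points to its first member.

```c
#include <assert.h>
#include <stddef.h>

typedef struct follower follower_t;

/* "Virtual method table" shared by all instances of one follower type. */
typedef struct {
    double (*obs_likelihood)(follower_t *self, const void *obs, int state);
    void   (*reset)(follower_t *self);
} follower_methods_t;

/* "Base class": common state plus a pointer to the method table. */
struct follower {
    const follower_methods_t *methods;  /* must be set by every "constructor" */
    int position;                       /* current score position (state index) */
};

/* "Derived class": by convention the base is ALWAYS the first member,
   so a midi_follower_t * may be converted to follower_t * and back. */
typedef struct {
    follower_t base;
    const int *expected_pitch;          /* one expected pitch per state */
} midi_follower_t;

static double midi_obs_likelihood(follower_t *self, const void *obs, int state)
{
    midi_follower_t *m = (midi_follower_t *)self;   /* "downcast" */
    return *(const int *)obs == m->expected_pitch[state] ? 0.9 : 0.05;
}

static void midi_reset(follower_t *self)
{
    self->position = 0;
}

static const follower_methods_t midi_methods = { midi_obs_likelihood, midi_reset };

/* "Constructor": installs the method table, then resets via dispatch. */
void midi_follower_init(midi_follower_t *m, const int *expected_pitch)
{
    m->base.methods = &midi_methods;
    m->expected_pitch = expected_pitch;
    m->base.methods->reset(&m->base);
}

/* Shared code sees only follower_t and dispatches through the table. */
double follower_likelihood(follower_t *f, const void *obs, int state)
{
    assert(f->methods != NULL && f->methods->obs_likelihood != NULL);
    return f->methods->obs_likelihood(f, obs, state);
}
```

The discipline the text calls for appears here as rules the compiler cannot check: the base must remain the first member, and every constructor must set `methods` before the object is used.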
Furthermore, section 8.4 sketches a new, more modular software architecture
that will greatly facilitate the extension to a new type of follower, for the
spoken voice, and the port to other run-time systems.