The computer has extended musical writing both on the level of musical
structure and on the level of sound control. Complex musical ideas and
structures can be formulated with the computer, and the results can be
tested and manipulated with unprecedented ease. Sounds can be produced
that correspond exactly to a prescribed shape, evolution, or spectral
content. However, the bulk of music applications still treats musical
structure and sound synthesis as two distinct fields within the realm
of musical composition. Most composition environments are focused on
``l'écriture'' (musical writing). To give the composer an idea of the
composition, a score can be played using a MIDI instrument. However,
if the composer wants to integrate synthesized sounds into the
composition, s/he falls back on other applications for the
synthesis. This has several drawbacks. Firstly, even if the composer
can include a description of the sound synthesis in the composition,
s/he cannot play it from within the environment. Being able to play
synthesized sounds from within the composition environment is
desirable when the composer wants to integrate timbre into the
compositional process. In that case, data structures controlling the
sound synthesis are defined within the composition environment. In
particular, real-time playback is very helpful for the composer. In
Gareth Loy's words:
It is stifling to compose without being able to hear near-term
results. [...] In the case of sample-by-sample calculation of an
acoustical waveform, all issues of how music sounds - its musical
interpretation - must be resolved by the composer in addition to all
strictly compositional issues. The effect of not having real-time
feedback is like trying to play an instrument by touch only, and
having to complete the performance before being able to hear a single
note ... No possibility to reinject a human touch into non-real-time
synthesis ... [Loy89], page 331.
Secondly, in the transfer to the synthesis application the structure
of the composition is lost: it is reduced to an event list on a
one-dimensional time line. This limits the possibilities for
interactive pieces, since the context information is no longer
available. Lastly, the user has to copy the data to the synthesis
application. In the case of spectral data this may represent a
significant overhead and require a large amount of memory.
This persistent barrier between musical writing and sound sculpting
is artificial. Both are concerned with the description of structure
and with the generation of data. There should not be fixed sounds on
the one hand and structural organization on the other: the sounds
should be shaped as a function of their place in a given context. In
addition, the description of the sound synthesis may affect higher
levels of the organization. Integrating the two fields is necessary
if composers want to benefit fully from the possibilities computers
offer for creating multimedia pieces.
A further observation is that many sound synthesis systems use a
primitive set of data types and only pass around constant values. Most
composition environments, however, are built on top of high-level
language interpreters and handle functions as ``first-class
citizens.'' Perhaps because of this restriction to constant values,
interactivity in real-time systems is often reduced to setting the
value of a controller: it is not possible to re-program the
synthesizer at runtime. The distinction between editing and execution
is very strict in real-time applications. The use of a synthesis
system proceeds in two steps: first, the composition or synthesis
patch is loaded; second, the patch is executed. No editing or
inspection is possible during execution.
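To make the contrast concrete, consider the following minimal Java
sketch (all names are invented for this illustration and do not refer
to any existing synthesizer API). The oscillator's frequency is not a
constant value but a functional object, and another thread, for
example an interpreter reacting to user input, can replace it while
the synthesis keeps running:
\begin{verbatim}
import java.util.function.DoubleUnaryOperator;

// Hypothetical oscillator whose frequency is not a fixed number but a
// function of time, i.e. a value of a rich, functional data type that
// can be replaced while the synthesis thread keeps running.
class Oscillator {
    // volatile: the control function may be swapped by another thread
    private volatile DoubleUnaryOperator freq = t -> 440.0;  // constant by default

    void setFrequencyFunction(DoubleUnaryOperator f) { this.freq = f; }

    double frequencyAt(double time) { return freq.applyAsDouble(time); }
}

public class FirstClassControl {
    public static void main(String[] args) {
        Oscillator osc = new Oscillator();
        System.out.println(osc.frequencyAt(0.5));        // 440.0

        // "Re-programming" during execution: install a glissando.
        osc.setFrequencyFunction(t -> 220.0 + 110.0 * t);
        System.out.println(osc.frequencyAt(0.5));        // 275.0
    }
}
\end{verbatim}
In a system that only passes around constant values, the number 440.0
could be changed, but the behaviour itself could not be re-programmed
at runtime.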
If we ask why composition environments and synthesis systems take
such different approaches, we find only one reason, and it is a
technical one. Most composition-oriented environments are written on
top of a high-level language interpreter, generally a Lisp-style
language. These environments delegate all storage reclamation issues
to a garbage collector. Systems for sound synthesis, especially those
offering real-time performance, are written in C or C++ for efficiency
and handle memory reclamation explicitly. Integrating both
functionalities means combining garbage collection, real-time
constraints, and multi-threading in one system. In this thesis we
propose and discuss the implementation of such an integrated
environment. Part of this thesis will therefore focus on the tension
between real-time performance and garbage collection. In addition, we
design the environment to make seamless navigation between synthesis,
composition, and interactivity possible. In the design of the
integrated environment we must consider the following aspects
carefully:
- The choice of a language, both for the implementation and for the
interaction with the user,
- The model that serves as a basis for the environment,
- The definition of the interfaces for basic components
such as synthesis modules, control functions, and sound
output,
- A rich model for the representation of time and the
organization of a piece.
On the one hand, the complexity of describing synthesis patches, music
structures, and the system's response to user input makes the use of a
high-level language necessary. Even if we provide convenient tools
such as graphical editors, expert users ask for the possibility to
extend and customize the environment. On the other hand, it must be
possible to express signal processing techniques efficiently in the
language we choose to develop in; in particular, it must be possible
to write a hard real-time synthesizer. The choice of language will
determine both the flexibility and the real-time capabilities of the
system.
The conceptual model of our architecture must ensure that
interactivity, sound synthesis, and composition can be combined in the
same environment. For example, music improvisation requires the
modification of multimedia material at runtime. We want the existing
tools for composition to be available to the interactive system; we
must therefore ensure that the compositional structures are
``interaction-aware.'' Similarly, the data structures must be general
enough to include many possible synthesis techniques and compositional
paradigms and, at the same time, capture enough information to be
useful.
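As a purely illustrative Java sketch of what ``interaction-aware''
could mean (the interface and class below are hypothetical and not
part of the environment described later), a compositional structure
exposes not only its composed behaviour but also a handler through
which user events may reorganize its material at runtime:
\begin{verbatim}
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Sketch of an "interaction-aware" compositional structure: besides
// its composed behaviour, it accepts user events that can reorganize
// its material while the piece is running.
interface InteractiveSection {
    void play(double time);            // composed behaviour
    void handleEvent(String event);    // may add, remove, or reorder material
}

class Phrase implements InteractiveSection {
    private final List<Integer> pitches = Arrays.asList(60, 64, 67, 72);

    public void play(double time) {
        System.out.println("t=" + time + " " + pitches);
    }

    // A user gesture reshuffles the phrase instead of merely changing a value.
    public void handleEvent(String event) {
        if ("shuffle".equals(event)) Collections.shuffle(pitches);
    }
}

public class InteractionAwareDemo {
    public static void main(String[] args) {
        Phrase phrase = new Phrase();
        phrase.play(0.0);
        phrase.handleEvent("shuffle");
        phrase.play(4.0);
    }
}
\end{verbatim}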
Instead of copying and converting complex control structures between
several systems, we want the data to be fully exchangeable between all
components of the environment. The same control structures that the
composer manipulates in the score are used during the synthesis. This
is important to allow the manipulation of timbre as a musical element
in a composition.
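A hypothetical illustration of such shared data (the classes below are
a sketch of ours, not the actual classes of the environment): a single
breakpoint function serves both as the score-level object that the
composer edits and as the control signal read by the synthesizer, with
no copying or format conversion in between:
\begin{verbatim}
// Breakpoint function shared, by reference, between the score and the
// synthesizer: both sides read the same object, so no copying or
// format conversion is needed when the piece is rendered.
class BreakpointFunction {
    private final double[] times;   // strictly increasing
    private final double[] values;

    BreakpointFunction(double[] times, double[] values) {
        this.times = times;
        this.values = values;
    }

    // Linear interpolation between breakpoints; clamped at the ends.
    double valueAt(double t) {
        if (t <= times[0]) return values[0];
        for (int i = 1; i < times.length; i++) {
            if (t <= times[i]) {
                double a = (t - times[i - 1]) / (times[i] - times[i - 1]);
                return values[i - 1] + a * (values[i] - values[i - 1]);
            }
        }
        return values[values.length - 1];
    }
}

public class SharedEnvelopeDemo {
    public static void main(String[] args) {
        BreakpointFunction amp =
            new BreakpointFunction(new double[]{0.0, 0.1, 2.0},
                                   new double[]{0.0, 1.0, 0.0});
        // Score side: query the envelope at a structural point in time.
        System.out.println(amp.valueAt(0.05));   // 0.5
        // Synthesis side: the very same object drives the amplitude.
        System.out.println(amp.valueAt(1.0));    // ~0.526
    }
}
\end{verbatim}
The composer manipulates the breakpoints in the score while the
synthesis loop reads valueAt(t) block by block from the very same
object, so even large spectral envelopes need not be duplicated.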
Some synthesizers have no notion of time at all (except for the notion
of now); others use a one-dimensional time line. To make
compositional structures more dynamic for use in interactive pieces, a
representation of the temporal information is needed during runtime.
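The following small Java sketch (our own illustration, with invented
names) indicates what such a runtime representation could retain:
every item keeps its local start time, its duration, and a link to the
enclosing structure, so absolute times are derived on demand and a
section can still be rescheduled interactively:
\begin{verbatim}
import java.util.ArrayList;
import java.util.List;

// Runtime representation of temporal structure: items keep their local
// start time, duration, and a link to the enclosing container instead
// of being flattened into a single list of absolute times.
class TemporalItem {
    double start;               // onset relative to the parent
    double duration;
    TemporalContainer parent;   // null at the root

    TemporalItem(double start, double duration) {
        this.start = start;
        this.duration = duration;
    }

    // Absolute time is derived on demand, so moving a parent section
    // automatically reschedules all of its children.
    double absoluteStart() {
        return (parent == null) ? start : start + parent.absoluteStart();
    }
}

class TemporalContainer extends TemporalItem {
    final List<TemporalItem> children = new ArrayList<>();

    TemporalContainer(double start, double duration) { super(start, duration); }

    void add(TemporalItem item) { item.parent = this; children.add(item); }
}

public class RuntimeTimeDemo {
    public static void main(String[] args) {
        TemporalContainer section = new TemporalContainer(10.0, 5.0);
        TemporalItem note = new TemporalItem(1.5, 0.5);
        section.add(note);
        System.out.println(note.absoluteStart());   // 11.5

        // Interactive reorganization: shift the whole section at runtime.
        section.start = 20.0;
        System.out.println(note.absoluteStart());   // 21.5
    }
}
\end{verbatim}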
In this chapter we discuss a new architecture for multimedia
systems. The architecture is designed to integrate various paradigms
of time-based media handling; in particular, it covers
synthesis/rendering, composition, and live interaction. We consider
the seamless integration of these three approaches to be an important
achievement of the presented work. Our rationale for this combination
is that the three approaches depend upon each other. Obviously, a
piece has to be composed before it can be performed, but compositional
structures also guide interactive systems, and user intervention can
in turn cause a (re-)organization of the composition. To allow all
possible schemes we need an integrated environment. The system we
propose in this text combines real-time sound synthesis, an embedded
Scheme interpreter, and interactive event handling. The guideline in
the design of our system is a fairly simple model that we present in
this chapter.
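As a rough indication only, and not the architecture developed in the
following chapters, the sketch below shows how these three ingredients
could sit together in one Java process. The interpreter is obtained
through the standard javax.script API, assuming a JSR-223 Scheme
implementation registered under the name ``scheme'' is available on
the classpath; the synthesis and event-handling methods are mere
placeholders:
\begin{verbatim}
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

// Structural sketch: real-time synthesis, an embedded Scheme
// interpreter, and interactive event handling in a single process.
public class IntegratedSketch {
    private final ScriptEngine scheme;

    IntegratedSketch() {
        // getEngineByName returns null when no Scheme engine is registered.
        scheme = new ScriptEngineManager().getEngineByName("scheme");
        if (scheme == null)
            throw new IllegalStateException("no Scheme script engine found");
    }

    // Real-time part: called periodically to fill an audio buffer.
    void synthesize(float[] buffer) {
        // ... signal processing, driven by structures the interpreter can edit
    }

    // Interactive part: user input is handed to the interpreter, which may
    // modify the running patch or the compositional structures.
    void onUserInput(String expression) throws ScriptException {
        scheme.eval(expression);
    }

    public static void main(String[] args) throws ScriptException {
        IntegratedSketch system = new IntegratedSketch();
        system.onUserInput("(define transposition 7)");
    }
}
\end{verbatim}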
Several existing projects pursue the same goals as ours. Projects like
Formes [RC84], Nyquist [Dan93], and Foo
[EGA94] guide our work. PatchWork [AR93],
OpenMusic [Ass96], Csound [Ver86], and
Max/FTS [Puc91b,Puc91a] are milestones in the field
of computer music and greatly inspire us. Our proposal is inevitably
an integration of existing designs. However, it is made possible only
by today's programming techniques and computing power. There are
several additional reasons why our model differs from existing music
systems. First, as stated above, it closely integrates three elements
rarely found together: composition, synthesis, and real-time
interaction. Second, our model is very dynamic: the user is no longer
restricted to a two-step process of programming the environment and
then executing the synthesis program. Instead, we offer a fully
dynamic and interactive environment in which the user can create and
modify data structures during execution. Third, in comparison to some
synthesis programs, our model handles very rich data types, including
functional objects. This is true for the arguments carried by the
events as well as for the structures manipulated by the synthesis or
composition algorithms. Lastly, the underlying model does not
explicitly mention sound synthesis; it can be used for any
time-dependent, rich media that needs to be output at regular time
intervals.
The rest of the thesis is organized as follows. First, we discuss why
we have chosen to develop the environment in the Java language and
argue for the use of an embedded Scheme interpreter
(Section 4.1). Before we present the model formally
(Section 4.3), we introduce some of the concepts
used throughout the text (Section 4.2).
The discussion of the implementation is divided into two chapters.
Chapter 5 describes the basic components of the architecture and the major classes for sound synthesis, control, and output.
Chapter 6 discusses the structures for the organization of discrete elements and continuous functions in time.
Finally, in Chapter 7, we estimate the real-time behavior of the system and discuss the influence of the garbage collector.