Chapter 4: The fundamentals of the architecture

The computer has extended musical writing both at the level of musical structure and at the level of sound control. Complex musical ideas and structures can be formulated with the computer, and their results tested and manipulated with unprecedented ease. Sounds can be produced that correspond exactly to a prescribed shape, evolution, or spectral content. However, the bulk of music applications still treat musical structure and sound synthesis as two distinct fields within the realm of musical composition. Most composition environments focus on ``l'écriture'' (musical writing). To give the composer an idea of the composition, a score can be played using a MIDI instrument. However, if the composer wants to integrate synthesized sounds into the composition, s/he must fall back on other applications for the synthesis. This has several drawbacks. Firstly, even if the user can integrate a description of the sound synthesis into the composition, s/he cannot play it there. Being able to play synthesized sounds from within the composition environment is desirable when the composer wants to integrate timbre into the compositional process, since in that case the data structures controlling the sound synthesis are defined within the composition environment. Real-time playback, in particular, is very helpful for the composer. In Gareth Loy's words:

It is stifling to compose without being able to hear near-term results. [...] In the case of sample-by-sample calculation of an acoustical waveform, all issues of how music sounds - its musical interpretation - must be resolved by the composer in addition to all strictly compositional issues. The effect of not having real-time feedback is like trying to play an instrument by touch only, and having to complete the performance before being able to hear a single note ... No possibility to reinject a human touch into non-real-time synthesis ... [Loy89], page 331.

Secondly, in the transfer to the synthesis application the structure of the composition is lost: it is reduced to an event list on a one-dimensional time line. This limits the possibilities for interactive pieces, since the context information is no longer available. Lastly, the user has to copy the data to the synthesis application; in the case of spectral data this may represent a significant overhead in time and memory.

This persistent barrier between musical writing and sound sculpting is artificial. Both are concerned with the description of structure and with data generation. There should not be fixed sounds on the one hand and structural organization on the other: sounds should be formed as a function of their place in a given context. In addition, the description of the sound synthesis may affect higher levels of the organization. Integrating the two fields is necessary if composers want to benefit fully from the possibilities offered by computers to create multimedia pieces.

Another observation is that many sound synthesis systems use a primitive set of data types and only pass around constant values. Most composition environments, however, are built on top of high-level language interpreters and handle functions as ``first-class citizens.'' Perhaps as a consequence, interactivity in real-time systems is often reduced to setting the value of a controller: it is not possible to re-program the synthesizer at runtime. The distinction between editing and execution is very sharp in real-time applications. The use of a synthesis system proceeds in two steps: first the composition or synthesis patch is loaded, then it is executed. No editing or inspection is possible at runtime.
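
To make this point concrete, the following sketch contrasts the two approaches. The class names (Control, Constant, FunctionControl) are purely illustrative and do not refer to any existing system: a control parameter may hold a fixed value, or it may hold an arbitrary function of time as a first-class object that can be inspected and replaced while the synthesizer is running.

    // Illustrative sketch only: a synthesis parameter that accepts either a
    // constant or an arbitrary function of time. The class names are
    // hypothetical and do not belong to an existing system.
    import java.util.function.DoubleUnaryOperator;

    interface Control {
        double valueAt(double time);            // evaluated while the synthesizer runs
    }

    final class Constant implements Control {
        private final double value;
        Constant(double value) { this.value = value; }
        public double valueAt(double time) { return value; }
    }

    final class FunctionControl implements Control {
        private final DoubleUnaryOperator f;    // a first-class function, not a fixed value
        FunctionControl(DoubleUnaryOperator f) { this.f = f; }
        public double valueAt(double time) { return f.applyAsDouble(time); }
    }

    public class ControlSketch {
        public static void main(String[] args) {
            Control fixed = new Constant(440.0);
            // An exponential glissando expressed directly as a function object;
            // it could be replaced by another function at runtime.
            Control gliss = new FunctionControl(t -> 220.0 * Math.pow(2.0, t));
            for (double t = 0.0; t <= 1.0; t += 0.25) {
                System.out.printf("t=%.2f  fixed=%.1f Hz  gliss=%.1f Hz%n",
                                  t, fixed.valueAt(t), gliss.valueAt(t));
            }
        }
    }

A system whose parameters only accept constants forces all knowledge about their evolution to live outside the synthesizer; accepting function objects moves that knowledge inside, which is precisely what a composition environment built on a high-level interpreter already does.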

If we ask why composition environments and synthesis systems take such different approaches, we find only one reason, and it is a technical one. Most composition-oriented environments are written on top of a high-level language interpreter, generally a Lisp-style language, and delegate all storage reclamation to a garbage collector. Systems for sound synthesis, especially those offering real-time performance, are written in C or C++ for efficiency and handle memory reclamation explicitly. Integrating both functionalities means combining garbage collection, real-time constraints, and multi-threading in one system. In this thesis we propose and discuss the implementation of such an integrated environment. Part of this thesis will therefore focus on the real-time versus garbage collection issue. In addition, we design the environment to make seamless navigation between synthesis, composition, and interactivity possible. In the design of the integrated environment we must consider the following aspects carefully:

  • The choice of a language, both for the implementation and for the interaction with the user,

  • The model that serves as a basis for the environment,

  • The definition of the interfaces for basic components such as synthesis modules, control functions, and sound output,

  • A rich model for the representation of time and the organization of a piece.

On the one hand, the complexity of describing synthesis patches, musical structures, and the system's response to user input makes the use of a high-level language necessary. Even if we provide convenient tools such as graphical editors, expert users will ask for the possibility to extend and customize the environment. On the other hand, it must be possible to express signal processing techniques efficiently in the language we choose to develop in; it must be possible to write a hard real-time synthesizer. The choice of the language thus determines both the flexibility and the real-time capabilities of the system.

The conceptual model of our architecture must ensure that interactivity, sound synthesis, and composition can be combined in the same environment. For example, music improvisation requires the modification of multimedia material at runtime. We want the existing tools for composition to be available to the interactive system; we must therefore ensure that the compositional structures are ``interaction-aware.'' Similarly, the data structures must be general enough to cover many possible synthesis techniques and compositional paradigms and, at the same time, capture enough information to be useful.

Instead of copying and converting complex control structures between several systems, we want the data to be fully exchangeable between all components of the environment. The same control structures that the composer manipulates in the score are used during the synthesis. This is important to allow the manipulation of timbre as a musical element in a composition.
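
As a simple illustration of this sharing, the sketch below (with hypothetical class names, not those of our implementation) lets a score event and a synthesis voice reference the same break-point function: editing the envelope in the score is immediately visible to the synthesis, and no copy or format conversion takes place.

    // Illustrative sketch only: the envelope edited in the score is the very
    // object read by the synthesis voice.
    import java.util.Map;
    import java.util.TreeMap;

    final class BreakPointFunction {
        private final TreeMap<Double, Double> points = new TreeMap<>();
        void setPoint(double time, double value) { points.put(time, value); }
        double valueAt(double time) {
            Map.Entry<Double, Double> lo = points.floorEntry(time);
            Map.Entry<Double, Double> hi = points.ceilingEntry(time);
            if (lo == null && hi == null) return 0.0;                 // empty envelope
            if (lo == null) return hi.getValue();
            if (hi == null || lo.getKey().equals(hi.getKey())) return lo.getValue();
            double a = (time - lo.getKey()) / (hi.getKey() - lo.getKey());
            return lo.getValue() + a * (hi.getValue() - lo.getValue());
        }
    }

    final class ScoreEvent {                     // what the composer edits
        final BreakPointFunction amplitude;
        ScoreEvent(BreakPointFunction amplitude) { this.amplitude = amplitude; }
    }

    final class Voice {                          // what the synthesizer reads
        private final BreakPointFunction amplitude;
        Voice(ScoreEvent e) { this.amplitude = e.amplitude; }   // shared, not copied
        double gainAt(double t) { return amplitude.valueAt(t); }
    }

    public class SharedControlSketch {
        public static void main(String[] args) {
            BreakPointFunction env = new BreakPointFunction();
            env.setPoint(0.0, 0.0);
            env.setPoint(0.1, 1.0);
            env.setPoint(1.0, 0.0);
            Voice voice = new Voice(new ScoreEvent(env));
            env.setPoint(0.5, 0.8);              // editing the score affects the running voice
            System.out.println(voice.gainAt(0.5));   // prints 0.8
        }
    }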

Some synthesizers have no notion of time at all (except for the notion of now); others use a one-dimensional time line. To make compositional structures more dynamic for use in interactive pieces, a representation of the temporal information is needed during runtime.

In this chapter we discuss a new architecture for multimedia systems. The architecture is designed to integrate various paradigms of time-based media handling; in particular, it covers synthesis/rendering, composition, and live interaction. We consider the seamless integration of these three approaches an important achievement of the presented work. The rationale behind this combination is that the different approaches depend on each other: obviously, a piece has to be composed before it can be performed, but compositional structures also guide interactive systems, and user intervention can in turn cause a (re-)organization of the composition. To allow all possible schemes we need an integrated environment. The system we propose in this text combines real-time sound synthesis, an embedded Scheme interpreter, and interactive event handling. The guideline in the design of our system is a fairly simple model that we present in this chapter.
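
The following sketch hints at the kind of runtime interaction we aim for. It uses the standard javax.script API purely for illustration: the engine name "scheme" assumes that some Scheme implementation is registered on the classpath, the Java interop syntax in the evaluated string depends on that implementation, and the interpreter actually embedded in our system is discussed in Section 4.1.

    // Minimal sketch: code arriving from a console, a controller, or a score
    // event is evaluated while the synthesizer keeps running.
    import javax.script.ScriptEngine;
    import javax.script.ScriptEngineManager;
    import javax.script.ScriptException;

    public class RuntimeScriptingSketch {

        // Stand-in for a running synthesis process; the audio thread would read
        // `frequency` while the interpreter thread modifies it.
        public static class Synth {
            private volatile double frequency = 440.0;
            public void setFrequency(double hz) { frequency = hz; }
            public double getFrequency() { return frequency; }
        }

        public static void main(String[] args) throws ScriptException {
            Synth synth = new Synth();
            ScriptEngine scheme = new ScriptEngineManager().getEngineByName("scheme");
            if (scheme == null) {
                System.out.println("No Scheme engine found on the classpath.");
                return;
            }
            scheme.put("synth", synth);
            // The Java interop syntax below is Kawa-style and is only an example;
            // other Scheme implementations use different conventions.
            scheme.eval("(invoke synth 'setFrequency 220.0)");
            System.out.println(synth.getFrequency());
        }
    }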

Several existing projects aim at the same goals as ours. Projects like Formes [RC84], Nyquist [Dan93], and Foo [EGA94] guide our work. PatchWork [AR93], OpenMusic [Ass96], Csound [Ver86], and Max/FTS [Puc91b,Puc91a] are milestones in the field of computer music and greatly inspire us. Our proposal is inevitably an integration of existing designs. However, it is made possible only by present-day programming techniques and computing power. There are several additional reasons why our model differs from existing music systems. First, as stated above, it closely integrates three elements rarely found together: composition, synthesis, and real-time. Second, our model is very dynamic: the user is no longer restricted to a two-step process of programming the environment and then executing the synthesis program. Instead, we offer a fully dynamic and interactive environment in which the user can create and modify data structures during execution. Third, in comparison to some synthesis programs, our model handles very rich data types, including functional objects; this is true for the arguments carried by events as well as for the structures manipulated by the synthesis or composition algorithms. Lastly, the underlying model does not explicitly mention sound synthesis: it can be used for any time-dependent, rich media that needs to be output at regular time intervals.

The rest of the thesis is organized as follows. First, we discuss why we have chosen to develop the environment in the Java language and argue for the use of an embedded Scheme interpreter (Section 4.1). Before we present the model formally (Section 4.3), we introduce some of the concepts used throughout the text (Section 4.2).

The discussion of the implementation is divided into two chapters. Chapter 5 describes the basic components of the architecture and the major classes for sound synthesis, control, and output. Chapter 6 discusses the structures for the organization of discrete elements and continuous functions in time. Finally, in Chapter 7, we estimate the real-time behavior of the system and discuss the influence of the garbage collector.

4.1 The choice of Java and an embedded Scheme interpreter
4.2 Synthesis processes, events, and programs
4.3 A formal description of the architecture