Unit Selection Sound Synthesis:
Extended Bibliography

Diemo Schwarz (schwarz@ircam.fr)

Last update: June 28, 2000

This document exists as one big file (good for searching), or as several pages (good for browsing). Click on the entry type, for example MISC, for the original BibTeX entry.

1   ASP Anthropic Signal Processing Group

MISCasp:www [ASP99]
KeyASP
TitleAnthropic Signal Processing Group, Oregon Graduate Institute of Science and Technology
HowpublishedWWW page
Year1999
urlhttp://ece.ogi.edu/asp
pub-urlhttp://ece.ogi.edu/asp/publicat.html
Notehttp://ece.ogi.edu/asp


INPROC.asp:plp85 [HHW85]
Author
H. Hermansky, B. A. Hanson, H. Wakita
TitlePerceptually based linear predictive analysis of speech
BooktitleProceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing
Year1985
Pages509--512


INPROC.nlp:tsdproc213-218 [Her98]
TitleData-Driven Speech Analysis For ASR
Pages213--218
Author
Hynek Hermansky
BooktitleProceedings of the First Workshop on Text, Speech, Dialogue --- TSD'98
Year1998
Editor
Petr Sojka, Václav Matousek, Karel Pala, Ivan Kopecek
AddressBrno, Czech Republic
MonthSeptember
PublisherMasaryk University Press


2   AT&T Labs

MISCatt:www [ATT99]
KeyATT
TitleAT&T Labs
HowpublishedWWW page
Year1999
urlhttp://www.research.att.com/projects/tts/
Notehttp://www.research.att.com/projects/tts/


INPROC.att:nextgen99 [BCS+99]
Author
M. Beutnagel, A. Conkie, J. Schroeter, Y. Stylianou, A. Syrdal
TitleThe AT&T Next-Gen TTS System
BooktitleJoint Meeting of ASA, EAA, and DAGA
AddressBerlin, Germany
MonthMarch
Year1999
Notewww [ATT99]
AbstractThe new AT&T Text-To-Speech (TTS) system for general U.S. English text is based on best-choice components of the AT&T Flextalk TTS, the Festival System from the University of Edinburgh, and ATR's CHATR system. From Flextalk, it employs text normalization, letter-to-sound, and prosody generation. Festival provides a flexible and modular architecture for easy experimentation and competitive evaluation of different algorithms or modules. In addition, we adopted CHATR's unit selection algorithms and modified them in an attempt to guarantee high intelligibility under all circumstances. Finally, we have added our own Harmonic plus Noise Model (HNM) backend for synthesizing the output speech. Most decisions made during the research and development phase of this system were based on formal subjective evaluations. We feel that the new system goes a long way toward delivering on the long-standing promise of truly natural-sounding, as well as highly intelligible, synthesis.


INPROC.att:diph-select98 [BCS98]
Author
Mark Beutnagel, Alistair Conkie, Ann K. Syrdal
TitleDiphone Synthesis using Unit Selection
BooktitleThe 3rd ESCA/COCOSDA Workshop on Speech Synthesis
AddressJenolan Caves, Australia
MonthNovember
Year1998
Notewww [ATT99]
RemarksSummary: CHATR unit selection (using phone units) extended to diphones. Open synthesis backend: PSOLA, HNM, wave concat. Uses standard Festival. Careful listening test examining influence on quality of synthesis/unit type/pruning. Base for Next-Gen TTS [BCS+99]?
AbstractThis paper describes an experimental AT&T concatenative synthesis system using unit selection, for which the basic synthesis units are diphones. The synthesizer may use any of the data from a large database of utterances. Since there are in general multiple instances of each concatenative unit, the system performs dynamic unit selection. Selection among candidates is done dynamically at synthesis, in a manner that is based on and extends unit selection implemented in the CHATR synthesis system [1][4]. Selected units may be either phones or diphones, and they can be synthesized by a variety of methods, including PSOLA [5], HNM [11], and simple unit concatenation. The AT&T system, with CHATR unit selection, was implemented within the framework of the Festival Speech Synthesis System [2]. The voice database amounted to approximately one and one-half hours of speech and was constructed from read text taken from three sources. The first source was a portion of the 1989 Wall Street Journal material from the Penn Treebank Project, so that the most frequent diphones were well represented. Complete diphone coverage was assured by the second text, which was designed for diphone databases [12]. A third set of data consisted of recorded prompts for telephone service applications. Subjective formal listening tests were conducted to compare speech quality for several options that exist in the AT&T synthesizer, including synthesis methods and choices of fundamental units. These tests showed that unit selection techniques can be successfully applied to diphone synthesis.


INPROC.att:HNM98 [Sty98a]
Author
Yannis Stylianou
TitleConcatenative Speech Synthesis using a Harmonic plus Noise Model
BooktitleThe 3rd ESCA/COCOSDA Workshop on Speech Synthesis
AddressJenolan Caves, Australia
MonthNovember
Year1998
Notewww [ATT99]
AbstractThis paper describes the application of the Harmonic plus Noise Model, HNM, for concatenative Text-to-Speech (TTS) synthesis. In the context of HNM, speech signals are represented as a time-varying harmonic component plus a modulated noise component. The decomposition of speech signal in these two components allows for more natural-sounding modifications (e.g., source and filter modifications) of the signal. The parametric representation of speech using HNM provides a straightforward way of smoothing discontinuities of acoustic units around concatenation points. Formal listening tests have shown that HNM provides high-quality speech synthesis while outperforming other models for synthesis (e.g., TD-PSOLA) in intelligibility, naturalness and pleasantness.
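
The harmonic-plus-noise decomposition summarized in this abstract can be illustrated with a minimal sketch. The Python fragment below is only a toy under stated assumptions (function and parameter names such as hnm_frame, f0, amps and noise_gain are invented, and a real HNM system filters and time-modulates the noise component); it renders one frame as a sum of harmonics of a fundamental plus scaled noise.

    import numpy as np

    def hnm_frame(f0, amps, noise_gain, sr=16000, dur=0.02, seed=0):
        """Toy HNM-style frame: harmonics of f0 plus a noise component.
        A sketch of the idea only, not the model described in the paper."""
        t = np.arange(int(sr * dur)) / sr
        # Harmonic part: sum of sinusoids at integer multiples of f0.
        harmonic = sum(a * np.cos(2 * np.pi * (k + 1) * f0 * t)
                       for k, a in enumerate(amps))
        # Noise part: white noise with a constant gain; a real HNM
        # implementation shapes and modulates this component over time.
        noise = noise_gain * np.random.default_rng(seed).standard_normal(len(t))
        return harmonic + noise

    frame = hnm_frame(f0=120.0, amps=[0.5, 0.3, 0.1], noise_gain=0.05)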


INPROC.att:ph98 [Sty98b]
Author
Yannis Stylianou
TitleRemoving Phase Mismatches in Concatenative Speech Synthesis
BooktitleThe 3rd ESCA/COCOSDA Workshop on Speech Synthesis
AddressJenolan Caves, Australia
MonthNovember
Year1998
Notewww [ATT99]
AbstractConcatenation of acoustic units is widely used in most of the currently available text-to-speech systems. While this approach leads to higher intelligibility and naturalness than synthesis-by-rule, it has to cope with the issues of concatenating acoustic units that have been recorded in a different order. One important issue in concatenation is that of synchronization of speech frames or, in other words, inter-frame coherence. This paper presents a novel method for synchronization of signals with applications to speech synthesis. The method is based on the notion of center of gravity applied to speech signals. It is an off-line approach as this can be done during analysis with no computational burden on synthesis. The method has been tested with the Harmonic plus Noise Model, HNM, on many large speech databases. The resulting synthetic speech is free of phase mismatch (inter-frame incoherence) problems.
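
As a rough illustration of the center-of-gravity idea mentioned above, the following sketch (a guess at the general principle, not the published algorithm) computes the energy center of gravity of a frame and shifts the frame so that this point falls on a common reference sample; doing this off-line for all units would remove one source of inter-frame incoherence.

    import numpy as np

    def energy_center_of_gravity(frame):
        """Energy-weighted mean sample index of a frame."""
        energy = np.asarray(frame, dtype=float) ** 2
        n = np.arange(len(energy))
        return float(np.sum(n * energy) / (np.sum(energy) + 1e-12))

    def recenter(frame):
        """Circularly shift a frame so its energy center of gravity lies on
        the middle sample -- a toy stand-in for the off-line synchronization
        step described in the abstract."""
        frame = np.asarray(frame, dtype=float)
        shift = len(frame) // 2 - int(round(energy_center_of_gravity(frame)))
        return np.roll(frame, shift)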


INPROC.att:Yang98 [YS98]
Author
Ping-Fai Yang, Yannis Stylianou
TitleReal Time Voice Alteration Based on Linear Prediction
Year1998
BooktitleProc. ICSLP98
Notewww [ATT99]


INPROC.att:Syrdal98 [SCS98]
Author
Ann K. Syrdal, Alistair Conkie, Yannis Stylianou
TitleExploration of Acoustic Correlates in Speaker Selection for Concatenative Synthesis
Year1998
BooktitleProc. ICSLP98
Notewww [ATT99]


INPROC.att:Ostermann98 [OBFW98]
Author
Jörn Ostermann, Mark C. Beutnagel, Ariel Fischer, Yao Wang
TitleIntegration Of Talking Heads And Text-To-Speech Synthesizers For Visual TTS
Year1998
BooktitleProc. ICSLP98
Notewww [ATT99]


INPROC.att:paperSYN98 [SSG+98]
Author
Ann K Syrdal, Yannis G Stylianou, Laurie F Garrison, Alistair Conkie, Juergen Schroeter
TitleTD-PSOLA versus Harmonic Plus Noise Model in Diphone Based Speech Synthesis
Year1998
BooktitleProc. ICASSP98
Pages273--276
Notewww [ATT99]
AbstractIn an effort to select a speech representation for our next generation concatenative text-to-speech synthesizer, the use of two candidates is investigated; TD-PSOLA and the Harmonic plus Noise Model, HNM. A formal listening test has been conducted and the two candidates have been rated regarding intelligibility, naturalness and pleasantness. Ability for database compression and computational load is also discussed. The results show that HNM consistently outperforms TD-PSOLA in all the above features except for computational load. HNM allows for high-quality speech synthesis without smoothing problems at the segmental boundaries and without buzziness or other oddities observed with TD-PSOLA.


3   CNMAT Center for New Music and Audio Technologies

INPROC.cnmat:sdif98 [WCF+98]
Author
Matthew Wright, Amar Chaudhary, Adrian Freed, David Wessel, Xavier Rodet, Dominique Virolle, Rolf Woehrmann, Xavier Serra
TitleNew Applications of the Sound Description Interchange Format
BooktitleProceedings of the International Computer Music Conference
Year1998


INPROC.cnmat:sdif98-short [W+98]
Author
M. Wright, others
TitleNew Applications of the Sound Description Interchange Format
BooktitleProc. ICMC
Year1998


INPROC.cnmat:sdif99 [WCF+99b]
Author
Matthew Wright, Amar Chaudhary, Adrian Freed, Sami Khoury, David Wessel
TitleAudio Applications of the Sound Description Interchange Format Standard
BooktitleAES 107th convention preprint
Year1999


INPROC.cnmat:sdif99-short [WCF+99a]
Author
M. Wright, A. Chaudhary, A. Freed, S. Khoury, D. Wessel
TitleAudio Applications of the Sound Description Interchange Format Standard
BooktitleAES 107th convention
Year1999


INPROC.cnmat:sdif99-sshort [W+99]
Author
M. Wright, others
TitleAudio Applications of the Sound Description Interchange Format Standard
BooktitleAES 107th convention
Year1999


INPROC.cnmat:sdif-mpeg4 [WS99b]
Author
Matthew Wright, Eric D. Scheirer
TitleCross-Coding SDIF into MPEG-4 Structured Audio
BooktitleProceedings of the International Computer Music Conference (ICMC)
Year1999
AddressBeijing
MonthOctober
urlhttp://cnmat.CNMAT.Berkeley.EDU/ICMC1999/papers/saol+sdif/icmc99-saol+sdif.html
abstract-urlhttp://cnmat.CNMAT.Berkeley.EDU/ICMC1999/abstracts/sdif+mpeg4.html
bib-urlhttp://cnmat.CNMAT.Berkeley.EDU/ICMC1999
AbstractWith the completion of the MPEG-4 international standard in October 1998, considerable industry and academic resources will be devoted to building implementations of the MPEG-4 Structured Audio tools. Among these tools is the Structured Audio Orchestra Language (``SAOL''), a general-purpose sound processing and synthesis language. The standardization of MPEG-4 and SAOL is an important development for the computer music community, because compositions written in SAOL will be able to be synthesized by any compliant MPEG-4 decoder. At the same time, the sound analysis and synthesis community has developed and embraced the Sound Description Interface Format (``SDIF''), a general-purpose framework for representing various high-level sound descriptions such as sum-of-sinusoids, noise bands, time-domain samples, and formants. Many tools for composing and manipulating sound in the SDIF format have been created. Composers, sound designers, and analysis/synthesis researchers can benefit from the combined strengths of MPEG-4 and SDIF by using the MPEG-4 Structured Audio decoder as an SDIF synthesizer. This allows the use of sophisticated SDIF tools to create musical works, while leveraging the anticipated wide penetration of MPEG-4 playback devices. Cross-coding SDIF into the Structured Audio format is an example of ``Generalized Audio Coding,'' a new paradigm in which an MPEG-4 Structured Audio decoder is used to flexibly understand and play sound stored in any format. We cross-code SDIF into Structured Audio by writing a SAOL instrument for each type of SDIF sound representation and a translator that maps SDIF data into a Structured Audio score. Rather than use many notes to represent the frames of SDIF data, we use the ``streaming wavetable'' functions of SAOL to create instruments that dynamically interpret spectral, sinusoidal, or other constantly changing data. These SAOL instruments retrieve SDIF data from streaming wavetables via custom unit generators that can be reused to build SAOL synthesizers for other SDIF sound representations. We demonstrate the construction of several different SDIF object types within the Structured Audio framework; the resulting bitstreams are very compact and follow the MPEG-4 specification exactly. Any conforming MPEG-4 decoder can play them back and produce the sound desired by the composer. Our paper will discuss in depth the features of SAOL that make these sorts of instruments possible. By building a link between the MPEG-4 community and the SDIF community, our work contributes to both: The MPEG-4 community benefits by receiving support for synthesis from a large and extensible collection of sound descriptions, each with unique properties of data compression and mutability. The SDIF community gets a stable SDIF synthesis platform that is likely to be supported on a variety of inexpensive, high performance hardware platforms. MPEG-4 also provides the potential to integrate SDIF with other formats, e.g., streaming SDIF data synchronized with video and compressed speech. Finally, each standardization effort benefits from an expanded user base: SDIF users become MPEG-4 users without giving up their familiar tools, while MPEG-4 users outside the small community of sound analysis/synthesis researchers can discover SDIF and the high-level sound descriptions it supports. 
We have made the cross-coding tools and SDIF object instruments freely available to the computer music community in order to promote the continuing interoperability of these important specifications.


INPROC.cnmat:sdif-mpeg4-short [WS99a]
Author
M. Wright, E. Scheirer
TitleCross-Coding SDIF into MPEG-4 Structured Audio
BooktitleProc. ICMC
Year1999
AddressBeijing
urlhttp://cnmat.CNMAT.Berkeley.EDU/ICMC1999/papers/saol+sdif/icmc99-saol+sdif.html
abstract-urlhttp://cnmat.CNMAT.Berkeley.EDU/ICMC1999/abstracts/sdif+mpeg4.html
bib-urlhttp://cnmat.CNMAT.Berkeley.EDU/ICMC1999


INPROC.cnmat:sdif-msp [WDK+99b]
Author
Matthew Wright, Richard Dudas, Sami Khoury, Raymond Wang, David Zicarelli
TitleSupporting the Sound Description Interchange Format in the Max/MSP Environment
BooktitleProceedings of the International Computer Music Conference (ICMC)
Year1999
AddressBeijing
MonthOctober
urlhttp://cnmat.CNMAT.Berkeley.EDU/ICMC1999/papers/msp+sdif/ICMC99-MSP+SDIF-short.html
abstract-urlhttp://cnmat.CNMAT.Berkeley.EDU/ICMC1999/abstracts/sdif+msp.html
bib-urlhttp://www.ircam.fr/equipes/repmus/RMPapers/
AbstractThe Sound Description Interchange Format (``SDIF'') is an extensible, general-purpose framework for representing high-level sound descriptions such as sum-of-sinusoids, noise bands, time-domain samples, and formants, and is used in many interesting sound analysis and synthesis applications. SDIF data consists of time-tagged ``frames,'' each containing one or more 2D ``matrices''. For example, in an SDIF file representing additive synthesis data, the matrix rows represent individual sinusoids and the columns represent parameters such as frequency, amplitude, and phase. Because of Max/MSP's many attractive features for developing real-time computer music applications, it makes a fine environment for developing applications that manipulate SDIF data. These features include active support and development, a large library of primitive computational objects, and a rich history and repertoire. Unfortunately, Max/MSP's limited language of data structures does not support the structure required by SDIF. Although it is straightforward to extend Max/MSP with an object to read SDIF, there is no Max/MSP data type that could be used to output SDIF data to the rest of a Max/MSP application. We circumvent these problems with a novel technique to manipulate SDIF data within Max/MSP. We have created an object called ``SDIF-buffer'' that represents a collection of SDIF data in memory, analogous to MSP's ``buffer '' object that represents audio samples in memory. This allows SDIF data to be represented with C data structures. Max/MSP has objects that provide various control structures to read data from a ``buffer '' and output signals or events usable by other Max/MSP objects. Similarly, we have created a variety of ``SDIF selector'' objects that select a piece of SDIF data from an SDIF-buffer and shoehorn it into a standard Max/MSP data type. The simplest SDIF selector outputs the main matrix from the SDIF frame whose time tag is closest to a given input time. Arguments specify which columns should be output and whether each row should appear as an individual list or all the rows should be concatenated into a single list. More sophisticated SDIF selectors hide the discrete time sampling of SDIF frames, using interpolation along the time axis to synthesize SDIF data. This provides the abstraction of continuous time, with a virtual SDIF frame corresponding to any point along the time axis. We provide linear and a variety of polynomial interpolators. This abstraction of continuously-sampled SDIF data gives rise to sophisticated ways of moving through the time axis of an SDIF-buffer. We introduce the notion of a ``time machine'', a control structure for controlling position in an SDIF time axis in real time, and demonstrate time machines with musically useful features. ``SDIF mutator'' objects have been created that can manipulate data in an SDIF-buffer in response to Max messages. This allows us to write real-time sound analysis software to generate an SDIF model of an audio signal. We implement control structures such as transposition, filtering, and inharmonicity as normal Max/MSP patches that mutate a ``working'' SDIF-buffer; these are cascaded when they share the same SDIF-buffer. These control structures communicate via symbolic references to SDIF-buffers represented as normal Max messages. This system also supports network streaming of SDIF data. 
As research continues towards more efficient and musically interesting streaming protocols, Max/MSP interfaces will be implemented in C as SDIF mutators that access a given SDIF buffer via a struct definition in the exposed SDIF-buffer header file. One promising approach is to begin transmission with a low-resolution representation and then fill it in with increasing detail. Time machines communicate with streaming interfaces via Max messages to request or predict ranges of time that will need to be available in the near future.
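
The frame selection and time-axis interpolation performed by the ``SDIF selector'' objects can be paraphrased in a few lines. The sketch below is self-contained and hypothetical (plain Python structures instead of the real SDIF or Max/MSP data types): a buffer is a list of (time, matrix) pairs, the simplest selector returns the matrix of the frame closest in time, and a virtual frame at an arbitrary time is produced by linear interpolation between the two neighbouring frames.

    import numpy as np

    # A toy "SDIF-buffer": time-tagged frames, each a matrix whose rows
    # could be sinusoids with columns (frequency, amplitude, phase).
    frames = [(0.00, np.array([[440.0, 0.5, 0.0]])),
              (0.01, np.array([[442.0, 0.4, 0.1]]))]

    def nearest_frame(frames, t):
        """Matrix of the frame whose time tag is closest to t."""
        return min(frames, key=lambda f: abs(f[0] - t))[1]

    def interpolated_frame(frames, t):
        """Virtual frame at time t by linear interpolation along the
        time axis (the continuous-time abstraction described above)."""
        times = [time for time, _ in frames]
        if t <= times[0]:
            return frames[0][1]
        if t >= times[-1]:
            return frames[-1][1]
        i = int(np.searchsorted(times, t))
        (t0, m0), (t1, m1) = frames[i - 1], frames[i]
        w = (t - t0) / (t1 - t0)
        return (1 - w) * m0 + w * m1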


INPROC.cnmat:sdif-msp-short [WDK+99a]
Author
M. Wright, R. Dudas, S. Khoury, R. Wang, D. Zicarelli
TitleSupporting the Sound Description Interchange Format in the Max/MSP Environment
BooktitleProc. ICMC
Year1999
AddressBeijing
urlhttp://cnmat.CNMAT.Berkeley.EDU/ICMC1999/papers/msp+sdif/ICMC99-MSP+SDIF-short.html
abstract-urlhttp://cnmat.CNMAT.Berkeley.EDU/ICMC1999/abstracts/sdif+msp.html
bib-urlhttp://cnmat.CNMAT.Berkeley.EDU/ICMC1999


INPROC.cnmat:sdif-srl [WCF+00b]
Author
Matthew Wright, Amar Chaudhary, Adrian Freed, Sami Khoury, Ali Momeni, Diemo Schwarz, David Wessel
TitleAn XML-based SDIF Stream Relationships Language
BooktitleProceedings of the International Computer Music Conference
Year2000
AddressBerlin
abstract-urlhttp://cnmat.CNMAT.Berkeley.EDU/ICMC2000/abstracts/xml-sdif
bib-urlhttp://cnmat.CNMAT.Berkeley.EDU/ICMC2000/


INPROC.cnmat:sdif-srl-short [WCF+00a]
Author
M. Wright, A. Chaudhary, A. Freed, S. Khoury, A. Momeni, D. Schwarz, D. Wessel
TitleAn XML-based SDIF Stream Relationships Language
BooktitleProc. ICMC
Year2000
AddressBerlin
abstract-urlhttp://cnmat.CNMAT.Berkeley.EDU/ICMC2000/abstracts/xml-sdif
bib-urlhttp://cnmat.CNMAT.Berkeley.EDU/ICMC2000/


INPROC.cnmat:osw2000-short [CFW00]
Author
A. Chaudhary, A. Freed, M. Wright
TitleAn Open Architecture for Real-time Music Software
BooktitleProc. ICMC
Year2000
AddressBerlin


4   CSLU Speech Synthesis Research Group

MISCcslu:www [CSLU99]
KeyCSLU
TitleCSLU Speech Synthesis Research Group, Oregon Graduate Institute of Science and Technology
HowpublishedWWW page
Year1999
urlhttp://cslu.cse.ogi.edu/tts
pub-urlhttp://cslu.cse.ogi.edu/tts/publications
Notehttp://cslu.cse.ogi.edu/tts


ARTICLEcslu:ieeetsap98 [KMS98]
Author
F. Kossentini, M. Macon, M. Smith
TitleAudio coding using variable-depth multistage quantization
BooktitleIEEE Transactions on Speech and Audio Processing
Volume6
Year1998
Notewww [CSLU99]


INPROC.cslu:esca98mm [MCW98]
Author
M. W. Macon, A. E. Cronk, J. Wouters
TitleGeneralization and Discrimination in tree-structured unit selection
BooktitleProceedings of the 3rd ESCA/COCOSDA International Speech Synthesis Workshop
MonthNovember
Year1998
Notewww [CSLU99]
RemarksGreat overview of several unit selection methods, comprehensive bibliography: origin of unit selection? [Sag88]. Festival unit selection [HB96, BC95]. classification and regression trees [BFOS84a]. clustering and decision trees [BT97b, WCIS93, Nak94]. Mahalanobis distance [Don96]. decision trees for: speech recognition [NGY97], speech synthesis [HAea96]. data driven direct mapping with ANN [KCG96, TR]. distance measures for: coding [QBC88], ASR [NSRK85, HJ88], in general [GS97], concatenative speech synthesis [HC98, WM98]. PLP: [HM94]. Linear regression and correlation, Fisher transform: [Edw93]. Tree pruning: [CM98]. Masking effects: [Moo89].
AbstractConcatenative ``selection-based'' synthesis from large databases has emerged as a viable framework for TTS waveform generation. Unit selection algorithms attempt to predict the appropriateness of a particular database speech segment using only linguistic features output by text analysis and prosody prediction components of a synthesizer. All of these algorithms have in common a training or ``learning'' phase in which parameters are trained to select appropriate waveform segments for a given feature vector input. One approach to this step is to partition available data into clusters that can be indexed by linguistic features available at runtime. This method relies critically on two important principles: discrimination of fine phonetic details using a perceptually-motivated distance measure in training and generalization to unseen cases in selection. In this paper, we describe efforts to systematically investigate and improve these parts of the process.


INPROC.cslu:esca98kain [KM98a]
Author
A. Kain, M. W. Macon
TitlePersonalizing a speech synthesizer by voice adaptation
BooktitleProceedings of the 3rd ESCA/COCOSDA International Speech Synthesis Workshop
MonthNovember
Year1998
Pages225--230
Notewww [CSLU99]
AbstractA voice adaptation system enables users to quickly create new voices for a text-to-speech system, allowing for the personalization of the synthesis output. The system adapts to the pitch and spectrum of the target speaker, using a probabilistic, locally linear conversion function based on a Gaussian Mixture Model. Numerical and perceptual evaluations reveal insights into the correlation of adaptation quality with the amount of training data and the number of free parameters. A new joint density estimation algorithm is compared to a previous approach. Numerical errors are studied on the basis of broad phonetic categories. A data augmentation method for training data with incomplete phonetic coverage is investigated and found to maintain high speech quality while partially adapting to the target voice.


INPROC.cslu:icslp98cronk [CM98]
Author
Andrew E. Cronk, Michael W. Macon
TitleOptimized Stopping Criteria for Tree-Based Unit Selection in Concatenative Synthesis
OldtitleOptimization of stopping criteria for tree-structured unit selection
BooktitleProc. of International Conference on Spoken Language Processing
Volume5
MonthNovember
Year1998
Pages1951--1955
Notewww [CSLU99]
RemarksSummary: Method for growing an optimal clustering tree (CART, as in [BFOS84a]). Not stopping with thresholds, but growing the tree completely (until no splittable clusters are left), and then pruning by recombining clusters with a greedy algorithm. Gives V-fold cross validation as an evaluation measure for tree quality. Clusters represent units with equivalent target cost. A best split of a cluster maximizes the decrease in data impurity (lower within-cluster variance of acoustic features). N.B.: Clustering of units is not classification, as the classes are not known in advance, and the method is unsupervised! Weighting in the distortion measure uses the Mahalanobis distance as the inverse of the variance. References: [BC95], [BT97b], [BFOS84a], [Don96], [Fuk90] (CART tree evaluation criterion), [NGY97], [Nak94], [WCIS93].
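
The inverse-variance (diagonal Mahalanobis) weighting mentioned in the remarks boils down to a one-line distance; the sketch below merely illustrates that weighting and a within-cluster impurity built on it, and is not the CART training code of the paper.

    import numpy as np

    def diagonal_mahalanobis(x, y, variances):
        """Distance between feature vectors where each dimension is
        weighted by the inverse of its variance over the training data
        (diagonal-covariance Mahalanobis distance)."""
        d = np.asarray(x, float) - np.asarray(y, float)
        return float(np.sqrt(np.sum(d * d / np.asarray(variances, float))))

    def cluster_impurity(features, variances):
        """Sum of weighted distances of cluster members to the cluster
        centroid; a split is good if it lowers this impurity."""
        features = np.asarray(features, float)
        centroid = features.mean(axis=0)
        return sum(diagonal_mahalanobis(f, centroid, variances) for f in features)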


INPROC.cslu:icslp98kain [KM98b]
Author
A. Kain, M. W. Macon
TitleText-to-speech voice adaptation from sparse training data
BooktitleProc. of International Conference on Spoken Language Processing
MonthNovember
Year1998
Pages2847--2850
Notewww [CSLU99]


INPROC.cslu:icslp98-paper [WM98]
Author
J. Wouters, M. W. Macon
TitleA Perceptual Evaluation of Distance Measures for Concatenative Speech Synthesis
BooktitleProc. of International Conference on Spoken Language Processing
MonthNovember
Year1998
Notewww [CSLU99]
AbstractIn concatenative synthesis, new utterances are created by concatenating segments (units) of recorded speech. When the segments are extracted from a large speech corpus, a key issue is to select segments that will sound natural in a given phonetic context. Distance measures are often used for this task. However, little is known about the perceptual relevance of these measures. More insight into the relationship between computed distances and perceptual differences is needed to develop accurate unit selection algorithms, and to improve the quality of the resulting computer speech. In this paper, we develop a perceptual test to measure subtle phonetic differences between speech units. We use the perceptual data to evaluate several popular distance measures. The results show that distance measures that use frequency warping perform better than those that do not, and minimal extra advantage is gained by using weighted distances or delta features.
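
A generic example of the kind of frequency-warped spectral distance evaluated in this paper is sketched below (the mel warping and the resampling grid are illustrative choices, not the exact measures tested): two magnitude spectra are resampled onto a mel-spaced grid before taking a Euclidean distance of their log values.

    import numpy as np

    def mel(f_hz):
        """Standard mel frequency warping."""
        return 2595.0 * np.log10(1.0 + np.asarray(f_hz, float) / 700.0)

    def warped_log_spectrum(spec, sr, n_points=64):
        """Resample a magnitude spectrum onto a grid equally spaced in mel."""
        spec = np.asarray(spec, float)
        freqs = np.linspace(0.0, sr / 2.0, len(spec))
        grid = np.linspace(0.0, mel(sr / 2.0), n_points)
        return np.log(np.interp(grid, mel(freqs), spec) + 1e-9)

    def warped_distance(spec_a, spec_b, sr):
        """Euclidean distance between mel-warped log spectra."""
        return float(np.linalg.norm(warped_log_spectrum(spec_a, sr)
                                    - warped_log_spectrum(spec_b, sr)))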


INPROC.cslu:cslutoolkit [SCdV+98]
Author
S. Sutton, R. Cole, J. de Villiers, J. Schalkwyk, P. Vermeulen, M. Macon, Y. Yan, E. Kaiser, B. Rundle, K. Shobaki, P. Hosom, A. Kain, J. Wouters, D. Massaro, M. Cohen
TitleUniversal Speech Tools: the CSLU Toolkit
BooktitleProc. of International Conference on Spoken Language Processing
MonthNovember
Year1998
Notewww [CSLU99]


INCOLL.cslu:german98 [MKC+98]
Author
M. W. Macon, A. Kain, A. E. Cronk, H. Meyer, K. Mueller, B. Saeuberlich, A. W. Black
TitleRapid Prototyping of a German TTS System
BooktitleTech. Rep. CSE-98-015
PublisherDepartment of Computer Science, Oregon Graduate Institute of Science and Technology
AddressPortland, OR
MonthSeptember
Year1998
Notewww [CSLU99]


INPROC.cslu:icassp98mm [MMLV98]
Author
M. W. Macon, A. McCree, W. M. Lai, V. Viswanathan
TitleEfficient Analysis/Synthesis of Percussion Musical Instrument Sounds Using an All-Pole Model
BooktitleProceedings of the International Conference on Acoustics, Speech, and Signal Processing
Volume6
PublisherSpeech
MonthMay
Year1998
Pages3589--3592
Notewww [CSLU99]
AbstractIt is well-known that an impulse-excited, all-pole filter is capable of representing many physical phenomena, including the oscillatory modes of percussion musical instruments like woodblocks, xylophones, or chimes. In contrast to the more common application of all-pole models to speech, however, practical problems arise in music synthesis due to the location of poles very close to the unit circle. The objective of this work was to develop algorithms to find excitation and filter parameters for synthesis of percussion instrument sounds using only an inexpensive all-pole filter chip (TI TSP50C1x). The paper describes analysis methods for dealing with pole locations near the unit circle, as well as a general method for modeling the transient attack characteristics of a particular sound while independently controlling the amplitudes of each oscillatory mode.


INPROC.cslu:icassp98kain [KM98c]
Author
Alexander Kain, Michael W Macon
TitleSpectral Voice Conversion for Text-to-Speech Synthesis
Year1998
BooktitleProceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP'98)
Pages285--288
Notewww [CSLU99]
AbstractA new voice conversion algorithm that modifies a source speaker's speech to sound as if produced by a target speaker is presented. It is applied to a residual-excited LPC text-to-speech diphone synthesizer. Spectral parameters are mapped using a locally linear transformation based on Gaussian mixture models whose parameters are trained by joint density estimation. The LPC residuals are adjusted to match the target speaker's average pitch. To study effects of the amount of training on performance, data sets of varying sizes are created by automatically selecting subsets of all available diphones by a vector quantization method. In an objective evaluation, the proposed method is found to perform more reliably for small training sets than a previous approach. In perceptual tests, it was shown that nearly optimal spectral conversion performance was achieved, even with a small amount of training data. However, speech quality improved with an increase in training set size.
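
The locally linear, GMM-weighted mapping described in this abstract has a compact generic form, F(x) = sum_i p(i|x) [mu_y,i + Sigma_yx,i Sigma_xx,i^-1 (x - mu_x,i)]. The sketch below implements that formula under the assumption of full-covariance components with illustrative parameter names; it is not the CSLU training or conversion code.

    import numpy as np

    def convert(x, weights, mu_x, mu_y, cov_xx, cov_yx):
        """Apply a GMM-based locally linear spectral conversion: each
        component contributes a linear regression from source to target
        space, weighted by its posterior probability given x."""
        x = np.asarray(x, float)

        def gauss(v, mu, cov):
            d = v - mu
            return np.exp(-0.5 * d @ np.linalg.solve(cov, d)) / \
                   np.sqrt(np.linalg.det(2.0 * np.pi * cov))

        post = np.array([w * gauss(x, m, c)
                         for w, m, c in zip(weights, mu_x, cov_xx)])
        post /= post.sum()
        y = np.zeros_like(np.asarray(mu_y[0], float))
        for i, p in enumerate(post):
            y += p * (mu_y[i] + cov_yx[i] @ np.linalg.solve(cov_xx[i], x - mu_x[i]))
        return y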


INCOLL.cslu:ogireslpc97 [MCWK97]
Author
M. W. Macon, A. E. Cronk, J. Wouters, A. Kain
TitleOGIresLPC: Diphone synthesizer using residual-excited linear prediction
BooktitleTech. Rep. CSE-97-007
PublisherDepartment of Computer Science, Oregon Graduate Institute of Science and Technology
MonthSeptember
Year1997
AddressPortland, OR
Notewww [CSLU99]


INPROC.cslu:aes97 [MJLO+97a]
Author
M. W. Macon, L. Jensen-Link, J. Oliverio, M. Clements, E. B. George
TitleConcatenation-based MIDI-to-singing voice synthesis
Booktitle103rd Meeting of the Audio Engineering Society
PublisherNew York
Year1997
Notewww [CSLU99]
AbstractIn this paper, we propose a system for synthesizing the human singing voice and the musical subtleties that accompany it. The system, Lyricos, employs a concatenation-based text-to-speech method to synthesize arbitrary lyrics in a given language. Using information contained in a regular MIDI file, the system chooses units, represented as sinusoidal waveform model parameters, from an inventory of data collected from a professional singer, and concatenates these to form arbitrary lyrical phrases. Standard MIDI messages control parameters for the addition of vibrato, spectral tilt, and dynamic musical expression, resulting in a very natural-sounding singing voice.


INPROC.cslu:trsap97 [MC97]
Author
M. W. Macon, M. A. Clements
TitleSinusoidal modeling and modification of unvoiced speech
BooktitleIEEE Transactions on Speech and Audio Processing
Volume5
MonthNovember
Year1997
Pages557--560
Number6
Notewww [CSLU99]
AbstractAlthough sinusoidal models have been shown to be useful for time-scale and pitch modification of voiced speech, objectionable artifacts often arise when such models are applied to unvoiced speech. This correspondence presents a sinusoidal model-based speech modification algorithm that preserves the natural character of unvoiced speech sounds after pitch and time-scale modification, eliminating commonly-encountered artifacts. This advance is accomplished via a perceptually-motivated modulation of the sinusoidal component phases that mitigates artifacts in the reconstructed signal after time-scale and pitch modification


INPROC.cslu:icassp97 [MJLO+97b]
Author
Michael Macon, Leslie Jensen-Link, James Oliverio, Mark A. Clements, E. Bryan George
TitleA Singing Voice Synthesis System Based on Sinusoidal Modeling
Year1997
BooktitleProceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP'97)
Pages435--438
Notewww [CSLU99]
AbstractAlthough sinusoidal models have been demonstrated to be capable of high-quality musical instrument synthesis, speech modification, and speech synthesis, little exploration of the application of these models to the synthesis of singing voice has been undertaken. In this paper, we propose a system framework similar to that employed in concatenation-based text-to-speech synthesizers, and describe its extension to the synthesis of singing voice. The power and flexibility of the sinusoidal model used in the waveform synthesis portion of the system enables high-quality, computationally-efficient synthesis and the incorporation of musical qualities such as vibrato and spectral tilt variation. Modeling of segmental phonetic characteristics is achieved by employing a ``unit selection'' procedure that selects sinusoidally-modeled segments from an inventory of singing voice data collected from a human vocalist. The system, called Lyricos, is capable of synthesizing very natural-sounding singing that maintains the characteristics and perceived identity of the analyzed vocalist.


INPROC.cslu:icassp96 [MC96]
AddressAtlanta, USA
Author
Michael W. Macon, Mark A. Clements
BooktitleProceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP'96)
TitleSpeech Concatenation and Synthesis Using an Overlap--Add Sinusoidal Model
Year1996
Volume1
Pages361--364
Notewww [CSLU99]
AbstractIn this paper, an algorithm for the concatenation of speech signal segments taken from disjoint utterances is presented. The algorithm is based on the Analysis-by-Synthesis/Overlap-Add (ABS/OLA) sinusoidal model, which is capable of performing high quality pitch- and time-scale modification of both speech and music signals. With the incorporation of concatenation and smoothing techniques, the model is capable of smoothing the transitions between separately-analyzed speech segments by matching the time- and frequency-domain characteristics of the signals at their boundaries. The application of these techniques in a text-to-speech system based on concatenation of diphone sinusoidal models is also presented.


INPROC.cslu:jasa95 [MC95]
Author
M. W. Macon, M. A. Clements
TitleSpeech synthesis based on an overlap-add sinusoidal model
BooktitleJ. of the Acoustical Society of America
Volume97
PublisherPt. 2
MonthMay
Year1995
Pages3246
Number5
Notewww [CSLU99]


5   CSTR Centre for Speech Technology Research

MISCcstr:www [CSTR99]
KeyCSTR
TitleCentre for Speech Technology Research, University of Edinburgh
HowpublishedWWW page
Year1999
urlhttp://www.cstr.ed.ac.uk/
pub-urlhttp://www.cstr.ed.ac.uk/projects/festival/papers.html
Notehttp://www.cstr.ed.ac.uk/


INPROC.cstr:unitsel96 [HB96]
Author
A. J. Hunt, A. W. Black
TitleUnit Selection in a Concatenative Speech Synthesis System using a Large Speech Database
BooktitleProc. ICASSP '96
AddressAtlanta, GA
MonthMay
Year1996
Pages373--376
Notewww [CSTR99] Electronic version: cstr/Black1996a.s.*
Remarkscited in [MCW98]
AbstractOne approach to the generation of natural-sounding synthesized speech waveforms is to select and concatenate units from a large speech database. Units (in the current work, phonemes) are selected to produce a natural realisation of a target phoneme sequence predicted from text which is annotated with prosodic and phonetic context information. We propose that the units in a synthesis database can be considered as a state transition network in which the state occupancy cost is the distance between a database unit and a target, and the transition cost is an estimate of the quality of concatenation of two consecutive units. This framework has many similarities to HMM-based speech recognition. A pruned Viterbi search is used to select the best units for synthesis from the database. This approach to waveform synthesis permits training from natural speech: two methods for training from speech are presented which provide weights which produce more natural speech than can be obtained by hand-tuning.
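
The state-transition-network formulation above maps directly onto a dynamic programming search. The sketch below is a generic illustration of that search (target cost per candidate plus concatenation cost between consecutive candidates, minimized Viterbi-style), with the cost functions left to the caller; it is not the CHATR or Festival implementation and omits pruning.

    def select_units(targets, candidates, target_cost, join_cost):
        """Choose one candidate unit per target so that the sum of
        target costs and join costs over the sequence is minimal.
        candidates[t] is the list of database units for target t."""
        # best[t][j] = (cumulative cost of best path ending in candidate j,
        #               index of the best predecessor at t-1)
        best = [[(target_cost(targets[0], c), None) for c in candidates[0]]]
        for t in range(1, len(targets)):
            row = []
            for c in candidates[t]:
                tc = target_cost(targets[t], c)
                cost, prev = min(
                    (best[t - 1][i][0] + join_cost(p, c) + tc, i)
                    for i, p in enumerate(candidates[t - 1]))
                row.append((cost, prev))
            best.append(row)
        # Backtrack from the cheapest final candidate.
        j = min(range(len(best[-1])), key=lambda i: best[-1][i][0])
        path = []
        for t in range(len(targets) - 1, -1, -1):
            path.append(candidates[t][j])
            j = best[t][j][1]
        return list(reversed(path))

For example, target_cost could be a weighted distance between predicted and stored prosodic/phonetic features, and join_cost a spectral distance measured at the unit boundary.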


INPROC.cstr:unitsel97 [BT97b]
Author
Alan W Black, Paul Taylor
TitleAutomatically Clustering Similar Units for Unit Selection in Speech Synthesis
BooktitleProc. Eurospeech '97
AddressRhodes, Greece
MonthSeptember
Year1997
Pages601--604
Notewww [CSTR99] Electronic version: cstr/Black1997b.*
Remarkscited in [MCW98]: clustering and decision trees
AbstractThis paper describes a new method for synthesizing speech by concatenating sub-word units from a database of labelled speech. A large unit inventory is created by automatically clustering units of the same phone class based on their phonetic and prosodic context. The appropriate cluster is then selected for a target unit offering a small set of candidate units. An optimal path is found through the candidate units based on their distance from the cluster center and an acoustically based join cost. Details of the method and justification are presented. The results of experiments using two different databases are given, optimising various parameters within the system. Also a comparison with other existing selection based synthesis techniques is given showing the advantages this method has over existing ones. The method is implemented within a full text-to-speech system offering efficient natural sounding speech synthesis.


INPROC.cstr:eursp95 [BC95]
Author
A. W. Black, N. Campbell
TitleOptimising selection of units from speech databases for concatenative synthesis
BooktitleProc. Eurospeech '95
Volume1
AddressMadrid, Spain
MonthSeptember
Year1995
Pages581--584
RemarksSummary: Detailed description of unit selection model, used features and context, concatenation join point optimisation. Description of weight optimising procedure: Euclidean cepstral distance (very limited first attempt) on real-speech test sentences. Unit selection as used in CHATR. cited in [MCW98]


INPROC.cstr:ssml97 [STTI97]
Author
Richard Sproat, Paul Taylor, Michael Tanenblatt, Amy Isard
TitleA Markup Language for Text-To-Speech Synthesis
BooktitleProc. Eurospeech '97
AddressRhodes, Greece
MonthSeptember
Year1997
Pages1747--1750
Notewww [CSTR99] Electronic version: cstr/Sproat1997a.*
AbstractText-to-speech synthesizers must process text, and therefore require some knowledge of text structure. While many TTS systems allow for user control by means of ad hoc `escape sequences', there remains to date no adequate and generally agreed upon system-independent standard for marking up text for the purposes of synthesis. The present paper is a collaborative effort between two speech groups aimed at producing such a standard, in the form of an SGML-based markup language that we call STML --- Spoken Text Markup Language. The primary purpose of this paper is not to present STML as a fait accompli, but rather to interest other TTS research groups to collaborate and contribute to the development of this standard.


TECHREP.cstr:festival97 [BT97a]
Author
Alan Black, Paul Taylor
TitleThe Festival Speech Synthesis System: System Documentation (1.1.1)
InstitutionHuman Communication Research Centre
TypeTechnical Report
NumberHCRC/TR-83
MonthJanuary
Year1997
Pages154
Notewww [CSTR99]
urlhttp://www.cstr.ed.ac.uk/projects/festival/manual-1.1.1/festival-1.1.1.ps.gz
Remarksnew version [BTC98]


TECHREP.cstr:festival98 [BTC98]
Author
Alan Black, Paul Taylor, Richard Caley
TitleThe Festival Speech Synthesis System: System Documentation (1.3.1)
InstitutionHuman Communication Research Centre
TypeTechnical Report
NumberHCRC/TR-83
MonthDecember
Year1998
Pages202
Notewww [CSTR99]
urlhttp://www.cstr.ed.ac.uk/projects/festival/manual-1.3.1/festival_toc.html
Remarksupdated version of [BT97a], new utterance structure as in [Tay99], multiple synthesizers


TECHREP.cstr:festivalarch98 [Tay99]
Author
Paul Taylor
TitleThe Festival Speech Architecture
TypeWeb Page
Year1999
Notewww [CSTR99]
urlhttp://www.cstr.ed.ac.uk/projects/festival/arch.html
AbstractThis is a short document describing the way we represent speech and linguistic structures in Festival. There are three main types of structure:
Items
An item is a single linguistic unit, such as a phone, word, syllable, syntactic node, intonation phrase etc. Each item has a set of features which describe its local properties. For instance a word could have features such as its name and part of speech. Values of features can be real values or functions.
Relations
A relation links together items of a common linguistic type. For instance we might have a word, phone, syntax or syllable relation. Relations are general graph structures, the most common type being a simple doubly linked list. E.g. the word relation is a doubly linked list that links all the words in an utterance in the order they occur. Relations can also take the form of trees. For example, we have a syllable structure relation which gives onset, coda, nucleus and rhyme structure for a syllable. The crucial aspect of the Festival architecture is that items can be in more than one relation. For example, a syntax relation is a tree whose terminal elements are words, which are also in the word relation.
Utterances
Utterances contain a list of all the relations.
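
A toy rendering of this item/relation/utterance structure (an illustrative Python sketch, not Festival's actual C++ or Scheme interfaces) could look like this; note that the same Item object may be appended to several relations, which is the crucial property described above.

    class Item:
        """A single linguistic unit with a dictionary of features."""
        def __init__(self, **features):
            self.features = features

    class Relation:
        """A named, ordered collection of items (a plain list standing in
        for the doubly linked list or tree of the general case)."""
        def __init__(self, name):
            self.name, self.items = name, []
        def append(self, item):
            self.items.append(item)
            return item

    class Utterance:
        """Holds all relations of one utterance, indexed by name."""
        def __init__(self):
            self.relations = {}
        def create_relation(self, name):
            self.relations[name] = Relation(name)
            return self.relations[name]

    # The same item can live in the Word relation and, e.g., as a
    # terminal of a Syntax relation.
    utt = Utterance()
    words = utt.create_relation("Word")
    syntax = utt.create_relation("Syntax")
    w = words.append(Item(name="hello", pos="uh"))
    syntax.append(w)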


INPROC.Campbell_FactAffe_EURO97 [CYDH97]
Author
Nick Campbell, Itoh Yoshiharu, Wen Ding, Norio Higuchi
TitleFactors Affecting Perceived Quality and Intelligibility in the CHATR Concatenative Speech Synthesiser
BooktitleProc. Eurospeech '97
AddressRhodes, Greece
MonthSeptember
Year1997
Pages2635--2638
RemarksTO BE FOUND


ARTICLECampbell_CHATR [Cam96]
Author
N. Campbell
TitleCHATR: A High-Definition Speech Re-Sequencing System
JournalAcoustical Society of America and Acoustical Society of Japan, Third Joint Meeting
AddressHonolulu, HI
MonthDecember
Year1996
RemarksTO BE FOUND


6   Computer Science

BOOKsofteng [GJM91]
Author
Carlo Ghezzi, Mehdi Jazayeri, Dino Mandrioli
TitleFundamentals of Software Engineering
PublisherPrentice--Hall
AddressEnglewood Cliffs, NJ
Year1991


BOOKboehm [Boe89]
Author
Barry W. Boehm
TitleSoftware risk management
PublisherIEEE Computer Society Press
AddressWashington
Year1989


BOOKSzyperski98 [Szy98]
KeySzperski
Author
Clemens Szyperski
TitleComponent Software: Beyond Object-Oriented Programming
PublisherACM Press and Addison-Wesley
Year1998
AddressNew York, NY
AnnotateAn excellent overview of component-based programming. Many references.


BOOKbooch [Boo94]
Author
Grady Booch
TitleObject-Oriented Analysis and Design with Applications
Edition2nd
PublisherBenjamin--Cummings
AddressRedwood City, Calif.
Year1994


BOOKomt [RBP+91]
Author
James Rumbaugh, Michael Blaha, William Premerlani, Frederick Eddy, William Lorensen
TitleObject-Oriented Modeling and Design
PublisherPrentice--Hall
AddressEnglewood Cliffs, NJ
Year1991


BOOKivar [Jac95b]
Author
Ivar Jacobson
TitleObject-Oriented Software Engineering: a Use Case driven Approach
PublisherAddison--Wesley
AddressWokingham, England
Year1995


UNPUBLISHEDuml-www [Sof97]
KeyRational
Author
Rational Software
TitleUnified Modeling Language, version 1.1
MonthSeptember
Year1997
NoteOnline documentation1


BOOKDuCharme99 [DuC99]
Author
Bob DuCharme
TitleXML: the annotated specification
PublisherPrentice-Hall PTR
AddressUpper Saddle River, NJ 07458, USA
Pagesxix + 339
Year1999
Isbn0-13-082676-6
SeriesThe Charles F. Goldfarb series on open information management
KeywordsXML (Document markup language); Database management.


MISCXML [Cov00]
KeyXML
TitleThe XML Cover Pages
Author
Robin Cover
PublisherOASIS, Organization for the Advancement of Structured Information Standards
HowpublishedWWW page
Year2000
urlhttp://www.oasis-open.org/cover/xml.html
Notehttp://www.oasis-open.org/cover/xml.html
AbstractExtensible Markup Language (XML) is descriptively identified as "an extremely simple dialect [or 'subset'] of SGML" the goal of which "is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML," for which reason "XML has been designed for ease of implementation, and for interoperability with both SGML and HTML."
RemarksInteresting links (among a wealth of introductory as well as detailed information):

XML Metadata Interchange Format (XMI) - Object Management Group (OMG) http://www.oasis-open.org/cover/xmi.html.
The design of the XML Metadata Interchange Format (XMI) represents an extremely important initiative. It has a goal of unifying XML and related W3C specifications with several object/component modeling standards, as well as with STEP schemas, and more. Particularly, it would "combine the benefits of the web-based XML standard for defining, validating, and sharing document formats on the web with the benefits of the object-oriented Unified Modeling Language (UML), a specification of the Object Management Group (OMG) that provides application developers a common language for specifying, visualizing, constructing, and documenting distributed objects and business models."

Extensible User Interface Language (XUL) http://www.oasis-open.org/cover/xul.html
"XUL stands for 'extensible user interface language'. It is an XML-based language for describing the contents of windows and dialogs. XUL has language constructs for all of the typical dialog controls, as well as for widgets like toolbars, trees, progress bars, and menus."

User Interface Markup Language (UIML) http://www.oasis-open.org/cover/uiml.html
The User Interface Markup Language (UIML) "allows designers to describe the user interface in generic terms, and then use a style description to map the interface to various operating systems (OSs) and appliances. Thus, the universality of UIML makes it possible to describe a rich set of interfaces and reduces the work in porting the user interface to another platform (e.g., from a graphical windowing system to a hand-held appliance) to changing the style description." See the separate document.

XML Application Environments, Development Toolkits, Conversion http://www.oasis-open.org/cover/publicSW.htm#xmlTestbed
XML Testbed. An XML application environment written in Java. From Steve Withall. ..."uses an XML configuration file to define the (Swing-based) user interface; includes its own non-validating XML parser (though it can use any SAX parser instead), a nascent XSL engine (to the old submission standard - just in time to be out of date), and a few other odds and ends."
http://www.w3.org/XML/1998/08withall/
http://www.w3.org/XML/1998/08withall/xt-beta-1-980816.zip
http://www.w3.org/XML/1998/08withall/MontrealSlides/XXXIntroduction.html


ARTICLEAbrams:1999:UAI [APB+99]
Author
Marc Abrams, Constantinos Phanouriou, Alan L. Batongbacal, Stephen M. Williams, Jonathan E. Shuster
TitleUIML: an appliance-independent XML user interface language
JournalComputer Networks (Amsterdam, Netherlands: 1999)
Volume31
Number11--16
Pages1695--1708
Day17
MonthMay
Year1999
Coden????
Issn1389-1286
BibdateFri Sep 24 19:43:29 MDT 1999
urlhttp://www.elsevier.com/cas/tree/store/comnet/sub/1999/31/11-16/2170.pdf
RemarksTO BE FOUND


BOOKChauvet:1999:CTC [Cha99]
Author
Jean-Marie Chauvet
TitleComposants et transactions: COMMTS, CorbaOTS, JavaEJB, XML
PublisherEyrolles: Informatiques magazine
AddressParis, France
Pagesv + 274
Year1999
Isbn2-212-09075-7
Lccn????
BibdateTue Sep 21 10:27:35 MDT 1999
SeriesCollection dirigée par Guy Hervier
AlttitleComposants et transactions: Corba/OTS, EJB/JTS, COM/MTS: comprendre l'architecture des serveurs d'applications
AnnoteCover title: ``Composants et transactions: Corba/OTS, comprendre l'architecture des serveurs d'applications''. Bibliography: p. 267-269.
KeywordsObject-oriented design (computer science); component object models; JavaBeans.
RemarksTO BE FOUND


7   IRCAM

MISCanasyn:www [AS00]
KeyAS
TitleAnalysis--Synthesis Team / Équipe Analyse--Synthèse, IRCAM---Centre Georges Pompidou
HowpublishedWWW page
Year2000
urlhttp://www.ircam.fr/anasyn/
pub-urlhttp://www.ircam.fr/anasyn/listePublications/index.html
Notehttp://www.ircam.fr/anasyn/


MISCanasyn:oldwww [AS99]
KeyAS
TitleAnalysis--Synthesis Team / Équipe Analyse--Synthèse, IRCAM---Centre Georges Pompidou
HowpublishedWWW page
Year1999
urlhttp://www.ircam.fr/equipes/analyse-synthese/
pub-urlhttp://www.ircam.fr/equipes/analyse-synthese/listePublications/index.html
Notehttp://www.ircam.fr/equipes/analyse-synthese/


INPROC.PEET981 [Pee98]
Author
G. Peeters
TitleAnalyse-Synthèse des sons musicaux par la méthode PSOLA
Year1998
AddressAgelonde (France)
MonthMay


INPROC.PEET983 [PR98]
Author
G. Peeters, X. Rodet
TitleSinusoidal versus Non-Sinusoidal Signal Characterisation
Year1998
AddressBarcelona
MonthNovember
Annote(Workshop on Digital Audio Effects)


INPROC.PEET991 [PR99b]
Author
G. Peeters, X. Rodet
TitleSINOLA: A New Analysis/Synthesis Method using Spectrum Peak Shape Distortion, Phase and Reassigned Spectrum
BooktitleProceedings of the International Computer Music Conference (ICMC)
Year1999
AddressBeijing
MonthOctober


INPROC.PEET992 [PR99a]
Author
G. Peeters, X. Rodet
TitleNon-Stationary Analysis/Synthesis using Spectrum Peak Shape Distortion, Phase and Reassigned Spectrum
Year1999
AddressOrlando
MonthNovember


INPROC.OM97 [AAFH97]
Author
Gérard Assayag, Carlos Agon, Joshua Fineberg, Peter Hanappe
TitleAn Object Oriented Visual Environment For Musical Composition
BooktitleProceedings of the International Computer Music Conference (ICMC)
Year1997
AddressThessaloniki, Greece
urlhttp://www.ircam.fr/equipes/repmus/RMPapers/Assayag97/index.html
bib-urlhttp://www.ircam.fr/equipes/repmus/RMPapers/


INPROC.OM98 [AADR98]
Author
Carlos Agon, Gérard Assayag, Olivier Delerue, Camilo Rueda
TitleObjects, Time and Constraints in OpenMusic
BooktitleProceedings of the International Computer Music Conference (ICMC)
Year1998
AddressAnn Arbor, Michigan
MonthOctober
urlhttp://www.ircam.fr/equipes/repmus/RMPapers/ICMC98a/OMICMC98.html
bib-urlhttp://www.ircam.fr/equipes/repmus/RMPapers/


ARTICLEOM99 [ARL+99b]
Author
Gérard Assayag, Camilo Rueda, Mikael Laurson, Carlos Agon, O. Delerue
TitleComputer Assisted Composition at Ircam: PatchWork & OpenMusic
JournalComputer Music Journal
Year1999
Volume23
Number3
urlhttp://www.ircam.fr/equipes/repmus/RMPapers/CMJ98/index.html
bib-urlhttp://www.ircam.fr/equipes/repmus/RMPapers


ARTICLEOM99-short [ARL+99a]
Author
G. Assayag, C. Rueda, M. Laurson, C. Agon, O. Delerue
TitleComputer Assisted Composition at Ircam: PatchWork & OpenMusic
JournalComputer Music Journal
MonthFall
Year1999
Volume23
Number3
urlhttp://www.ircam.fr/equipes/repmus/RMPapers/CMJ98/index.html
bib-urlhttp://www.ircam.fr/equipes/repmus/RMPapers


INPROC.OM2000 [AAS00c]
Author
Gérard Assayag, Carlos Agon, Marco Stroppa
TitleHigh Level Musical Control of Sound Synthesis in OpenMusic
BooktitleProceedings of the International Computer Music Conference (ICMC)
Year2000
AddressBerlin
MonthAugust


INPROC.OM2000-short [AAS00a]
Author
G. Assayag, C. Agon, M. Stroppa
TitleHigh Level Musical Control of Sound Synthesis in OpenMusic
BooktitleProc. ICMC
AddressBerlin
Year2000


INPROC.OM2000-sshort [AAS00b]
Author
G. Assayag, C. Agon, M. Stroppa
TitleHigh Level Musical Control of Sound Synthesis in OpenMusic
BooktitleProc. ICMC
Year2000


INPROC.sdif-ext2000 [SW00]
Author
Diemo Schwarz, Matthew Wright
TitleExtensions and Applications of the SDIF Sound Description Interchange Format
BooktitleProceedings of the International Computer Music Conference
MonthAugust
Year2000
AddressBerlin


8   Psychoacoustics

BOOKmoore89 [Moo89]
Author
B. C. J. Moore
TitleAn Introduction to the Psychology of Hearing
PublisherAcademic Press Limited
Edition3rd
Year1989
Remarkscited in [MCW98]: masking effects


INPROC.psy:susini97 [SMW97]
Author
Patrick Susini, Stephen McAdams, Suzanne Winsberg
TitleCaractérisation perceptive des bruits de véhicules
BooktitleActes du 4ème Congrès Français d'Acoustique
PublisherSociété Française d'Acoustique
MonthApril
Year1997
AddressMarseille


INPROC.psy:faure97 [FM97]
Author
Anne Faure, Stephen McAdams
TitleComparaison de profils sémantiques et de l'espace perceptif de timbres musicaux
BooktitleActes du 4ème Congrès Français d'Acoustique
PublisherSociété Française d'Acoustique
MonthApril
Year1997
AddressMarseille
urlhttp://mediatheque.ircam.fr/articles/textes/Faure97a/
RemarksMapping of semantic profiles (letting subjects choose descriptive words for timbre) to perceptual dimensions. Some references: Faure96, Grey77, Krimphoff94, Krumhansl89, McAdams95, Tversky77
AbstractThe purpose of this study is to compare semantic profiles and perceptual dimensions of musical timbre. In a previous experiment, we extracted the 23 most often used verbal attributes from spontaneous verbalizations describing similarities and differences between pairs of timbres and we tried to compare their use with the relative positions of timbres along each perceptual dimension. In this experiment, we used a VAME paradigm to test these verbal attributes more quantitatively. 12 synthetic sounds were presented and rated on each of the 23 unipolar semantic scales. Several distances (either Euclidean or from Tversky's model of similarity) between timbres were then calculated and the MDS semantic models obtained were compared to the perceptual one. The structure of the semantic and perceptual models differed a lot and the correlations with the semantic scales lead us to prefer a model in two dimensions without specificities, derived from a distance directly obtained from Tversky's model.


9   Sound Synthesis

INPROC.beauchamp95 [BHM95]
Author
James Beauchamp, A. Horner, S. McAdams
TitleMusical Sounds, Data Reduction, and Perceptual Control Parameters
BooktitleProgram for SMPC95, Society for Music Perception and Cognition
PublisherCenter for New Music and Audio Technologies (CNMAT)
AddressUniv. Calif. Berkeley
Pages8--9
Year1995
bib-urlhttp://cmp-rs.music.uiuc.edu/people/beauchamp/publist.html
RemarksTO BE FOUND!


ARTICLEbeauchamp98 [Bea98]
Author
James Beauchamp
TitleMethods for measurement and manipulation of timbral physical correlates
BooktitleJ. Acoust. Soc. Am.
Year1998
Volume103
PartPt. 2
Pages2966
Number5
bib-urlhttp://cmp-rs.music.uiuc.edu/people/beauchamp/publist.html
RemarksTO BE FOUND!


ARTICLEhorner98 [YH]
Author
Jennifer Yuen, Andrew Horner
TitleHybrid Sampling-Wavetable Synthesis with Genetic Algorithms
BooktitleJournal of the Audio Engineering Society
Volume45
Pages316--330
Number5
bib-urlhttp://www.cs.ust.hk/faculty/horner/subpage/pubs.html
journal-urlhttp://www.aes.org/journal/toc/may97.html
RemarksTO BE FOUND! High-quality sort-of-concatenative instrument synthesis?
AbstractA combination of hybrid sampling and wavetable synthesis for matching acoustic instruments is demonstrated using genetic algorithm optimization. Tone sampling is used for the critical attack portion and wavetable synthesis is used to match the more gradually changing sustain and decay. A hybrid sampling wavetable performs a smooth crossfade transition. This method has been used to synthesize piano, harp, glockenspiel, and temple block tones.
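
The smooth crossfade between a sampled attack and a wavetable-synthesized sustain mentioned in the abstract is essentially a weighted overlap; the sketch below is a generic illustration (linear fade, invented function name), not the authors' matching system.

    import numpy as np

    def splice_attack_sustain(attack, sustain, fade_len):
        """Concatenate a sampled attack and a synthesized sustain,
        crossfading linearly over the last fade_len samples of the
        attack and the first fade_len samples of the sustain."""
        attack = np.asarray(attack, float)
        sustain = np.asarray(sustain, float)
        fade = np.linspace(1.0, 0.0, fade_len)
        overlap = attack[-fade_len:] * fade + sustain[:fade_len] * (1.0 - fade)
        return np.concatenate([attack[:-fade_len], overlap, sustain[fade_len:]])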


ARTICLEhorner96 [CH]
Author
Ngai-Man Cheung, Andrew Horner
TitleGroup Synthesis with Genetic Algorithms
BooktitleJournal of the Audio Engineering Society
Volume44
Number3
Pages130--147
bib-urlhttp://www.cs.ust.hk/faculty/horner/subpage/pubs.html
journal-urlhttp://www.aes.org/journal/toc/march.html
AbstractMusical sounds can be efficiently synthesized using an automatic genetic algorithm to decompose musical instrument tones into group synthesis parameters. By separating the data into individual matrices, a high degree of data compression with low computational cost is achieved.


INPROC.chandra98 [Cha98]
Author
Arun Chandra
TitleCompositional experiments with concatenating distinct waveform periods while changing their structural properties
BooktitleSEAMUS'98
PublisherSchool of Music, University of Illinois
AddressUrbana, IL
MonthApril
Year1998
urlhttp://cmp-rs.music.uiuc.edu/people/arunc/miranda/seamus98/index.htm
ps-urlhttp://cmp-rs.music.uiuc.edu/people/arunc/miranda/seamus98/pre.ps
NoteAvailable online2
Abstractwigout is a sound-synthesis program, written in C and running under Unix and 32-bit Intel systems. The premise of the program is to allow the composer to compose the waveform with which she composes. Thus, sound is not a building-block with which one composes, but the subject matter of composition. The composer defines a waveform state, consisting of an arbitrary number of segments. Each segment is similar to (but not identical with) 1) a sine wave; 2) a square wave; 3) a triangle wave; or 4) a sawtooth wave. The composer stipulates the duration for which the sound is to last, and then the waveform state (which is on the order of a few milliseconds long) is iterated until the desired duration is reached. Upon each iteration, each segment changes itself by a specified amount. The resulting sound is the result of many independent changes in the waveform's segments. Up till now, five compositions have been written using wigout, for tape alone, and for tape and performers.
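
The iteration scheme the abstract describes (a waveform state of a few segments, repeated until the desired duration, each segment changing itself a little on every pass) can be paraphrased in a short sketch. Everything below is a hypothetical re-creation of the idea, not the wigout program; the segment rendering and parameter names are invented.

    def iterate_state(segments, total_samples):
        """Render a waveform state repeatedly until total_samples are
        produced; on each iteration every segment drifts by its own
        increments (a toy model of the idea, not wigout itself)."""
        out = []
        while len(out) < total_samples:
            for seg in segments:
                n = seg["length"]
                # Render one segment as a simple ramp at its current amplitude
                # (standing in for the near-sine/square/triangle/sawtooth shapes).
                out.extend(seg["amplitude"] * (i / n) for i in range(n))
                # Each iteration, the segment changes itself by a specified amount.
                seg["amplitude"] += seg["amp_increment"]
                seg["length"] = max(1, seg["length"] + seg["len_increment"])
        return out[:total_samples]

    samples = iterate_state(
        [{"length": 40, "amplitude": 0.5,  "amp_increment": 0.001, "len_increment": 0},
         {"length": 60, "amplitude": -0.5, "amp_increment": 0.0,   "len_increment": 1}],
        8000)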


ARTICLEbeauchamp96 [BH]
Author
James Beauchamp, A. Horner
TitlePiecewise Linear Approximation of Additive Synthesis Envelopes: A Comparison of Various Methods
JournalComputer Music Journal
Volume20
Pages72--95
Number2
Year1996
bib-urlhttp://cmp-rs.music.uiuc.edu/people/beauchamp/publist.html


ARTICLEwakefield96 [PW96]
Author
W. J. Pielemeier, G. H. Wakefield
TitleA High Resolution Time--Frequency Representation for Musical Instrument Signals
JournalJ. Acoust. Soc. Am.
Volume99
Number4
Pages2382--2396
Year1996


INPROC.wakefield98 [Wak98a]
Author
G. H. Wakefield
TitleTime--Pitch Representations: Acoustic Signal Processing and Auditory Representations
BooktitleProceedings of the IEEE Intl. Symp. on Time--Frequency/Time--Scale
Year1998
AddressPittsburgh


INPROC.wakefield98-short [Wak98b]
Author
G. H. Wakefield
TitleTime--Pitch Representations: Acoustic Signal Processing and Auditory Representations
BooktitleProc. IEEE Intl. Symp. Time--Frequency/Time--Scale
Year1998
AddressPittsburgh


INPROC.loris2000a [FHC00d]
Author
Kelly Fitz, Lippold Haken, Paul Christensen
TitleTransient Preservation under Transformation in an Additive Sound Model
BooktitleProceedings of the International Computer Music Conference
AddressBerlin
Year2000


INPROC.loris2000a-short [FHC00b]
Author
K. Fitz, L. Haken, P. Christensen
TitleTransient Preservation under Transformation in an Additive Sound Model
BooktitleProc. ICMC
AddressBerlin
Year2000


INPROC.loris2000b [FHC00c]
Author
Kelly Fitz, Lippold Haken, Paul Christensen
TitleA New Algorithm for Bandwidth Association in Bandwidth-Enhanced Additive Sound Modeling
BooktitleProc. ICMC
AddressBerlin
Year2000


INPROC.loris2000b-short [FHC00a]
Author
K. Fitz, L. Haken, P. Christensen
TitleA New Algorithm for Bandwidth Association in Bandwidth-Enhanced Additive Sound Modeling
BooktitleProc. ICMC
AddressBerlin
Year2000


INPROC.sms97 [SBHL97b]
Author
X. Serra, J. Bonada, P. Herrera, R. Loureiro
TitleIntegrating Complementary Spectral Models in the Design of a Musical Synthesizer
BooktitleProceedings of the International Computer Music Conference
Year1997
AddressThessaloniki


INPROC.sms97-short [SBHL97c]
Author
X. Serra, J. Bonada, P. Herrera, R. Loureiro
TitleIntegrating Complementary Spectral Models in the Design of a Musical Synthesizer
BooktitleProc. ICMC
Year1997
AddressThessaloniki


ARTICLEsms90 [SS90]
Author
X. Serra, J. Smith
TitleSpectral Modeling Synthesis: a Sound Analysis/Synthesis System Based on a Deterministic plus Stochastic Decomposition
JournalComputer Music Journal
Year1990
Volume14
Number4
Pages12--24


ARTICLEbeauchamp93 [Bea93a]
Author
J. W. Beauchamp
TitleUnix Workstation Software for Analysis, Graphics, Modification, and Synthesis of Musical Sounds
JournalProceedings of the Audio Engineering Society
Year1993


INPROC.beauchamp93-short [Bea93b]
Author
J. W. Beauchamp
TitleUnix Workstation Software for Analysis, Graphics, Modification, and Synthesis of Musical Sounds
BooktitleProc. AES
Year1993


10   Speech Synthesis

BOOKspeechsyn96 [vSHOS96]
Editor
J.P.H. van Santen, J. Hirschberg, J. Olive, R. Sproat
TitleProgress in Speech Synthesis
PublisherSpringer-Verlag
AddressNew York
Year1996
Isbn0-387-94701-9
amazon-urlhttp://www.amazon.de/exec/obidos/ASIN/0387947019
Remarksvan Santen Author Links: http://www.bell-labs.com/project/tts/BOOK.html, Springer Heidelberg: http://www.springer.de/cgi-bin/search-book.pl?isbn=0-387-94701-9, Springer New-York: http://www.springer-ny.com/catalog/np/may96np/DATA/0-387-94701-9.html


ARTICLEpsola92 [VMT92]
Keysynthesis
Author
H. Valbret, E. Moulines, J. P. Tubach
TitleVoice transformation using PSOLA technique
JournalSpeech Communication
Year1992
MonthJune
Volume11
Number2-3
Pages189--194


BOOKchomsky68sound [CH68]
Author
N. Chomsky, M. Halle
TitleThe Sound Pattern of English
PublisherHarper & Row
AddressNew York, NY
Year1968


ARTICLEbailly1991 [BLS91]
Author
G. Bailly, R. Laboissière, J. L. Schwartz
TitleFormant trajectories as audible gestures: an alternative for speech synthesis.
JournalJournal of Phonetics
Year1991
Volume19
Pages9--23


INPROC.soong88 [SR88]
Author
F.K. Soong, A.E. Rosenberg
TitleOn the use of Instantaneous and Transitional Spectral Information in Speaker Recognition
BooktitleIEEE Transactions on Acoustics, Speech and Signal Processing
Volume36
Year1988
Pages871--879
Keywordsderivative of cepstrum
Remarkscited in [MD97a]


INPROC.griffin88 [GL88]
Author
D.W. Griffin, J.S. Lim
TitleMultiband Excitation Vocoder
BooktitleIEEE Transactions on Acoustics, Speech and Signal Processing
Volume36
Year1988
Pages1123--1235
Keywordsrobust cepstrum by sinusoidal weighting
Remarkscited in [MD97a]


INPROC.allessandro95 [dM95]
Author
C. d'Alessandro, P. Mertens
TitleAutomatic pitch contour stylization using a model of tonal perception
BooktitleComputer Speech and Language
Year1995
Pages257--288
Keywordsperceptual stylization, based on a model of tonal perception
Remarkscited in [MD97a]


INPROC.traber92 [Tra92]
Author
C. Traber
TitleF0 Generation with a Database of Natural F0 Patterns and with a Neural Network
BooktitleTalking Machines: Theories, Models, and Designs
Editor
G. Bailly, C. Benoît
PublisherNorth Holland
Year1992
Pages287--304
Remarkscited in [MD97a]: machine learning techniques: multilayer perceptrons


INPROC.sagisaka92 [SK92]
Author
Y. Sagisaka, N. Kaiki
TitleOptimization of Intonation Control Using Statistical F0 Resetting Characteristics
BooktitleProceedings of the International Conference on Acoustics, Speech, and Signal Processing
Volume2
Year1992
Pages49--52
Remarkscited in [MD97a]: machine learning techniques: linear regression


INPROC.hirschberg91 [Hir91]
Author
J. Hirschberg
TitleUsing Text Analysis to Predict Intonational Boundaries
BooktitleProceedings of Eurospeech
LocationGenova
Year1991
Pages1275--1278


INPROC.moebius93 [MPH93]
Author
B. Möbius, M. Pätzold, W. Hess
TitleAnalysis and Synthesis of German F0 Contours by Means of Fujisaki's Model
BooktitleSpeech Communication
Volume13
Year1993
Pages53--61


INPROC.sagisaka88 [Sag88]
Author
Y. Sagisaka
TitleSpeech synthesis by rule using an optimal selection of non-uniform synthesis units
BooktitleProc. of the Int'l Conf. on Acoustics, Speech, and Signal Processing
Year1988
Pages679
Remarks(origin of unit selection?), cited in [MCW98]: since the late 1980's, selection-based concatenative synthesis from large databases has received increased interest as a potential improvement upon fixed diphone inventories. TO BE FOUND


INPROC.wang93 [WCIS93]
Author
W. J. Wang, W. N. Campbell, N. Iwahashi, Y. Sagisaka
TitleTree-based unit selection for English speech synthesis
BooktitleProc. of the Int'l Conf. on Acoustics, Speech, and Signal Processing
Year1993
Pages191--194
Remarkscited in [MCW98, CM98]: clustering and decision trees. TO BE FOUND


INPROC.nakajima94 [Nak94]
Author
S. Nakajima
TitleAutomatic synthesis unit generation for English speech synthesis based on multi-layered context oriented clustering
BooktitleSpeech Communication
Volume14
MonthSeptember
Year1994
Pages313
Remarkscited in [MCW98, CM98]: clustering and decision trees. TO BE FOUND


PHDTHESISdonovan96 [Don96]
Author
R. E. Donovan
TitleTrainable Speech Synthesis
TypePhD thesis
SchoolCambridge University
Year1996
Remarkscited in [MCW98]: Mahalanobis distance


INPROC.huang96 [HAea96]
Author
X. D. Huang, A. Acero, et al.
TitleWhistler: A trainable text-to-speech system
BooktitleProc. of the Int'l Conf. on Spoken Language Processing
Year1996
Pages2387--2390
Remarkscited in [MCW98]: decision trees for speech synthesis


INPROC.karaali96 [KCG96]
Author
O. Karaali, G. Corrigan, I. Gerson
TitleSpeech Synthesis with Neural Networks
BooktitleProc. of World Congress on Neural Networks
MonthSeptember
Year1996
Pages45--50
Remarkscited in [MCW98]: data driven direct mapping with NN


INPROC.tuerk93 [TR]
Author
C. Tuerk, T. Robinson
TitleSpeech synthesis using artificial neural networks trained on cepstral coefficients
BooktitleProc. EUROSPEECH
Pages1713--1716
Year1993
Remarkscited in [MCW98]: data driven direct mapping with NN


BOOKquackenbush88 [QBC88]
Author
S. R. Quackenbush, T. P. Barnwell, M. A. Clements
TitleObjective Measures of Speech Quality
PublisherPrentice-Hall
AddressEnglewood Cliffs, NJ
Year1988
Remarkscited in [MCW98]: distance measures for coding


INPROC.nocerino85 [NSRK85]
Author
N. Nocerino, F. K. Soong, L. R. Rabiner, D. H. Klatt
TitleComparative study of several distortion measures for speech recognition
BooktitleSpeech Communication
Volume4
Year1985
Pages317--331
Remarkscited in [MCW98]: distance measures for ASR


INPROC.asp:icassp88 [HJ88]
Author
H. Hermansky, J. C. Junqua
TitleOptimization of perceptually-based ASR front-end
BooktitleProceedings of the International Conference on Acoustics, Speech, and Signal Processing
Year1988
Pages219
Remarkscited in [MCW98]: distance measures for ASR


INPROC.ghitza97 [GS97]
Author
O. Ghitza, M. M. Sondhi
TitleOn the perceptual distance between two speech segments
BooktitleJournal of the Acoustical Society of America
Year1997
Volume101
Pages522--529
Number1
Remarkscited in [MCW98]: distance measures in general


INPROC.hansen98 [HC98]
Author
J. H. L. Hansen, D. T. Chappell
TitleAn auditory-based distortion measure with application to concatenative speech synthesis
BooktitleIEEE Trans. on Speech and Audio Processing
Volume6
MonthSeptember
Year1998
Pages489--495
Remarkscited in [MCW98]: distance measures for concatenative speech synthesis
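
The paper's contribution is an auditory-based measure; purely to illustrate the simpler spectral distances that such measures are usually compared against, a plain log-spectral distance between two frames could be sketched as follows (the window and FFT size are arbitrary choices, not the paper's):

    import numpy as np

    def log_spectral_distance(frame_a, frame_b, n_fft=1024, eps=1e-12):
        # RMS difference of the log magnitude spectra of two equal-length
        # frames, in dB. A generic concatenation-cost ingredient, not the
        # auditory-based distortion measure of the cited paper.
        A = np.abs(np.fft.rfft(frame_a * np.hanning(len(frame_a)), n_fft))
        B = np.abs(np.fft.rfft(frame_b * np.hanning(len(frame_b)), n_fft))
        d = 20 * np.log10(A + eps) - 20 * np.log10(B + eps)
        return np.sqrt(np.mean(d ** 2))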


INPROC.asp:itsa94 [HM94]
Author
H. Hermansky, N. Morgan
TitleRASTA processing of speech
BooktitleIEEE Transactions on Speech and Acoustics
Volume2
MonthOctober
Year1994
Pages587--589
Remarkscited in [MCW98]


BOOKedwards93 [Edw93]
Author
A. L. Edwards
TitleAn Introduction to Linear Regression and Correlation
PublisherW. H. Freeman and Co
AddressSan Francisco
Year1993
Remarkscited in [MCW98]: Fisher transform


INPROC.Ding_OptiUnit_EURO97 [DC97]
Author
Wen Ding, Nick Campbell
TitleOptimising Unit Selection with Voice Source and Formants in the CHATR Speech Synthesis System
BooktitleProc. Eurospeech '97
AddressRhodes, Greece
MonthSeptember
Year1997
Pages537--540
RemarksTO BE FOUND!


11   Spectral Envelopes

MASTER.diemo98 [Sch98c]
Author
Diemo Schwarz
TitleSpectral Envelopes in Sound Analysis and Synthesis
TypeDiplomarbeit Nr. 1622
SchoolUniversität Stuttgart, Fakultät Informatik
AddressStuttgart, Germany
MonthJune
Year1998
urlhttp://www.ircam.fr/anasyn/schwarz/da/
official-urlhttp://www.informatik.uni-stuttgart.de/cgi-bin/ncstrl_rep_view.pl?/inf/ftp/pub/library/medoc.ustuttgart_fi/DIP-1622/DIP-1622.bib
AbstractIn this project, Spectral Envelopes in Sound Analysis and Synthesis, various methods for estimation, representation, file storage, manipulation, and application of spectral envelopes to sound synthesis were evaluated, improved, and implemented. A prototyping and testing environment was developed, and a function library to handle spectral envelopes was designed and implemented. For the estimation of spectral envelopes, after defining the requirements, the methods LPC, cepstrum, and discrete cepstrum were examined, and also improvements of the discrete cepstrum method (regularization, stochastic (or probabilistic) smoothing, logarithmic frequency scaling, and adding control points). An evaluation with a large corpus of sound data showed the feasibility of discrete cepstrum spectral envelope estimation. After defining the requirements for the representation of spectral envelopes, filter coefficients, spectral representation, break-point functions, splines, formant representation, and high resolution matching pursuit were examined. A combined spectral representation with indication of the regions of formants (called fuzzy formants) was defined to allow for integration of spectral envelopes with precise formant descriptions. For file storage, new data types were defined for the Sound Description Interchange Format (SDIF) standard. Methods for manipulation were examined, especially interpolation between spectral envelopes, and between spectral envelopes and formants, and other manipulations, based on primitive operations on spectral envelopes. For sound synthesis, application of spectral envelopes to additive synthesis, and time-domain or frequency-domain filtering have been examined. For prototyping and testing of the algorithms, a spectral envelope viewing program was developed. Finally, the spectral envelope library, offering complete functionality of spectral envelope handling, was developed according to the principles of software engineering.
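
As a rough illustration of what a discrete cepstrum envelope estimate involves, here is a minimal sketch of a regularized least-squares fit in the spirit of the methods evaluated in the thesis (the particular smoothness penalty and all parameter names are assumptions):

    import numpy as np

    def discrete_cepstrum_envelope(freqs, amps, fs, order=30, lam=1e-4):
        # Fit cepstral coefficients c[0..order] so that the log envelope
        # E(f) = c0 + 2*sum_k c_k cos(2*pi*k*f/fs) matches log(amps) at the
        # partial frequencies, with a simple k^2 smoothness penalty
        # (an assumed regularizer, in the spirit of regularized discrete
        # cepstrum estimation).
        freqs = np.asarray(freqs, float)
        target = np.log(np.asarray(amps, float))
        k = np.arange(order + 1)
        M = np.cos(2 * np.pi * np.outer(freqs / fs, k))
        M[:, 1:] *= 2.0
        R = np.diag(k.astype(float) ** 2)   # penalizes wiggly envelopes
        return np.linalg.solve(M.T @ M + lam * R, M.T @ target)

    def envelope_at(c, f, fs):
        # Evaluate the fitted log envelope at frequencies f (Hz).
        k = np.arange(len(c))
        M = np.cos(2 * np.pi * np.outer(np.atleast_1d(f) / fs, k))
        M[:, 1:] *= 2.0
        return M @ c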


MASTER.diemo98-short [Sch98a]
Author
D. Schwarz
TitleSpectral Envelopes in Sound Analysis and Synthesis
TypeDiplomarbeit Nr. 1622
SchoolUniversität Stuttgart, Fakultät Informatik
AddressStuttgart, Germany
Year1998


MASTER.diemo98-sshort [Sch98b]
Author
D. Schwarz
TitleSpectral Envelopes in Sound Analysis and Synthesis
TypeDiplomarbeit
SchoolUniversität Stuttgart, Informatik
Year1998


BOOKbookbeauchamp [Bea00]
Editor
James Beauchamp
TitleThe Sound of Music
PublisherSpringer
AddressNew York
Year2000


INBOOKbookbeauchamp-specenv [RSb]
Author
Xavier Rodet, Diemo Schwarz
TitleSpectral Envelopes and Additive+Residual Analysis-Synthesis
NoteIn J. Beauchamp, ed. The Sound of Music. Springer, New York, to be published 2000


INBOOKbookbeauchamp-specenv-short [RSa]
Author
X. Rodet, D. Schwarz
TitleSpectral Envelopes and Additive+Residual Analysis-Synthesis
NoteIn J. Beauchamp, ed. The Sound of Music. Springer, N.Y., to be published


INPROC.holmes83 [Hol83a]
Author
J. N. Holmes
TitleFormant synthesizers: Cascade or Parallel
BooktitleSpeech Communication
Year1983
Volume2
Pages251--273


INPROC.holmes83-short [Hol83b]
Author
J. N. Holmes
TitleFormant synthesizers: Cascade or Parallel
BooktitleSpeech Communication
Volume2
Year1983


BOOKhamming77 [Ham77b]
Author
Richard Wesley Hamming
TitleDigital Filters
PublisherPrentice--Hall
SeriesSignal Processing Series
AddressEnglewood Cliffs
Year1977


BOOKhamming77-short [Ham77a]
Author
R. W. Hamming
TitleDigital Filters
PublisherPrentice--Hall
SeriesSignal Processing Series
Year1977


INPROC.fft-2 [FRD93a]
Author
A. Freed, X. Rodet, Ph. Depalle
TitlePerformance, Synthesis and Control of Additive Synthesis on a Desktop Computer Using FFT-1
BooktitleProceedings of the 19th International Computer Music Conference
AddressWaseda University Center for Scholarly Information
Year1993
PublisherInternational Computer Music Association
urlhttp://cnmat.CNMAT.Berkeley.EDU/~adrian/FFT-1/FFT-1_ICMC93.html


INPROC.fft-2-short [FRD93b]
Author
A. Freed, X. Rodet, Ph. Depalle
TitlePerformance, Synthesis and Control of Additive Synthesis on a Desktop Computer Using FFT-1
BooktitleProc. ICMC
Year1993


INPROC.fft-3 [SBHL97d]
Author
Xavier Serra, Jordi Bonada, Perfecto Herrera, Ramon Loureiro
TitleIntegrating complementary spectral models in the design of a musical synthesizer
BooktitleProceedings of the International Computer Music Conference
Year1997
urlhttp://www.iua.upf.es/~xserra/articles/spectral-models/


INPROC.fft-3-short [SBHL97a]
Author
X. Serra, J. Bonada, P. Herrera, R. Loureiro
TitleIntegrating Complementary Spectral Models in the Design of a Musical Synthesizer
BooktitleProc. ICMC
Year1997


PHDTHESISmarine-thesis [Oud98b]
Author
Marine Campedel Oudot
TitleÉtude du modèle ``sinusoïdes et bruit'' pour le traitement de la parole. Estimation robuste de l'enveloppe spectrale
TypeThèse
SchoolEcole Nationale Supérieure des Télécommunications
AddressParis, France
MonthNovember
Year1998


PHDTHESISmarine-thesis-short [Oud98a]
Author
M. Campedel Oudot
TitleÉtude du modèle sinusoïdes et bruit pour le traitement de la parole. Estimation robuste de l'enveloppe spectrale
TypeThèse
SchoolENST
AddressParis
Year1998


INPROC.jmax99 [DCMS99]
Author
François Déchelle, Maurizio De Cecco, Enzo Maggi, Norbert Schnell
TitlejMax Recent Developments
BooktitleProceedings of the International Computer Music Conference
Year1999


INPROC.jmax99-short [DDMS99]
Author
F. Déchelle, M. DeCecco, E. Maggi, N. Schnell
TitlejMax Recent Developments
BooktitleProc. ICMC
Year1999


INPROC.jmax2000 [DSBO00b]
Author
François Déchelle, Norbert Schnell, Ricardo Borghesi, Nicolas Orio
TitleThe jMax Environment: An Overview of New Features
BooktitleProceedings of the International Computer Music Conference
AddressBerlin
Year2000


INPROC.jmax2000-short [DSBO00a]
Author
F. Déchelle, N. Schnell, R. Borghesi, N. Orio
TitleThe jMax Environment: An Overview of New Features
BooktitleProc. ICMC
AddressBerlin
Year2000


INPROC.lemur95 [FHH95a]
Author
K. Fitz, L. Haken, B. Holloway
TitleLemur -- A Tool for Timbre Manipulation
BooktitleProceedings of the International Computer Music Conference
Pages158--161
AddressBanff
MonthSeptember
Year1995


INPROC.lemur95-short [FHH95b]
Author
K. Fitz, L. Haken, B. Holloway
TitleLemur -- A Tool for Timbre Manipulation
BooktitleProc. ICMC
Year1995


INPROC.HRMP [GBM+96]
Author
R. Gribonval, E. Bacry, S. Mallat, Ph. Depalle, X. Rodet
TitleAnalysis of Sound Signals with High Resolution Matching Pursuit
BooktitleProceedings of the IEEE Time--Frequency and Time--Scale Workshop (TFTS)
Year1996
Notewww [AS00]
url\url{http://www.ircam.fr/anasyn/listePublications/articlesRodet/TFTS96/tfts96.ps.gz}


INPROC.HRMP2 [GDR+96]
Author
R. Gribonval, Ph. Depalle, X. Rodet, E. Bacry, S. Mallat
TitleSound Signal Decomposition using a High Resolution Matching Pursuit
BooktitleProceedings of the International Computer Music Conference (ICMC)
LocationClear Water Bay, Hong-Kong
MonthAugust
Year1996
Notewww [AS00]
abstract-url\url{http://www.ircam.fr/anasyn/listePublications/articlesRodet/ICMC96HRMP/abstract.txt}
url\url{http://www.ircam.fr/anasyn/listePublications/articlesRodet/ICMC96HRMP/ICMC96HRMP.ps.gz}


ARTICLEfof [Rod84b]
Author
Xavier Rodet
TitleTime-Domain Formant-Wave-Function Synthesis
JournalComputer Music Journal
Volume8
Number3
MonthFall
Year1984
Pages9--14
Notereprinted from [Sim80]


ARTICLEfof-short [Rod84a]
Author
X. Rodet
TitleTime-Domain Formant-Wave-Function Synthesis
JournalComputer Music Journal
MonthFall
Year1984


BOOKfof2 [Sim80]
Editor
J. C. Simon
TitleSpoken Language Generation and Understanding
PublisherD. Reidel Publishing Company
AddressDordrecht, Holland
Year1980


ARTICLEchant [RPB84b]
Author
Xavier Rodet, Yves Potard, Jean--Baptiste Barrière
TitleThe Chant--Project: From the Synthesis of the Singing Voice to Synthesis in General
JournalComputer Music Journal
Volume8
Number3
MonthFall
Year1984
Pages15--31


ARTICLEchant-short [RPB84a]
Author
X. Rodet, Y. Potard, J.--B. Barrière
TitleThe Chant--Project: From the Synthesis of the Singing Voice to Synthesis in General
JournalComputer Music Journal
MonthFall
Year1984


ARTICLEchant2 [RPB85]
Author
Xavier Rodet, Yves Potard, Jean--Baptiste Barrière
TitleCHANT: de la synthèse de la voix chantée à la synthèse en général
JournalRapports de recherche IRCAM
AddressParis
Year1985
NoteAvailable online


MANUALchant-manual [Vir97]
Author
Dominique Virolle
TitleLa Librairie CHANT: Manuel d'utilisation des fonctions en C
MonthApril
Year1997
NoteAvailable online


INPROC.dcep1 [GR90]
Author
Thierry Galas, Xavier Rodet
TitleAn Improved Cepstral Method for Deconvolution of Source--Filter Systems with Discrete Spectra: Application to Musical Sound Signals
BooktitleProceedings of the International Computer Music Conference (ICMC)
AddressGlasgow
MonthSeptember
Year1990
Notesdcep with cloud, some pictures, middle (3 pages)


INPROC.dcep2 [GR91b]
Author
Thierry Galas, Xavier Rodet
TitleGeneralized Discrete Cepstral Analysis for Deconvolution of Source--Filter Systems with Discrete Spectra
BooktitleIEEE Workshop on Applications of Signal Processing to Audio and Acoustics
AddressNew Paltz, New York
MonthOctober
Year1991
Notesdcep with cloud, no pictures, short (2 pages)


INPROC.dcep3 [GR91c]
Author
Thierry Galas, Xavier Rodet
TitleGeneralized Functional Approximation for Source--Filter System Modeling
BooktitleProc. Eurospeech
AddressGenova
Year1991
Pages1085--1088
Notespower spectrum modeling, all pole, dcep with cloud, log frequency, many pictures


INPROC.dcep3-short [GR91a]
Author
Th. Galas, X. Rodet
TitleGeneralized Functional Approximation for Source--Filter System Modeling
BooktitleProc. Eurospeech
Year1991


INPROC.marine1 [OCM97]
Author
M. Oudot, O. Cappé, E. Moulines
TitleRobust Estimation of the Spectral Envelope for ``Harmonics+Noise'' Models
BooktitleIEEE Workshop on Speech coding
AddressPocono Manor
MonthSeptember
Year1997


INPROC.marine97 [COM97]
Author
O. Cappé, M. Oudot, E. Moulines
TitleSpectral Envelope Estimation using a Penalized Likelihood Criterion
BooktitleIEEE ASSP Workshop on App. of Sig. Proc. to Audio and Acoust.
AddressMohonk
MonthOctober
Year1997


ARTICLEdcep-reg [CM96]
Author
O. Cappé, E. Moulines
TitleRegularization Techniques for Discrete Cepstrum Estimation
JournalIEEE Signal Processing Letters
Volume3
Number4
Pages100--102
MonthApril
Year1996


INPROC.xspect [RFL96]
Author
Xavier Rodet, Dominique François, Guillaume Levy
TitleXspect: a New Motif Signal Visualisation, Analysis and Editing Program
BooktitleProceedings of the International Computer Music Conference (ICMC)
LocationHong Kong
MonthAugust
Year1996
NoteAvailable online


MANUALxspect-manual [RF96]
Author
Xavier Rodet, Dominique François
TitleXSPECT: Introduction
MonthJanuary
Year1996
NoteAvailable online


INPROC.hmm [DGR93a]
Author
Ph. Depalle, G. Garcia, X. Rodet
TitleTracking of Partials for Additive Sound Synthesis Using Hidden Markov Models
NoteAbstract available online
Pages225--228
BooktitleProceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Year1993
MonthApril


INPROC.hmm-short [DGR93b]
Author
Ph. Depalle, G. Garcia, X. Rodet
TitleTracking of Partials for Additive Sound Synthesis Using Hidden Markov Models
Pages225--228
BooktitleProc. ICASSP
Year1993


INPROC.additive [Rod97b]
Author
Xavier Rodet
TitleMusical Sound Signals Analysis/Synthesis: Sinusoidal+Residual and Elementary Waveform Models
BooktitleProceedings of the IEEE Time--Frequency and Time--Scale Workshop (TFTS)
MonthAugust
Year1997
NoteAbstract and PostScript available online: www.ircam.fr/anasyn/listePublications/articlesRodet/TFTS97/TFTS97.ps.gz


INPROC.additive-short [Rod97a]
Author
X. Rodet
TitleMusical Sound Signals Analysis/Synthesis: Sinusoidal+Residual and Elementary Waveform Models
BooktitleProc. IEEE Time--Frequency/Time--Scale Workshop
Year1997


MANUALadditive-manual [Rod97c]
Author
Xavier Rodet
TitleThe Additive Analysis--Synthesis Package
Year1997
NoteAvailable online10


INPROC.diphones [RL97b]
Author
Xavier Rodet, Adrien Lefèvre
TitleThe Diphone Program: New Features, new Synthesis Methods and Experience of Musical Use
BooktitleProceedings of the International Computer Music Conference (ICMC)
MonthSeptember
Year1997
AddressTessaloniki, Greece
NoteAbstract and PostScript available online: www.ircam.fr/anasyn/listePublications/articlesRodet/ICMC97/ICMC97Diphone.ps.gz


INPROC.diphones-nourl [RL97c]
Author
Xavier Rodet, Adrien Lefèvre
TitleThe Diphone Program: New Features, new Synthesis Methods and Experience of Musical Use
BooktitleProceedings of the International Computer Music Conference (ICMC)
MonthSeptember
Year1997
AddressTessaloniki, Greece
abstract-urlhttp://www.ircam.fr/anasyn/listePublications/articlesRodet/ICMC97/ICMC97DiphoneAbstract.html
postscript-urlhttp://www.ircam.fr/anasyn/listePublications/articlesRodet/ICMC97/ICMC97Diphone.ps.gz


INPROC.diphones-short [RL97a]
Author
X. Rodet, A. Lefèvre
TitleThe Diphone Program: New Features, new Synthesis Methods and Experience of Musical Use
BooktitleProc. ICMC
AddressTessaloniki
Year1997


INPROC.fft-1 [RD92]
Author
Xavier Rodet, Philippe Depalle
TitleA new additive synthesis method using inverse Fourier transform and spectral envelopes
BooktitleProceedings of the International Computer Music Conference (ICMC)
MonthOctober
Year1992


MANUALsdif-manual [Vir98]
Author
Dominique Virolle
TitleSound Description Interchange Format (SDIF)
MonthJanuary
Year1998
NoteAvailable online


INPROC.fts [DDPZ94]
Author
François Déchelle, Maurizio De Cecco, Miller Puckette, David Zicarelli
TitleThe IRCAM ``Real-Time Platform'': Evolution and Perspectives
BooktitleProceedings of the International Computer Music Conference (ICMC)
LocationAarhus, Denmark
Year1994
NoteAvailable online


ARTICLEfts-basics [Puc91b]
Author
Miller Puckette
TitleFTS: A Real-Time Monitor for Multiprocessor Music Synthesis
JournalComputer Music Journal
Volume15
Number3
Pages58--67
MonthWinter
Year1991
NoteAvailable online


ARTICLEmax [Puc91a]
Author
Miller Puckette
TitleCombining Event and Signal Processing in the MAX Graphical Programming Environment
JournalComputer Music Journal
Volume15
Number3
Pages68--77
MonthWinter
Year1991
NoteAvailable online


INPROC.specenv-rod [RDP87b]
Author
Xavier Rodet, Philippe Depalle, G. Poirot
TitleSpeech Analysis and Synthesis Methods Based on Spectral Envelopes and Voiced/Unvoiced Functions
BooktitleEuropean Conference on Speech Tech.
LocationEdinburgh
MonthSeptember
Year1987


INPROC.specenv-rod-short [RDP87a]
Author
X. Rodet, Ph. Depalle, G. Poirot
TitleSpeech Analysis and Synthesis Methods Based on Spectral Envelopes and Voiced/Unvoiced Functions
BooktitleEuropean Conf. on Speech Tech.
LocationEdinburgh
Year1987


INPROC.control [FRD92b]
Author
Adrian Freed, Xavier Rodet, Philippe Depalle
TitleSynthesis and Control of Hundreds of Sinusoidal Partials on a Desktop Computer without Custom Hardware
BooktitleICSPAT
LocationSan José
Year1992
NoteAvailable online
Notesfft-1, fm, se better than BPF


INPROC.control-short [FRD92a]
Author
A. Freed, X. Rodet, Ph. Depalle
TitleSynthesis and Control of Hundreds of Sinusoidal Partials on a Desktop Computer without Custom Hardware
BooktitleICSPAT
Year1992


INPROC.newposs [RDG95]
Author
Xavier Rodet, Philippe Depalle, Guillermo García
TitleNew Possibilities in Sound Analysis and Synthesis
BooktitleISMA
LocationDourdan
Year1995
NoteAvailable online (PostScript)
Notesfft-1 + se, phys. models, ana/syn overview, farinelli


INPROC.farinelli [DGR94]
Author
Philippe Depalle, Guillermo García, Xavier Rodet
TitleA Virtual Castrato (!?)
BooktitleProceedings of the International Computer Music Conference (ICMC)
LocationAarhus, Denmark
Year1994
NoteAvailable online


MANUALudi [WRD92]
Author
Peter Wyngaard, Chris Rogers, Philippe Depalle
TitleUDI 2.1---A Unified DSP Interface
Year1992
NoteAvailable online


MANUALpm [Gar94]
Author
Guillermo García
TitlePm: A library for additive analysis/transformation/synthesis
MonthJuly
Year1994
NoteAvailable online


INPROC.escher [WSR98]
Author
Marcelo M. Wanderley, Norbert Schnell, Joseph Rovan
TitleESCHER---Modeling and Performing composed Instruments in real-time
BooktitleIEEE Systems, Man, and Cybernetics Conference
LocationSan Diego
MonthOctober
Year1998
NoteTo be published


BOOKnat [Hen98]
Author
Nathalie Henrich
TitleSynthèse de la voix chantée par règles
MonthJuly
Year1998
PublisherIRCAM
AddressParis, France
NoteD.E.A. internship report, Acoustique, Traitement de Signal et Informatique Appliqués à la Musique


MISCz [Mel97]
Author
Jason Meldrum
TitleThe Z--Transform
NoteOnline tutorial
Year1997


BOOKdsp [OS75]
Author
Alan V. Oppenheim, Ronald W. Schafer
TitleDigital Signal Processing
Year1975
PublisherPrentice--Hall


INBOOKdspapp [Opp78]
Editor
Alan V. Oppenheim
ChapterDigital Processing of Speech
TitleApplications of Digital Signal Processing
Pages117--168
Year1978
PublisherPrentice--Hall


BOOKdsp-intro [RH91]
Author
Stuart Rosen, Peter Howell
TitleSignals and Systems for Speech and Hearing
Year1991
PublisherAcademic Press
AddressLondon


BOOKroads [Roa96]
Author
Curtis Roads
TitleThe Computer Music Tutorial
Year1996
PublisherMIT Press


BOOKgrey80 [MG80]
Author
J.D. Markel, A.H. Gray
TitleLinear Prediction of Speech
PublisherSpringer
Year1980


INPROC.toeplitz [MP82]
Author
G. A. Merchant, T. W. Parks
TitleEfficient Solution of a Toeplitz--plus--Hankel Coefficient Matrix System of Equations
BooktitleIEEE TASSP
Volume30
Pages40--44
MonthFebruary
Year1982


BOOKpsycho [Zwi82]
Author
Eberhard Zwicker
TitlePsychoakustik
Year1982
PublisherSpringer


INPROC.splinelpc [TAW97]
Author
Keith A. Teague, Walter Andrews, Buddy Walls
TitleEnhanced Modeling of Discrete Spectral Amplitudes
BooktitleIEEE Workshop on Speech coding
AddressPocono Manor
MonthSeptember
Year1997


INCOLL.ICS94 [vS94]
Author
R. von Sachs
TitlePeak-insensitive non-parametric spectrum estimation
BooktitleJournal of time series analysis
Year1994
Volume15
Number4
Pages429--452


ARTICLEadditive-idea [RM69]
Author
J.C. Risset, M.V. Mathews
TitleAnalysis of musical-instrument tones
JournalPhysics Today
Volume22
Number2
Pages23--30
MonthFebruary
Year1969


INPROC.splines [UAE93]
Author
Michael Unser, Akram Aldroubi, Murray Eden
TitleB--Spline Signal Processing: Part I---Theory
Volume41
Number2
Pages821--833
BooktitleIEEE Transactions on Signal Processing
Year1993


MISCspeechana [Rob98]
Author
Tony Robinson
TitleSpeech Analysis
NoteOnline tutorial
Year1998


ARTICLEMultiscaleEdges [MZ92]
Author
S. Mallat, S. Zhong
TitleCharacterization of Signals from Multiscale Edges
JournalIEEE Trans. Pattern Anal. Machine Intell.
Year1992
Volume14
Number7
Pages710--732
MonthJuly


ARTICLERidges [DEG+92]
Author
N. Delprat, B. Escudié, P. Guillemain, R. Kronland-Martinet, Ph. Tchamitchian, B. Torrésani
TitleAsymptotic Wavelet and Gabor Analysis: Extraction of Instantaneous Frequency
JournalIEEE Transactions on Information Theory
Year1992
Volume38
Number2
Pages644--664
MonthMarch


ARTICLERidges2 [GKM96]
Author
Ph. Guillemain, R. Kronland-Martinet
TitleCharacterization of Acoustic Signals Through Continuous Linear Time--Frequency Representations
JournalProceedings of the IEEE
Year1996
Volume84
Number4
Pages561--585
MonthApril


BOOKmallat [Mal97]
Author
Stephane Mallat
TitleA Wavelet Tour of Signal Processing
PublisherAP Professional
AddressLondon
Year1997


BOOKchan [Cha95]
Author
Y. T. Chan
TitleWavelet Basics
PublisherKluwer Academic Publ.
AddressBoston
Year1995


BOOKwavelets [Hub97]
Author
Barbara Burke Hubbard
TitleThe World According to Wavelets: The Story of a Mathematical Technique in the Making
PublisherA K Peters Ltd
Year1997


INBOOKIBspline [AE]
Author
Aldroubi, Eden
TitleWavelet analysis and its applications
ChapterPolynomial Spline and Wavelets
Publisher???
Year???
Volume2


BOOKinstrument-character [vH54]
Author
Hermann L. von Helmholtz
TitleOn the Sensations of Tone as a Physiological Basis for the Theory of Music
PublisherDover
AddressNew York
Year1954
NoteOriginal title: [vH13]


BOOKhelmholtz [vH13]
Author
Hermann L. von Helmholtz
TitleDie Lehre von den Tonempfindungen: als physiologische Grundlage für die Theorie der Musik
PublisherVieweg
AddressBraunschweig
Edition6th
Year1913


BOOKhelmholtz-reprint [vH83]
Author
Hermann L. von Helmholtz
TitleDie Lehre von den Tonempfindungen: als physiologische Grundlage für die Theorie der Musik
PublisherGeorg Olms Verlag
AddressHildesheim
Year1983


BOOKclark-yallop [CY96]
Author
John E. Clark, Colin Yallop
TitleAn Introduction to Phonetics and Phonology
PublisherBlackwell
AddressOxford
Year1996


ARTICLEprosody-tilt [Dog95]
Author
Grzegorz Dogil
TitlePhonetic Correlates of Word Stress
JournalAIMS Phonetik (Working Papers of the Department of Natural Language Processing)
Volume2
Number2
PublisherInstitut für Maschinelle Sprachverarbeitung
LocationStuttgart, Germany
AddressStuttgart, Germany
Year1995
NoteContents available online


BOOKjackson1 [Jac95a]
Author
Michael Jackson
TitleSoftware requirements & specifications: a lexicon of practice, principles, and prejudices
PublisherAddison--Wesley
AddressWokingham
Year1995


BOOKjackson2 [Jac83]
Author
Michael A. Jackson
TitleSystem development
PublisherPrentice--Hall Intern.
AddressEnglewood Cliffs
Year1983
SeriesPrentice--Hall International series in computer science


BOOKnagl [Nag90]
Author
Manfred Nagl
TitleSoftwaretechnik: methodisches Programmieren im Großen
PublisherSpringer
AddressBerlin
Year1990
SeriesSpringer compass


BOOKsommerville [Som85]
Author
Ian Sommerville
TitleSoftware engineering
Edition2nd
PublisherAddison--Wesley
AddressWokingham
Year1985
SeriesInternational computer science series


BOOKiau [Utt93]
Author
Ian A. Utting
TitleLecture Notes in Object-Oriented Software Engineering
PublisherUniversity of Kent at Canterbury
AddressCanterbury, UK
Year1993


12   Statistics

ARTICLEbattiti94 [Bat94]
Author
Roberto Battiti
TitleUsing the mutual information for selecting features in supervised neural net learning
JournalIEEE Transactions on Neural Networks
Volume5
Number4
Pages537--550
Year1994
urlhttp://rtm.science.unitn.it/~battiti/battiti-publications.html


BOOKcart84 [BFOS84a]
Author
L. Breiman, J. Friedman, R. Olshen, C. Stone
TitleClassification and Regression Trees
PublisherWadsworth and Brooks
AddressMonterey, CA
Year1984
Notenew edition [B+84]?
Remarkscited in [MCW98, CM98, BT97b] for CART, clustering, and decision trees


BOOKcart84-2 [BFOS84b]
Author
Leo Breiman, J. H. Friedman, R. A. Olshen, C. J. Stone
TitleClassification and Regression Trees
Year1984
PublisherWadsworth Publishing Company
AddressBelmont, California, U.S.A.
SeriesStatistics/Probability Series
Isbn-hard0534980546 (hardcover)
Isbn-soft0534980538 (softcover)


BOOKcart93 [B+84]
Author
Leo Breiman, others
TitleClassification and Regression Trees
PublisherChapman & Hall
AddressNew York
Year1984
Pages358
Notenew edition of [BFOS84a]?
Isbn0-412-04841-8
urlhttp://www.crcpress.com/catalog/C4841.htm
amazon-urlhttp://www.amazon.de/exec/obidos/ASIN/0412048418
Price$44.95, DM 83.26 EUR 42.57
RemarksTO BE FOUND


ARTICLEdubnov95 [DTC]
Author
Shlomo Dubnov, Naftali Tishby, Dalia Cohen
TitleHearing Beyond the Spectrum
JournalJournal of New Music Research
Volume24
Number4
Year1995
pub-urlhttp://www.swets.nl/jnmr/vol24_4.html#dubnov24.4
Remarksfeatures: harmonicity, phase coherence, chorus. bispectral information. acoustic distortion (distance) measure (``concept of statistical divergence which is used for measuring the `similarity' between signals'', ``similarity classes with a good correspondence to the human acoustic perception'', ``generalization of acoustic distortion measure''). TO BE FOUND
AbstractIn this work we focus on the problem of acoustic signals modeling and analysis, with particular interest in models that can capture the timbre of musical sounds. Traditional methods usually relate to several ``dimensions'' which represent the spectral properties of the signal and their change in time. Here we confine ourselves to the stationary portion of the sound signal, the analysis of which is generalized by incorporating polyspectral techniques. We suggest that by looking at the higher order statistics of the signal we obtain additional information not present in the standard autocorrelation or its Fourier related power-spectra. It is shown that over the bispectral plane several acoustically meaningful measures could be devised, which are sensitive to properties such as harmonicity and phase coherence among the harmonics. Effects such as reverberation and chorusing are demonstrated to be clearly detected by the above measures. In the second part of the paper we perform an information theoretic analysis of the spectral and bispectral planes. We introduce the concept of statistical divergence which is used for measuring the ``similarity'' between signals. A comparative matrix is presented which shows the similarity measure between several instruments based on spectral and bispectral information. The instruments group into similarity classes with a good correspondence to the human acoustic perception. The last part of the paper is devoted to acoustical modelling of the above phenomena. We suggest a simple model which accounts for some of the polyspectral aspects of musical sound discussed above. One of the main results of our work is generalization of acoustic distortion measure based on our model and which takes into account higher order statistical properties of the signal.
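
As a reminder (not quoted from the paper), the third-order quantity behind these bispectral measures is, for a stationary signal x with spectrum X,

    B_x(\omega_1, \omega_2) = \mathrm{E}\left[ X(\omega_1)\, X(\omega_2)\, X^{*}(\omega_1 + \omega_2) \right],

which, unlike the power spectrum E[|X(\omega)|^2], retains the phase relations between the components at \omega_1, \omega_2 and \omega_1 + \omega_2, and is therefore sensitive to harmonicity and phase coherence.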


INPROC.dubnov97 [DR97]
Author
Shlomo Dubnov, Xavier Rodet
TitleStatistical Modeling of Sound Aperiodicities
BooktitleProceedings of the International Computer Music Conference (ICMC)
MonthSeptember
Year1997
AddressTessaloniki, Greece
urlhttp://www.ircam.fr/equipes/analyse-synthese/listePublications/articlesDubnov


PHDTHESISrochebois97 [Roc97]
Author
Thierry Rochebois
TitleMéthodes d'analyse synthèse et représentations optimales des sons musicaux basées sur la réduction de données spectrales
MonthDecember
Year1997
SchoolUniversité Paris XI
urlhttp://www.ief.u-psud.fr/~thierry/these/
RemarksPrincipal components analysis of harmonic partials; it yields sub-spaces spanned by linear combinations of partials, i.e. timbral components (see the sketch after this entry).
AbstractThe analysis and synthesis of sounds, and of musical sounds in particular, has already been the subject of much research. Essentially, this research has pursued two objectives: to study musical sounds and to synthesize them. These two objectives are entirely compatible and complementary. The subject of this thesis is a method for the analysis and synthesis of musical sounds based on data reduction. Such a method yields a representation of musical sounds that is optimal in the sense of variance. This representation is both a powerful tool for the study of musical timbre and the basis of an efficient form of synthesis.
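
Not the thesis' exact procedure, but a minimal sketch of the kind of principal-component decomposition of harmonic amplitude data that the remark refers to (the matrix layout, one analysis frame per row and one harmonic per column, is an assumption):

    import numpy as np

    def timbral_components(partial_amps, n_components=4):
        # partial_amps: array of shape (frames, harmonics) holding the
        # amplitude of each harmonic over time. Returns the mean spectrum,
        # the first principal components (linear combinations of harmonics,
        # i.e. candidate timbral basis spectra) and the per-frame weights.
        X = np.asarray(partial_amps, float)
        mean = X.mean(axis=0)
        U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
        components = Vt[:n_components]        # (n_components, harmonics)
        weights = (X - mean) @ components.T   # (frames, n_components)
        return mean, components, weights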


BOOKfukunaga90 [Fuk90]
Author
K. Fukunaga
TitleIntroduction to Statistical Pattern Recognition
PublisherAcademic Press
Edition2
Year1990
Remarkscited in [CM98] for CART tree evaluation criterion. TO BE FOUND


INPROC.nock97 [NGY97]
Author
H. J. Nock, M. J. F. Gales, Steve Young
TitleA Comparative Study of Methods for Phonetic Decision-Tree State Clustering
BooktitleProc. Eurospeech '97
Volume1
AddressRhodes, Greece
MonthSeptember
Year1997
Pages111--114
Remarkscited in [MCW98] for decision trees for speech recognition, [CM98] for CART tree evaluation criterion. TO BE FOUND


13   TCTS Circuit Theory and Signal Processing Lab

MISCtcts:www [TCTS99]
KeyTCTS
TitleTCTS (Circuit Theory and Signal Processing) Lab, Faculté Polytechnique de Mons
HowpublishedWWW page
Year1999
urlhttp://tcts.fpms.ac.be
group-urlhttp://tcts.fpms.ac.be/synthesis/synthesis.html
pub-urlhttp://tcts.fpms.ac.be/publications.html
Notehttp://tcts.fpms.ac.be


INPROC.tcts:euspico98 [DMD98]
Author
O. Deroo, F. Malfrere, T. Dutoit
TitleComparison of two different alignment systems: speech synthesis vs. Hybrid HMM/ANN
BooktitleProc. European Conference on Signal Processing (EUSIPCO'98)
AddressGreece
Year1998
Pages1161--1164
Notewww [TCTS99], same content as [DMD98] (but fewer references)
urlhttp://tcts.fpms.ac.be/publications/papers/1998/eusipco98_odfmtd.zip
AbstractIn this paper we compared two different methods for phonetically labeling a French database. The first one is based on the temporal alignment of the speech signal on a high quality synthetic speech pattern and the second one uses a hybrid HMM/ANN system. Both systems have been evaluated on French read utterances from a single speaker never seen in the training stage of the HMM/ANN system and manually segmented. This study outlines the advantages and drawbacks of both methods. The high quality speech synthetic system has the great advantage that no training stage (hence no labeled database) is needed, while the classical HMM/ANN system allows easily multiple phonetic transcriptions (phonetic lattice). We deduce a method for the automatic constitution of large phonetically and prosodically labeled speech databases based on using the synthetic speech segmentation tool in order to bootstrap the training process of our hybrid HMM/ANN system. The importance of such segmentation tools will be a key point for the development of improved speech synthesis and recognition systems. All the experiments reported in this article related to the hybrid HMM/ANN system have been realized with the STRUT [3] software.


INPROC.tcts:tsd98 [DMP+98]
TitleEULER: Multi-Lingual Text-to-Speech Project
Pages27--32
Author
T. Dutoit, F. Malfrère, V. Pagel, M. Bagein, P. Mertens, A. Ruelle, A. Gilman
BooktitleProceedings of the First Workshop on Text, Speech, Dialogue --- TSD'98
Year1998
Editor
Petr Sojka, Václav Matousek, Karel Pala, Ivan Kopecek
AddressBrno, Czech Republic
MonthSeptember
PublisherMasaryk University Press
Notewww [TCTS99], electronic version: tcts/tsd98tdfmvppmmbarag.ps.*
Remarksmodularity
AbstractText-to-speech systems require simultaneously an abstract linguistic analysis, an acoustic linguistic analysis and a final digital processing stage. The aim of the project presented in this paper is to obtain a set of text-to-speech synthesizers for as many voices, languages and dialects as possible, free of use for non-commercial and non-military applications. This project is an extension of the MBROLA projects. MBROLA is a speech synthesizer that is freely distributed for non-commercial purposes. A multi-lingual speech segmentation and prosody transplantation tool called MBROLIGN has also been developed and freely distributed. Other labs have also recently distributed for free important tools for speech synthesis like Festival from the University of Edinburgh or the MULTEXT project of the University de Provence. The purpose of this paper is to present the EULER project, which will try to integrate all these results, to Eastern European potential partners, so as to increase the dissemination of the important results of MBROLA and MBROLIGN projects and stimulate East/West collaboration on TTS synthesis.


INPROC.tcts:icslp98-fmodtd [MDD98]
Author
F. Malfrere, O. Deroo, T. Dutoit
TitlePhonetic Alignment: Speech Synthesis Based vs. Hybrid HMM/ANN
BooktitleProc. International Conference on Speech and Language Processing
AddressSidney, Australia
Year1998
Pages1571--1574
Notewww [TCTS99], same content as [DMD98] (with more references)
urlhttp://tcts.fpms.ac.be/publications/papers/1998/icslp98_fmodtd.zip
AbstractIn this paper we compare two different methods for phonetically labeling a speech database. The first approach is based on the alignment of the speech signal on a high quality synthetic speech pattern, and the second one uses a hybrid HMM/ANN system. Both systems have been evaluated on French read utterances from a speaker never seen in the training stage of the HMM/ANN system and manually segmented. This study outlines the advantages and drawbacks of both methods. The high quality speech synthetic system has the great advantage that no training stage is needed, while the classical HMM/ANN system easily allows multiple phonetic transcriptions. We deduce a method for the automatic constitution of phonetically labeled speech databases based on using the synthetic speech segmentation tool to bootstrap the training process of our hybrid HMM/ANN system. The importance of such segmentation tools will be a key point for the development of improved speech synthesis and recognition systems.


INPROC.tcts:iscas97 [MD97a]
Author
F. Malfrere, T. Dutoit
TitleSpeech Synthesis for Text-To-Speech Alignment and Prosodic Feature Extraction
BooktitleProc. ISCAS 97
AddressHong-Kong
Year1997
Pages2637--2640
Notewww [TCTS99]
urlhttp://tcts.fpms.ac.be/publications/papers/1997/iscas97_fmtd.zip
RemarksRecent developments in prosody generation have highlighted the potential interest of machine learning techniques such as multilayer perceptrons [Tra92], linear regression techniques [SK92], classification and regression trees [Hir91], or statistical techniques [MPH93], based on the automatic analysis of large prosodically labeled corpora. Only the segmental features of the reference signal are used in alignment. Assumption: the segmental and suprasegmental features are approximately uncorrelated. Keep only the perceptually relevant F0 cues: perceptual stylization, based on a model of tonal perception [dM95]. Robust cepstrum by sinusoidal weighting [GL88]. Derivative of cepstrum [SR88]. A toy alignment sketch follows this entry.
AbstractThe aim of this paper is to present a new and promising approach of the text--to--speech alignment problem. For this purpose, an original idea is developed: a high quality digital speech synthesizer is used to create a reference speech pattern used during the alignment process. The system has been used and tested to extract the prosodic features of read French utterances. The results show a segmentation error rate of about 8%. This system will be a powerful tool for the automatic creation of large prosodically labeled databases and for research on automatic prosody generation.
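
The paper aligns natural speech against a synthetic reference; as a rough illustration only (the feature choice and local cost below are assumptions, not the paper's), aligning two feature sequences by plain dynamic time warping can be sketched as:

    import numpy as np

    def dtw_align(ref, test):
        # Align two feature sequences (e.g. cepstral frames of a synthetic
        # reference vs. natural speech) with plain DTW and Euclidean local
        # cost. Returns the accumulated cost and the warping path.
        ref, test = np.asarray(ref, float), np.asarray(test, float)
        n, m = len(ref), len(test)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = np.linalg.norm(ref[i - 1] - test[j - 1])
                D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        # backtrack the optimal path
        path, i, j = [], n, m
        while i > 0 and j > 0:
            path.append((i - 1, j - 1))
            step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
            if step == 0:
                i, j = i - 1, j - 1
            elif step == 1:
                i -= 1
            else:
                j -= 1
        return D[n, m], path[::-1]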


INPROC.tcts:eurosp97 [SDS97]
Author
Yannis Stylianou, Thierry Dutoit, Juergen Schroeter
TitleDiphone Concatenation Using a Harmonic Plus Noise Model of Speech
BooktitleProc. Eurospeech '97
AddressRhodes, Greece
MonthSeptember
Year1997
Pages613--616
Notewww [TCTS99], electronic version: tcts/hnmconc.ps.*
RemarksImportant! HNM (Marine) basis paper, pitch synchronous. Diphone smoothing in region of quasi-stationarity. Additive better for concatenation than PSOLA. References: [DG96] (non pitch-synchronous hybrid harmonic/stochastic synthesis, real-time generation of signals from spectral representation), [SLM95] (phase treatment, modifications), [Mac96] (non pitch synchronous harmonic modeling).
AbstractIn this paper we present a high-quality text-to-speech system using diphones. The system is based on a Harmonic plus Noise (HNM) representation of the speech signal. HNM is a pitch-synchronous analysis-synthesis system but does not require pitch marks to be determined as necessary in PSOLA-based methods. HNM assumes the speech signal to be composed of a periodic part and a stochastic part. As a result, different prosody and spectral envelope modification methods can be applied to each part, yielding more natural-sounding synthetic speech. The fully parametric representation of speech using HNM also provides a straightforward way of smoothing diphone boundaries. Informal listening tests, using natural prosody, have shown that the synthetic speech quality is close to the quality of the original sentences, without smoothing problems and without buzziness or other oddities observed with other speech representations used for TTS.
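
Not the HNM analysis/synthesis of the paper, but a toy illustration of the harmonic-plus-noise decomposition it builds on: one frame is resynthesized as a sum of harmonics of f0 plus scaled noise (all parameters are assumed inputs; real HNM additionally limits harmonics to a maximum voiced frequency, shapes the noise spectrally, and overlap-adds frames pitch-synchronously):

    import numpy as np

    def hnm_frame(f0, harm_amps, harm_phases, noise_std, n, fs):
        # Harmonic part: sum of sinusoids at integer multiples of f0.
        # Stochastic part: white noise scaled by noise_std.
        t = np.arange(n) / fs
        harmonic = np.zeros(n)
        for k, (a, ph) in enumerate(zip(harm_amps, harm_phases), start=1):
            harmonic += a * np.cos(2 * np.pi * k * f0 * t + ph)
        noise = noise_std * np.random.randn(n)
        return harmonic + noise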


INPROC.tcts:speechcomm96 [DG96]
Author
T. Dutoit, B. Gosselin
TitleOn the use of a hybrid harmonic/stochastic model for tts synthesis by concatenation
BooktitleSpeech Communication
Number19
Pages119--143
Year1996
RemarksCited in [SDS97] for non pitch-synchronous hybrid harmonic/stochastic synthesis, real-time generation of signals from spectral representation. TO BE FOUND


INPROC.macon-thesis96 [Mac96]
Author
Michael W. Macon
TitleSpeech Synthesis Based on Sinusoidal Modeling
BooktitlePhD thesis
PublisherGeorgia Institute of Technology
MonthOctober
Year1996
RemarksCited in [SDS97] for non pitch synchronous harmonic modeling. TO BE FOUND


INPROC.stylianou:eurospeech95 [SLM95]
Author
Y. Stylianou, J. Laroche, E. Moulines
TitleHigh Quality Speech Modification based on a Harmonic+Noise Model
BooktitleProc. EUROSPEECH
Year1995
RemarksCited in [SDS97] for phase treatment, modifications, maximum voice frequency. TO BE FOUND


INPROC.Malfrere_HighQual_EURO97 [MD97b]
Author
Fabrice Malfrere, Thierry Dutoit
TitleHigh Quality Speech Synthesis for Phonetic Speech Segmentation
BooktitleProc. Eurospeech '97
AddressRhodes, Greece
MonthSeptember
Year1997
Pages2631--2634


INPROC.Olivier_SimpAnd_EURO97 [vdVOPD+97]
Author
Olivier van der Vrecken, Nicolas Pierret, Thierry Dutoit, Vincent Pagel, Fabrice Malfrere
TitleA Simple and Efficient Algorithm for the Compression of MBROLA Segment Databases
BooktitleProc. Eurospeech '97
AddressRhodes, Greece
MonthSeptember
Year1997
Pages421--424


INPROC.Dutoit_TheMbro_ICSLP96 [DPP+96]
Author
T. Dutoit, V. Pagel, N. Pierret, F. Bataille, O. van der Vrecken
TitleThe MBROLA project: Towards a Set of High Quality Speech Synthesizers Free of Use for Non Commercial Purposes
BooktitleProc. ICSLP '96
AddressPhiladelphia, PA
MonthOctober
Year1996
Volume3
Pages1393--1396


INPROC.Dutoit_HighQual_ICASSP94 [Dut94]
Author
T. Dutoit
TitleHigh Quality Text-to-Speech Synthesis: a Comparison of four Candidate Algorithms
BooktitleProc. ICASSP '94
AddressAdelaide, Australia
MonthApril
Year1994
PagesI--565--I--568


14   Misc

MISCMPEG7:www [MPEG99]
KeyMPEG
TitleMPEG-7 ``Multimedia Content Description Interface'' Documentation
HowpublishedWWW page
Year1999
urlhttp://www.darmstadt.gmd.de/mobile/MPEG7
Notehttp://www.darmstadt.gmd.de/mobile/MPEG7
AbstractMore and more audio-visual information is available in digital form, in various places around the world. Along with the information, people appear that want to use it. Before one can use any information, however, it will have to be located first. At the same time, the increasing availability of potentially interesting material makes this search harder. The question of finding content is not restricted to database retrieval applications; also in other areas similar questions exist. For instance, there is an increasing amount of (digital) broadcast channels available, and this makes it harder to select the broadcast channel (radio or TV) that is potentially interesting.

In October 1996, MPEG (Moving Picture Experts Group) started a new work item to provide a solution for the urging problem of generally recognised descriptions for audio-visual content, which extend the limited capabilities of proprietary solutions in identifying content that exist today. The new member of the MPEG family is called ``Multimedia Content Description Interface'', or in short MPEG-7.

The associated pages presented in the navigation tool shall provide you with the necessary information to learn more about MPEG-7. As MPEG in general is a dynamic and fast moving standardisation body, some documents and related information may be outdated quickly. We will make every effort to keep up with the MPEG pace - however, keep in mind that the Webpages may not always contain the newest information.


MISCMPEG7:audio-faq [Lin98]
Author
Adam Lindsay
TitleMPEG-7 Audio FAQ
HowpublishedWWW page
Year1998
urlhttp://www.meta-labs.com/mpeg-7/MPEG-7-aud-FAQ.shtml
parent-urlhttp://www.meta-labs.com/mpeg-7-aud/
Notemoved to [TPMAS98]
AbstractThe following is an unofficial FAQ for MPEG-7 Audio issues. It is not a complete document, and is intended to act as a supplement to the FAQ found in the MPEG-7 Context & Objectives document, N2326.
RemarksWhat are specific functionalities foreseen for MPEG-7 audio?
Although still an expanding list, we can envision indexing music, sound effects, and spoken-word content in the audio-only arena. MPEG-7 will enable query-by-example such as query-by-humming. In addition, audio tools play a large role in typical audio-visual content in terms of indexing film soundtracks and the like. If someone wants to manage a large amount of audio content, whether selling it, managing it internally, or making it openly available to the world, MPEG-7 is potentially the solution.

What are the foreseen elements of MPEG-7?
MPEG-7 work is currently seen as being in three parts: Descriptors (D's), Description Schemes (DS's), and a Description Definition Language (DDL). Each is equally crucial to the entire MPEG-7 effort.

Descriptors are the representations of low-level features, the fundamental qualities of audiovisual content which may range from statistical models of signal amplitude, to fundamental frequency of a signal, to an estimate of the number of sources present in a signal, to spectral tilt, to emotional content, to an explicit sound-effect model, to any number of concrete or abstract features. This is the place where the most involvement from the signal processing community is foreseen. Note that not all of the descriptors need to be automatically extracted--the essential part of the standard is to establish a normalized representation and interpretation of the Descriptor. We are actively seeking input on what additional potential Descriptors would be useful.

Description Schemes are structured combinations of Descriptors. This structure may be used to annotate a document, to directly express the structure of a document, or to create combinations of features which form a richer expression of a higher-level concept. For example, a radio segment DS may note the recording date, the broadcast date, the producer, the talent, and include pointers to a transcript. A classical music DS may encode the musical structures (and allow for exceptions) of a Sonata form. Various spectral and temporal Descriptors may be combined to form a DS appropriate for describing timbre or short sound effects. Any suggestions on other applications of DS's to Audio material are very welcome.

The Description Definition Language is to be the mechanism which allows a great degree of flexibility to be included in MPEG-7. Not all documents will fit into a prescribed structure. There are fields (e.g. biomedical imagery) which would find the MPEG-7 framework very useful, but which lie outside of MPEG's scope. A solution provider may have a better method for combining MPEG-7 Descriptors than a normative description scheme. The DDL is to address all of these situations.

While MPEG-4 seeks to have a unique and faithful reproduction of material, MPEG-7 foregoes some precision for the sake of identifying the "essential" features of the material (although many different representations are possible of the same material). What distinguishes it most from other material? What makes it similar?


MISCMPEG:audio-faq [TPMAS98]
Author
D. Thom, H. Purnhagen, the MPEG Audio Subgroup
TitleMPEG Audio FAQ Version 9
HowpublishedWWW page
Year1998
MonthOctober
AddressAtlantic City
urlhttp://www.tnt.uni-hannover.de/project/mpeg/audio/faq
NoteInternational Organisation for Standardisation, Organisation Internationale de Normalisation, Coding of Moving Pictures and Audio, ISO/IEC JTC1/SC29/WG11, N2431, http://www.tnt.uni-hannover.de/project/mpeg/audio/faq


PHDTHESISlevine:thesis [Lev98]
Author
Scott N. Levine
TitleAudio Representations for Data Compression and Compressed Domain Processing
TypePh.D. Dissertation
SchoolDepartment of Electrical Engineering, CCRMA, Stanford University
MonthDecember
Year1998
urlhttp://www-ccrma.stanford.edu/~scottl/thesis.html
Notehttp://www-ccrma.stanford.edu/~scottl/thesis.html
AbstractIn the world of digital audio processing, one usually has the choice of performing modifications on the raw audio signal, or data compressing the audio signal. But, performing modifications on a data compressed audio signal has proved difficult in the past. This thesis provides a new representation of audio signals that allows for both very low bit rate audio data compression and high quality compressed domain processing and modifications. In this context, processing possibilities are time-scale and pitch-scale modifications. This new audio representation segments the audio into separate sinusoidal, transients, and noise signals. During determined attack transients regions, the audio is modeled by well established transform coding techniques. During the remaining non-transient regions of the input, the audio is modeled by a mixture of multiresolution sinusoidal modeling and noise modeling. Careful phase locking techniques at the time boundaries between the sines and transients allow for seamless transitions between representations. By separating the audio into three individual representations, each can be efficiently and perceptually quantized.


MISCplunderphonics [Osw99]
Author
John Oswald
TitlePlunderphonics
HowpublishedWWW page
Year1999
urlhttp://www.interlog.com/~vacuvox/
Notehttp://www.6q.com, esp. [Osw93]


MISCplexure [Osw93]
Author
John Oswald
TitlePlexure
HowpublishedCD
Year1993
urlhttp://www.interlog.com/xdiscography.html#plexure
Notehttp://www.interlog.com/~vacuvox/xdiscography.html#plexure
AbstractPublished by Disk Union Japan (on CD only), it should be in stores but is often hard to find or expensive. It is currently available from WFMU who also provide a short sample (193K). Plundered are over a thousand pop stars from the past 10 years. Rather than crediting each individual artist or group as he did in the original plunderphonic release, Oswald chose instead to reference morphed artists of his own creation (Bonnie Ratt, etc.). It starts with rap millisyllables and progresses through the material according to tempo (which has an interesting relationship with genre). Oswald used several mechanisms to generate the plunderphonemes that make up this encyclopaedic popologue. This is the most formidable of the plunderphonics projects to date.


MISCthelongestandmostharmlessentry [vdVdlLvdV48]
Author
Van van der Van, Dee de la La, Don von der Von
TitleThe Longest Bibliographic Reference
Year1848
RemarksThis is here so that the longest bibliography reference is this one, [vdVdlLvdV48], and not something with an et al. symbol, because this confuses tth, the tex to html translator, too much.


15   Sound Sources

MISCberio91 [Ber91]
Author
Luciano Berio
TitleCircles; Sequenza I, III, V
HowpublishedMediathèque CD00008601
Year1991
urlhttp://mediatheque.ircam.fr/cgi-bin/archives?AFFICHAGE=long\&ID=CD00008601
NoteCathy Berberian (voice), Francis Pierre (harp), Jean-Pierre Drouet, Jean-Claude Casadesus (percussion), Aurèle Nicolet (flute), Vinko Globokar (trombone)


16   To Read

INPROC.baudoin:eurospeech:97 [BCC97]
Author
G. Baudoin, J. Cernocký, G. Chollet
TitleQuantization of spectral sequences using variable length spectral segments for speech coding at very low bit rate
BooktitleProc. EUROSPEECH 97
AddressRhodes, Greece
MonthSeptember
Year1997
Pages1295--1298
abstract-urlhttp://www.wcl2.ee.upatras.gr/eurtad.html#link1295
AbstractThis paper deals with the coding of spectral envelope parameters for very low bit rate speech coding (below 500 bps). In order to obtain a sufficient intelligibility, segmental techniques are necessary. Variable dimension vector quantization is one of these. We propose a new interpretation of already published research from Chou-Lockabaugh [2] and Cernocky-Baudoin-Chollet [4,6] on the quantization of variable length sequences of spectral vectors, named respectively Variable to Variable length Vector Quantization (VVVQ) and Multigrams Quantization (MGQ). This interpretation gives a meaning to the Lagrange multiplier used in the optimization criterion of the VVVQ, and should allow new developments such as, for example, new modeling of the probability density of the source. We have also studied the influence of the limitation of the delay introduced by the method. It was found that a maximal delay of 400 ms is generally sufficient. Finally, we propose the introduction of long sequences in the segmental codebook by linear interpolation of shorter ones.
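To make the role of the Lagrange multiplier concrete (a toy sketch only, not the VVVQ or multigram algorithms; the distortion and rate models are deliberately simplistic assumptions), the following Python fragment chooses variable-length segment boundaries by dynamic programming so as to minimize D + lambda*R; a larger lambda penalizes the fixed per-segment bit cost more strongly and therefore favors fewer, longer segments.

import numpy as np

def segment(x, lam, bits_per_segment=8.0, max_len=8):
    # Dynamic programming over segment boundaries: each segment is "coded"
    # by its mean (distortion = squared error around the mean) and costs a
    # fixed number of bits, so the total cost is D + lam * R.
    n = len(x)
    cost = np.full(n + 1, np.inf)
    back = np.zeros(n + 1, dtype=int)
    cost[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            seg = x[start:end]
            d = float(((seg - seg.mean()) ** 2).sum())
            c = cost[start] + d + lam * bits_per_segment
            if c < cost[end]:
                cost[end], back[end] = c, start
    # Walk the back-pointers to recover the chosen segment boundaries.
    bounds, i = [], n
    while i > 0:
        bounds.append(i)
        i = back[i]
    return sorted(bounds)

if __name__ == "__main__":
    x = np.array([0, 0, 0, 5, 5, 5, 5, 9, 9, 1, 1, 1], dtype=float)
    print(segment(x, lam=0.01))  # small lambda: more, shorter segments
    print(segment(x, lam=10.0))  # large lambda: fewer, longer segments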


INPROC.Stylianou_DecoOf_ICSLP96 [Sty96]
Author
Y. Stylianou
TitleDecomposition of Speech Signals into a Deterministic and a Stochastic Part
BooktitleProc. ICSLP '96
AddressPhiladelphia, PA
MonthOctober
Year1996
Volume2
Pages1213--1216






References

[AADR98]
Carlos Agon, Gérard Assayag, Olivier Delerue, and Camilo Rueda. Objects, Time and Constraints in OpenMusic. In Proceedings of the International Computer Music Conference (ICMC), Ann Arbor, Michigan, October 1998.

[AAFH97]
Gérard Assayag, Carlos Agon, Joshua Fineberg, and Peter Hanappe. An Object Oriented Visual Environment For Musical Composition. In Proceedings of the International Computer Music Conference (ICMC), Thessaloniki, Greece, 1997.

[AAS00a]
G. Assayag, C. Agon, and M. Stroppa. High Level Musical Control of Sound Synthesis in OpenMusic. In Proc. ICMC, Berlin, 2000.

[AAS00b]
G. Assayag, C. Agon, and M. Stroppa. High Level Musical Control of Sound Synthesis in OpenMusic. In Proc. ICMC, 2000.

[AAS00c]
Gérard Assayag, Carlos Agon, and Marco Stroppa. High Level Musical Control of Sound Synthesis in OpenMusic. In Proceedings of the International Computer Music Conference (ICMC), Berlin, August 2000.

[AE]
Aldroubi and Eden. Wavelet analysis and its applications, volume 2, chapter Polynomial Spline and Wavelets. ???, ???

[APB+99]
Marc Abrams, Constantinos Phanouriou, Alan L. Batongbacal, Stephen M. Williams, and Jonathan E. Shuster. UIML: an appliance-independent XML user interface language. Computer Networks (Amsterdam, Netherlands: 1999), 31(11--16):1695--1708, May 1999.

[ARL+99a]
G. Assayag, C. Rueda, M. Laurson, C. Agon, and O. Delerue. Computer Assisted Composition at Ircam: PatchWork & OpenMusic. Computer Music Journal, 23(3), Fall 1999.

[ARL+99b]
Gérard Assayag, Camilo Rueda, Mikael Laurson, Carlos Agon, and O. Delerue. Computer Assisted Composition at Ircam: PatchWork & OpenMusic. Computer Music Journal, 23(3), 1999.

[AS99]
Analysis--Synthesis Team / Équipe Analyse--Synthèse, IRCAM---Centre Georges Pompidou. WWW page, 1999. http://www.ircam.fr/equipes/analyse-synthese/.

[AS00]
Analysis--Synthesis Team / Équipe Analyse--Synthèse, IRCAM---Centre Georges Pompidou. WWW page, 2000. http://www.ircam.fr/anasyn/.

[ASP99]
Anthropic Signal Processing Group, Oregon Graduate Institute of Science and Technology. WWW page, 1999. http://ece.ogi.edu/asp.

[ATT99]
AT&T Labs. WWW page, 1999. http://www.research.att.com/projects/tts/.

[B+84]
Leo Breiman et al. Classification and Regression Trees. Chapman & Hall, New York, 1984. new edition of [BFOS84a]?

[Bat94]
Roberto Battiti. Using the mutual information for selecting features in supervised neural net learning. IEEE Transactions on Neural Networks, 5(4):537--550, 1994.

[BC95]
A. W. Black and N. Campbell. Optimising selection of units from speech databases for concatenative synthesis. In Proc. Eurospeech '95, volume 1, pages 581--584, Madrid, Spain, September 1995.

[BCC97]
G. Baudoin, J. Cernocký, and G. Chollet. Quantization of spectral sequences using variable length spectral segments for speech coding at very low bit rate. In Proc. EUROSPEECH 97, pages 1295--1298, Rhodes, Greece, September 1997.

[BCS98]
Mark Beutnagel, Alistair Conkie, and Ann K. Syrdal. Diphone Synthesis using Unit Selection. In The 3rd ESCA/COCOSDA Workshop on Speech Synthesis, Jenolan Caves, Australia, November 1998. www [ATT99].

[BCS+99]
M. Beutnagel, A. Conkie, J. Schroeter, Y. Stylianou, and A. Syrdal. The AT&T Next-Gen TTS System. In Joint Meeting of ASA, EAA, and DAGA, Berlin, Germany, March 1999. www [ATT99].

[Bea93a]
J. W. Beauchamp. Unix Workstation Software for Analysis, Graphics, Modification, and Synthesis of Musical Sounds. Proceedings of the Audio Engineering Society, 1993.

[Bea93b]
J. W. Beauchamp. Unix Workstation Software for Analysis, Graphics, Modification, and Synthesis of Musical Sounds. In Proc. AES, 1993.

[Bea98]
James Beauchamp. Methods for measurement and manipulation of timbral physical correlates. 103(5):2966, 1998.

[Bea00]
James Beauchamp, editor. The Sound of Music. Springer, New York, 2000.

[Ber91]
Luciano Berio. Circles; Sequenza I, III, V. Mediathèque CD00008601, 1991. Cathy Berberian (voice), Francis Pierre (harp), Jean-Pierre Drouet, Jean-Claude Casadesus (percussion), Aurèle Nicolet (flute), Vinko Globokar (trombone).

[BFOS84a]
L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Wadsworth and Brooks, Monterey, CA, 1984. new edition [B+84]?

[BFOS84b]
Leo Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Statistics/Probability Series. Wadsworth Publishing Company, Belmont, California, U.S.A., 1984.

[BH]
James Beauchamp and A. Horner. Piecewise Linear Approximation of Additive Synthesis Envelopes: A Comparison of Various Methods. 20(2):72--95.

[BHM95]
James Beauchamp, A. Horner, and S. McAdams. Musical Sounds, Data Reduction, and Perceptual Control Parameters. In Program for SMPC95, Society for Music Perception and Cognition, pages 8--9, Univ. Calif. Berkeley, 1995. Center for New Music and Audio Technologies (CNMAT).

[BLS91]
G. Bailly, R. Laboissière, and J. L. Schwartz. Formant trajectories as audible gestures: an alternative for speech synthesis. Journal of Phonetics, 19:9--23, 1991.

[Boe89]
Barry W. Boehm. Software risk management. IEEE Computer Society Press, Washington, 1989.

[Boo94]
Grady Booch. Object-Oriented Analysis and Design with Applications. Benjamin--Cummings, Redwood City, Calif., 2nd edition, 1994.

[BT97a]
Alan Black and Paul Taylor. The Festival Speech Synthesis System: System Documentation (1.1.1). Technical Report HCRC/TR-83, Human Communication Research Centre, January 1997. www [CSTR99].

[BT97b]
Alan W Black and Paul Taylor. Automatically clustering similar units for unit selection in speech synthesis. In Proc. Eurospeech '97, pages 601--604, Rhodes, Greece, September 1997. www [CSTR99] Electronic version: cstr/Black_1997_b.*.

[BTC98]
Alan Black, Paul Taylor, and Richard Caley. The Festival Speech Synthesis System: System Documentation (1.3.1). Technical Report HCRC/TR-83, Human Communication Research Centre, December 1998. www [CSTR99].

[Cam96]
N. Campbell. CHATR: A high-definition speech re-sequencing system. Acoustical Society of America and Acoustical Society of Japan, Third Joint Meeting, December 1996.

[CFW00]
A. Chaudhary, A. Freed, and M. Wright. An Open Architecture for Real-time Music Software. In Proc. ICMC, Berlin, 2000.

[CH]
Ngai-Man Cheung and Andrew Horner. Group Synthesis with Genetic Algorithms. 44(3):130--147.

[CH68]
N. Chomsky and M. Halle. The Sound Pattern of English. Harper & Row, New York, NY, 1968.

[Cha95]
Y. T. Chan. Wavelet Basics. Kluwer Academic Publ., Boston, 1995.

[Cha98]
Arun Chandra. Compositional experiments with concatenating distinct waveform periods while changing their structural properties. In SEAMUS'98, Urbana, IL, April 1998. School of Music, University of Illinois. Available online26.

[Cha99]
Jean-Marie Chauvet. Composants et transactions: COMMTS, CorbaOTS, JavaEJB, XML. Collection dirigée par Guy Hervier. Eyrolles: Informatiques magazine, Paris, France, 1999.

[CM96]
O. Cappé and E. Moulines. Regularization Techniques for Discrete Cepstrum Estimation. IEEE Signal Processing Letters, 3(4):100--102, April 1996.

[CM98]
Andrew E. Cronk and Michael W. Macon. Optimized Stopping Criteria for Tree-Based Unit Selection in Concatenative Synthesis. In Proc. of International Conference on Spoken Language Processing, volume 5, pages 1951--1955, November 1998. www [CSLU99].

[COM97]
O. Cappé, M. Oudot, and E. Moulines. Spectral Envelope Estimation using a Penalized Likelihood Criterion. In IEEE ASSP Workshop on App. of Sig. Proc. to Audio and Acoust., Mohonk, October 1997.

[Cov00]
Robin Cover. The XML Cover Pages. WWW page, 2000. http://www.oasis-open.org/cover/xml.html.

[CSLU99]
CSLU Speech Synthesis Research Group, Oregon Graduate Institute of Science and Technology. WWW page, 1999. http://cslu.cse.ogi.edu/tts.

[CSTR99]
Centre for Speech Technology Research, University of Edinburgh. WWW page, 1999. http://www.cstr.ed.ac.uk/.

[CY96]
John E. Clark and Colin Yallop. An Introduction to Phonetics and Phonology. Blackwell, Oxford, 1996.

[CYDH97]
Nick Campbell, Itoh Yoshiharu, Wen Ding, and Norio Higuchi. Factors affecting perceived quality and intelligibility in the CHATR concatenative speech synthesiser. In Proc. Eurospeech '97, pages 2635--2638, Rhodes, Greece, September 1997.

[DC97]
Wen Ding and Nick Campbell. Optimising unit selection with voice source and formants in the CHATR speech synthesis system. In Proc. Eurospeech '97, pages 537--540, Rhodes, Greece, September 1997.

[DCMS99]
François Déchelle, Maurizio De Cecco, Enzo Maggi, and Norbert Schnell. jMax Recent Developments. In Proceedings of the International Computer Music Conference, 1999.

[DDMS99]
F. Déchelle, M. DeCecco, E. Maggi, and N. Schnell. jMax Recent Developments. In Proc. ICMC, 1999.

[DDPZ94]
François Déchelle, Maurizio DeCecco, Miller Puckette, and David Zicarelli. The IRCAM "Real-Time Platform": Evolution and Perspectives. In Proceedings of the International Computer Music Conference (ICMC), 1994. Available online27.

[DEG+92]
N. Delprat, B. Escudié, P. Guillemain, R. Kronland-Martinet, Ph. Tchamitchian, and B. Torrésani. Asymptotic wavelet and Gabor analysis: extraction of instantaneous frequency. 38(2):644--664, March 1992.

[DG96]
T. Dutoit and B. Gosselin. On the use of a hybrid harmonic/stochastic model for TTS synthesis by concatenation. In Speech Communication, number 19, pages 119--143, 1996.

[DGR93a]
Ph. Depalle, G. Garcia, and X. Rodet. Tracking of Partials for Additive Sound Synthesis Using Hidden Markov Models. In IEEE Trans., pages 225--228, April 1993. Abstract28.

[DGR93b]
Ph. Depalle, G. Garcia, and X. Rodet. Tracking of Partials for Additive Sound Synthesis Using Hidden Markov Models. In IEEE Trans., pages 225--228, 1993.

[DGR94]
Philippe Depalle, Guillermo García, and Xavier Rodet. A Virtual Castrato (!?). In Proceedings of the International Computer Music Conference (ICMC), 1994. Available online29.

[dM95]
C. d'Alessandro and P. Mertens. Automatic pitch contour stylization using a model of tonal perception. In Computer Speech and Language, pages 257--288, 1995.

[DMD98]
O. Deroo, F. Malfrere, and T. Dutoit. Comparison of two different alignment systems: speech synthesis vs. hybrid HMM/ANN. In Proc. European Conference on Signal Processing (EUSIPCO'98), pages 1161--1164, Greece, 1998. www [TCTS99], same content as [MDD98] (but fewer references).

[DMP+98]
T. Dutoit, F. Malfrère, V. Pagel, M. Bagein, P. Mertens, A. Ruelle, and A. Gilman. EULER: Multi-Lingual Text-to-Speech Project. In Petr Sojka, Václav Matousek, Karel Pala, and Ivan Kopecek, editors, Proceedings of the First Workshop on Text, Speech, Dialogue --- TSD'98, pages 27--32, Brno, Czech Republic, September 1998. Masaryk University Press. www [TCTS99]. Electronic version: tcts/tsd98_tdfmvppmmbarag.ps.*.

[Dog95]
Grzegorz Dogil. Phonetic correlates of word stress. AIMS Phonetik (Working Papers of the Department of Natural Language Processing), 2(2), 1995. Contents30.

[Don96]
R. E. Donovan. Trainable Speech Synthesis. PhD thesis, Cambridge University, 1996.

[DPP+96]
T. Dutoit, V. Pagel, N. Pierret, F. Bataille, and O. V. der Vrecken. The MBROLA project: Towards a set of high quality speech synthesizers free of use for non commercial purposes. In Proc. ICSLP '96, volume 3, pages 1393--1396, Philadelphia, PA, October 1996.

[DR97]
Shlomo Dubnov and Xavier Rodet. Statistical Modeling of Sound Aperiodicities. In Proceedings of the International Computer Music Conference (ICMC), Tessaloniki, Greece, September 1997.

[DSBO00a]
F. Déchelle, N. Schnell, R. Borghesi, and N. Orio. The jMax Environment: An Overview of New Features. In Proc. ICMC, Berlin, 2000.

[DSBO00b]
François Déchelle, Norbert Schnell, Ricardo Borghesi, and Nicolas Orio. The jMax Environment: An Overview of New Features. In Proceedings of the International Computer Music Conference, Berlin, 2000.

[DTC]
Shlomo Dubnov, Naftali Tishby, and Dalia Cohen. Hearing Beyond the Spectrum. Journal of New Music Research, 24(4).

[DuC99]
Bob DuCharme. XML: the annotated specification. The Charles F. Goldfarb series on open information management. Prentice-Hall PTR, Upper Saddle River, NJ 07458, USA, 1999.

[Dut94]
T. Dutoit. High quality text-to-speech synthesis: a comparison of four candidate algorithms. In Proc. ICASSP '94, pages I--565--I--568, Adelaide, Australia, April 1994.

[Edw93]
A. L. Edwards. An Introduction to Linear Regression and Correlation. W. H. Freeman and Co, San Francisco, 1993.

[FHC00a]
K. Fitz, L. Haken, and P. Chirstensen. A New Algorithm for Bandwidth Association in Bandwidth-Enhanced Additive Sound Modeling. In Proc. ICMC, Berlin, 2000.

[FHC00b]
K. Fitz, L. Haken, and P. Chirstensen. Transient Preservation under Transformation in an Additive Sound Model. In Proc. ICMC, Berlin, 2000.

[FHC00c]
Kelly Fitz, Lippold Haken, and Paul Chirstensen. A New Algorithm for Bandwidth Association in Bandwidth-Enhanced Additive Sound Modeling. In Proc. ICMC, Berlin, 2000.

[FHC00d]
Kelly Fitz, Lippold Haken, and Paul Chirstensen. Transient Preservation under Transformation in an Additive Sound Model. In Proceedings of the International Computer Music Conference, Berlin, 2000.

[FHH95a]
K. Fitz, L. Haken, and B. Holloway. Lemur -- A Tool for Timbre Manipulation. In Proceedings of the International Computer Music Conference, pages 158--161, Banff, September 1995.

[FHH95b]
K. Fitz, L. Haken, and B. Holloway. Lemur -- A Tool for Timbre Manipulation. In Proc. ICMC, 1995.

[FM97]
Anne Faure and Stephen McAdams. Comparaison de profils sémantiques et de l'espace perceptif de timbres musicaux. In Actes du 4ème Congrès Français d'Acoustique, Marseille, April 1997. Société Française d'Acoustique.

[FRD92a]
A. Freed, X. Rodet, and Ph. Depalle. Synthesis and Control of Hundreds of Sinusoidal Partials on a Desktop Computer without Custom Hardware. In ICSPAT, 1992.

[FRD92b]
Adrian Freed, Xavier Rodet, and Phillipe Depalle. Synthesis and Control of Hundreds of Sinusoidal Partials on a Desktop Computer without Custom Hardware. In ICSPAT, 1992. Available online31.

[FRD93a]
A. Freed, X. Rodet, and Ph. Depalle. Performance, Synthesis and Control of Additive Synthesis on a Desktop Computer Using FFT-1. In Proceedings of the 19th International Computer Music Conference, Waseda University Center for Scholarly Information, 1993. International Computer Music Association.

[FRD93b]
A. Freed, X. Rodet, and Ph. Depalle. Performance, Synthesis and Control of Additive Synthesis on a Desktop Computer Using FFT-1. In Proc. ICMC, 1993.

[Fuk90]
K. Fukunaga. Introduction to Statistical Pattern Recognition. Academic Press, 2nd edition, 1990.

[Gar94]
Guillermo García. Pm: A library for additive analysis/transformation/synthesis, July 1994. Available online32.

[GBM+96]
R. Gribonval, E. Bacry, S. Mallat, Ph. Depalle, and X. Rodet. Analysis of Sound Signals with High Resolution Matching Pursuit. In Proceedings of the IEEE Time--Frequency and Time--Scale Workshop (TFTS), 1996. www [AS00].

[GDR+96]
R. Gribonval, Ph. Depalle, X. Rodet, E. Bacry, and S. Mallat. Sound Signal Decomposition using a High Resolution Matching Pursuit. In Proceedings of the International Computer Music Conference (ICMC), August 1996. www [AS00].

[GJM91]
Carlo Ghezzi, Mehdi Jazayeri, and Dino Mandrioli. Fundamentals of Software Engineering. Prentice--Hall, Englewood Cliffs, NJ, 1991.

[GKM96]
Ph. Guillemain and R. Kronland-Martinet. Characterization of acoustic signals through continuous linear time--frequency representations. 84(4):561--585, April 1996.

[GL88]
D.W. Griffin and J.S. Lim. Multiband excitation vocoder. In IEEE Transactions on Acoustics, Speech and Signal Processing, volume 36, pages 1123--1235, 1988.

[GR90]
Thierry Galas and Xavier Rodet. An Improved Cepstral Method for Deconvolution of Source--Filter Systems with Discrete Spectra: Application to Musical Sound Signals. In Proceedings of the International Computer Music Conference (ICMC), Glasgow, September 1990.

[GR91a]
Th. Galas and X. Rodet. Generalized Functional Approximation for Source--Filter System Modeling. In Proc. Eurospeech, 1991.

[GR91b]
Thierry Galas and Xavier Rodet. Generalized Discrete Cepstral Analysis for Deconvolution of Source--Filter Systems with Discrete Spectra. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, New York, October 1991.

[GR91c]
Thierry Galas and Xavier Rodet. Generalized Functional Approximation for Source--Filter System Modeling. In Proc. Eurospeech, pages 1085--1088, Geneve, 1991.

[GS97]
O. Ghitza and M. M. Sondhi. On the perceptual distance between two speech segments. In Journal of the Acoustical Society of America, volume 101, pages 522--529, 1997.

[HAea96]
X. D. Huang, A. Acero, et al. Whistler: A trainable text-to-speech system. In Proc. of the Int'l Conf. on Spoken Language Processing, pages 2387--2390, 1996.

[Ham77a]
R. W. Hamming. Digital Filters. Signal Processing Series. Prentice--Hall, 1977.

[Ham77b]
Richard Wesley Hamming. Digital Filters. Signal Processing Series. Prentice--Hall, Englewood Cliffs, 1977.

[HB96]
A. J. Hunt and A. W. Black. Unit selection in a concatenative speech synthesis system using a large speech database. In Proc. ICASSP '96, pages 373--376, Atlanta, GA, May 1996. www [CSTR99] Electronic version: cstr/Black_1996_a.s.*.

[HC98]
J. H. L. Hansen and D. T. Chappell. An auditory-based distortion measure with application to concatenative speech synthesis. In IEEE Trans. on Speech and Audio Processing, volume 6, pages 489--495, September 1998.

[Hen98]
Nathalie Henrich. Synthèse de la voix chantée par règles. IRCAM, Paris, France, July 1998. Rapport de stage D.E.A. Acoustique, Traitement de Signal et Informatique Appliqués à la Musique.

[Her98]
Hynek Hermansky. Data-Driven Speech Analysis For ASR. In Petr Sojka, Václav Matousek, Karel Pala, and Ivan Kopecek, editors, Proceedings of the First Workshop on Text, Speech, Dialogue --- TSD'98, pages 213--218, Brno, Czech Republic, September 1998. Masaryk University Press.

[HHW85]
H. Hermansky, B. A. Hanson, and H. Wakita. Perceptually based linear predictive analysis of speech. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pages 509--512, 1985.

[Hir91]
J. Hirschberg. Using Text Analysis to Predict Intonational Boundaries. In Proceedings of Eurospeech, pages 1275--1278, 1991.

[HJ88]
H. Hermansky and J. C. Junqua. Optimization of perceptually-based ASR front-end. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, page 219, 1988.

[HM94]
H. Hermansky and N. Morgan. RASTA processing of speech. In IEEE Transactions on Speech and Acoustics, volume 2, pages 587--589, October 1994.

[Hol83a]
J. N. Holmes. Formant synthesizers: Cascade or Parallel. In Speech Communication, volume 2, pages 251--273, 1983.

[Hol83b]
J. N. Holmes. Formant synthesizers: Cascade or Parallel. In Speech Communication, volume 2, 1983.

[Hub97]
Barbara Burke Hubbard. The World According to Wavelets: The Story of a Mathematical Technique in the Making. A K Peters Ltd, 1997.

[Jac83]
Michael A. Jackson. System development. Prentice--Hall International series in computer science. Prentice--Hall Intern., Englewood Cliffs, 1983.

[Jac95a]
Michael Jackson. Software requirements & specifications : a lexicon of practice, principles, and prejudices. Addison--Wesley, Wokingham, 1995.

[Jac95b]
Ivar Jacobson. Object-Oriented Software Engineering: a Use Case driven Approach. Addison--Wesley, Wokingham, England, 1995.

[KCG96]
O. Karaali, G. Corrigan, and I. Gerson. Speech Synthesis with Neural Networks. In Proc. of World Congress on Neural Networks, pages 45--50, September 1996.

[KM98a]
A. Kain and M. W. Macon. Personalizing a speech synthesizer by voice adaptation. In Proceedings of the 3rd ESCA/COCOSDA International Speech Synthesis Workshop, pages 225--230, November 1998. www [CSLU99].

[KM98b]
A. Kain and M. W. Macon. Text-to-speech voice adaptation from sparse training data. In Proc. of International Conference on Spoken Language Processing, pages 2847--2850, November 1998. www [CSLU99].

[KM98c]
Alexander Kain and Michael W Macon. Spectral voice conversion for text-to-speech synthesis. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP'98), pages 285--288, 1998. www [CSLU99].

[KMS98]
F. Kossentini, M. Macon, and M. Smith. Audio coding using variable-depth multistage quantization. 6, 1998. www [CSLU99].

[Lev98]
Scott N. Levine. Audio Representations for Data Compression and Compressed Domain Processing. Ph.D. dissertation, Department of Electrical Engineering, CCRMA, Stanford University, December 1998. http://www-ccrma.stanford.edu/~scottl/thesis.html.

[Lin98]
Adam Lindsay. MPEG-7 Audio FAQ. WWW page, 1998. moved to [TPMAS98].

[Mac96]
Michael W. Macon. Speech synthesis based on sinusoidal modeling. PhD thesis, Georgia Institute of Technology, October 1996.

[Mal97]
Stephane Mallat. A Wavelet Tour of Signal Processing. AP Professional, London, 1997.

[MC95]
M. W. Macon and M. A. Clements. Speech synthesis based on an overlap-add sinusoidal model. In J. of the Acoustical Society of America, volume 97, Pt. 2, page 3246, May 1995. www [CSLU99].

[MC96]
Michael W. Macon and Mark A. Clements. Speech Concatenation and Synthesis Using an Overlap--Add Sinusoidal Model. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP'96), volume 1, pages 361--364, Atlanta, USA, 1996. www [CSLU99].

[MC97]
M. W. Macon and M. A. Clements. Sinusoidal modeling and modification of unvoiced speech. In IEEE Transactions on Speech and Audio Processing, volume 5, pages 557--560, November 1997. www [CSLU99].

[MCW98]
M. W. Macon, A. E. Cronk, and J. Wouters. Generalization and discrimination in tree-structured unit selection. In Proceedings of the 3rd ESCA/COCOSDA International Speech Synthesis Workshop, November 1998. www [CSLU99].

[MCWK97]
M. W. Macon, A. E. Cronk, J. Wouters, and A. Kain. OGIresLPC: Diphone synthesizer using residual-excited linear prediction. Tech. Rep. CSE-97-007, Department of Computer Science, Oregon Graduate Institute of Science and Technology, Portland, OR, September 1997. www [CSLU99].

[MD97a]
F. Malfrere and T. Dutoit. Speech synthesis for text-to-speech alignment and prosodic feature extraction. In Proc. ISCAS 97, pages 2637--2640, Hong-Kong, 1997. www [TCTS99].

[MD97b]
Fabrice Malfrere and Thierry Dutoit. High quality speech synthesis for phonetic speech segmentation. In Proc. Eurospeech '97, pages 2631--2634, Rhodes, Greece, September 1997.

[MDD98]
F. Malfrere, O. Deroo, and T. Dutoit. Phonetic alignment: speech synthesis based vs. hybrid HMM/ANN. In Proc. International Conference on Speech and Language Processing, pages 1571--1574, Sydney, Australia, 1998. www [TCTS99], same content as [DMD98] (with more references).

[Mel97]
Jason Meldrum. The Z--Transform, 1997. Online tutorial33.

[MG80]
J.D. Markel and A.H. Gray. Linear Prediction of Speech. Springer, 1980.

[MJLO+97a]
M. W. Macon, L. Jensen-Link, J. Oliverio, M. Clements, and E. B. George. Concatenation-based MIDI-to-singing voice synthesis. In 103rd Meeting of the Audio Engineering Society, New York, 1997. www [CSLU99].

[MJLO+97b]
Michael Macon, Leslie Jensen-Link, James Oliverio, Mark A. Clements, and E. Bryan George. A singing voice synthesis system based on sinusoidal modeling. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP'97), pages 435--438, 1997. www [CSLU99].

[MKC+98]
M. W. Macon, A. Kain, A. E. Cronk, H. Meyer, K. Mueller, B. Saeuberlich, and A. W. Black. Rapid prototyping of a German TTS system. Tech. Rep. CSE-98-015, Department of Computer Science, Oregon Graduate Institute of Science and Technology, Portland, OR, September 1998. www [CSLU99].

[MMLV98]
M. W. Macon, A. McCree, W. M. Lai, and V. Viswanathan. Efficient analysis/synthesis of percussion musical instrument sounds using an all-pole model. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, volume 6, pages 3589--3592, May 1998. www [CSLU99].

[Moo89]
B. C. J. Moore. An Introduction to the Psychology of Hearing. Academic Press Limited, 3rd edition, 1989.

[MP82]
G. A. Merchant and T. W. Parks. Efficient Solution of a Toeplitz--plus Hankel Coefficient Matrix System of Equations. In IEEE TASSP, volume 30, pages 40--44, February 1982.

[MPEG99]
MPEG-7 "Multimedia Content Description Interface" Documentation. WWW page, 1999. http://www.darmstadt.gmd.de/mobile/MPEG7.

[MPH93]
B. Möbius, M. Pätzold, and W. Hess. Analysis and Synthesis of German F0 Contours by Means of Fujisaki's Model. In Speech Communication, volume 13, pages 53--61, 1993.

[MZ92]
S. Mallat and S. Zhong. Characterization of Signals from Multiscale Edges. IEEE Trans. Pattern Anal. Machine Intell., 40(7):2464--2482, July 1992.

[Nag90]
Manfred Nagl. Softwaretechnik: methodisches Programmieren im Großen. Springer compass. Springer, Berlin, 1990.

[Nak94]
S. Nakajima. Automatic synthesis unit generation for English speech synthesis based on multi-layered context oriented clustering. In Speech Communication, volume 14, page 313, September 1994.

[NGY97]
H. J. Nock, M. J. F. Gales, and Steve Young. A comparative study of methods for phonetic decision-tree state clustering. In Proc. Eurospeech '97, volume 1, pages 111--114, Rhodes, Greece, September 1997.

[NSRK85]
N. Nocerino, F. K. Soong, L. R. Rabiner, and D. H Klatt. Comparative study of several distortion measures for speech recognition. In Speech Communication, volume 4, pages 317--331, 1985.

[OBFW98]
Jörn Ostermann, Mark C. Beutnagel, Ariel Fischer, and Yao Wang. Integration of talking heads and text-to-speech synthesizers for visual TTS. In Proc. ICSLP98, 1998. www [ATT99].

[OCM97]
M. Oudot, O. Cappé, and E. Moulines. Robust Estimation of the Spectral Envelope for "Harmonics+Noise" Models. In IEEE Workshop on Speech coding, Pocono Manor, September 1997.

[Opp78]
Alan V. Oppenheim, editor. Applications of Digital Signal Processing, chapter Digital Processing of Speech, pages 117--168. Prentice--Hall, 1978.

[OS75]
Alan V. Oppenheim and Ronald W. Schafer. Digital Signal Processing. Prentice--Hall, 1975.

[Osw93]
John Oswald. Plexure. CD, 1993. http://www.interlog.com/~vacuvox/xdiscography.html#plexure.

[Osw99]
John Oswald. Plunderphonics. WWW page, 1999. http://www.6q.com, esp. [Osw93].

[Oud98a]
M. Campedel Oudot. Étude du modèle sinusoïdes et bruit pour le traitement de la parole. Estimation robuste de l'enveloppe spectrale. Thèse, ENST, Paris, 1998.

[Oud98b]
Marine Campedel Oudot. Étude du modèle "sinusoïdes et bruit" pour le traitement de la parole. Estimation robuste de l'enveloppe spectrale. Thèse, Ecole Nationale Supérieure des Télécommunications, Paris, France, November 1998.

[Pee98]
G. Peeters. Analyse-Synthèse des sons musicaux par la méthode PSOLA. Agelonde (France), May 1998.

[PR98]
G. Peeters and X. Rodet. Sinusoidal versus Non-Sinusoidal Signal Characterisation. Barcelona, November 1998.

[PR99a]
G. Peeters and X. Rodet. Non-Stationary Analysis/Synthesis using Spectrum Peak Shape Distortion, Phase and Reassigned Spectrum. Orlando, November 1999.

[PR99b]
G. Peeters and X. Rodet. SINOLA: A New Analysis/Synthesis Method using Spectrum Peak Shape Distortion, Phase and Reassigned Spectrum. In Proceedings of the International Computer Music Conference (ICMC), Beijing, October 1999.

[Puc91a]
Miller Puckette. Combining Event and Signal Processing in the MAX Graphical Programming Environment. Computer Music Journal, 15(3):68--77, Winter 1991. Available from34.

[Puc91b]
Miller Puckette. FTS: A Real-Time Monitor for Multiprocessor Music Synthesis. Computer Music Journal, 15(3):58--67, Winter 1991. Available from35.

[PW96]
W. J. Pielemeier and G. H. Wakefield. A High Resolution Time--Frequency Representation for Musical Instrument Signals. J. Acoust. Soc. Am., 99(4):2382--2396, 1996.

[QBC88]
S. R. Quackenbush, T. P. Barnwell, and M. A. Clements. Objective Measures of Speech Quality. Prentice-Hall, Englewood Cliffs, NJ, 1988.

[RBP+91]
James Rumbaugh, Michael Blaha, William Premerlani, Frederick Eddy, and William Lorensen. Object-Oriented Modeling and Design. Prentice--Hall, Englewood Cliffs, NJ, 1991.

[RD92]
Xavier Rodet and Phillipe Depalle. A new additive synthesis method using inverse Fourier transform and spectral envelopes. In Proceedings of the International Computer Music Conference (ICMC), October 1992.

[RDG95]
Xavier Rodet, Philippe Depalle, and Guillermo García. New Possibilities in Sound Analysis and Synthesis. In ISMA, 1995. Available online36 PostScript37.

[RDP87a]
X. Rodet, Ph. Depalle, and G. Poirot. Speech Analysis and Synthesis Methods Based on Spectral Envelopes and Voiced/Unvoiced Functions. In European Conf. on Speech Tech., 1987.

[RDP87b]
Xavier Rodet, Phillipe Depalle, and G. Poirot. Speech Analysis and Synthesis Methods Based on Spectral Envelopes and Voiced/Unvoiced Functions. In European Conference on Speech Tech., September 1987.

[RF96]
Xavier Rodet and Dominique François. XSPECT: Introduction, January 1996. Available online38.

[RFL96]
Xavier Rodet, Dominique François, and Guillaume Levy. Xspect: a New Motif Signal Visualisation, Analysis and Editing Program. In Proceedings of the International Computer Music Conference (ICMC), August 1996. Available online39.

[RH91]
Stuart Rosen and Peter Howell. Signals and Systems for Speech and Hearing. Academic Press, London, 1991.

[RL97a]
X. Rodet and A. Lefèvre. The Diphone Program: New Features, new Synthesis Methods and Experience of Musical Use. In Proc. ICMC, Tessaloniki, 1997.

[RL97b]
Xavier Rodet and Adrien Lefèvre. The Diphone Program: New Features, new Synthesis Methods and Experience of Musical Use. In Proceedings of the International Computer Music Conference (ICMC), Tessaloniki, Greece, September 1997. Abstract40, PostScript41.

[RL97c]
Xavier Rodet and Adrien Lefèvre. The Diphone Program: New Features, new Synthesis Methods and Experience of Musical Use. In Proceedings of the International Computer Music Conference (ICMC), Tessaloniki, Greece, September 1997.

[RM69]
J.C. Risset and M.V. Mathews. Analysis of musical-instrument tones. Physics Today, 22(2):23--30, February 1969.

[Roa96]
Curtis Roads. The Computer Music Tutorial. MIT Press, 1996.

[Rob98]
Tony Robinson. Speech Analysis, 1998. Online tutorial42.

[Roc97]
Thierry Rochebois. Méthodes d'analyse synthèse et représentations optimales des sons musicaux basées sur la réduction de données spectrales. PhD thesis, Université Paris XI, December 1997.

[Rod84a]
X. Rodet. Time-Domain Formant-Wave-Function Synthesis. Computer Music Journal, Fall 1984.

[Rod84b]
Xavier Rodet. Time-Domain Formant-Wave-Function Synthesis. Computer Music Journal, 8(3):9--14, Fall 1984. reprinted from [Sim80].

[Rod97a]
X. Rodet. Musical Sound Signals Analysis/Synthesis: Sinusoidal+Residual and Elementary Waveform Models. In Proc. IEEE Time--Frequency/Time--Scale Workshop, 1997.

[Rod97b]
Xavier Rodet. Musical Sound Signals Analysis/Synthesis: Sinusoidal+Residual and Elementary Waveform Models. In Proceedings of the IEEE Time--Frequency and Time--Scale Workshop (TFTS), August 1997. Abstract43, PostScript44.

[Rod97c]
Xavier Rodet. The Additive Analysis--Synthesis Package, 1997. Available online45.

[RPB84a]
X. Rodet, Y. Potard, and J.-B. Barrière. The Chant--Project: From the Synthesis of the Singing Voice to Synthesis in General. Computer Music Journal, Fall 1984.

[RPB84b]
Xavier Rodet, Yves Potard, and Jean-Baptiste Barrière. The Chant--Project: From the Synthesis of the Singing Voice to Synthesis in General. Computer Music Journal, 8(3):15--31, Fall 1984.

[RPB85]
Xavier Rodet, Yves Potard, and Jean-Baptiste Barrière. CHANT: de la synthèse de la voix chantée à la synthèse en général. Rapports de recherche IRCAM, 1985. Available online46.

[RSa]
X. Rodet and D. Schwarz. Spectral Envelopes and Additive+Residual Analysis-Synthesis. In J. Beauchamp, editor, The Sound of Music. Springer, N.Y. To be published.

[RSb]
Xavier Rodet and Diemo Schwarz. Spectral Envelopes and Additive+Residual Analysis-Synthesis. In J. Beauchamp, editor, The Sound of Music. Springer, New York. To be published 2000.

[Sag88]
Y. Sagisaka. Speech synthesis by rule using an optimal selection of non-uniform synthesis units. In Proc. of the Int'l Conf. on Acoustics, Speech, and Signal Processing, page 679, 1988.

[SBHL97a]
X. Serra, J. Bonada, P. Herrera, and R. Loureiro. Integrating Complementary Spectral Models in the Design of a Musical Synthesizer. In Proc. ICMC, 1997.

[SBHL97b]
X. Serra, J. Bonada, P. Herrera, and R. Loureiro. Integrating Complementary Spectral Models in the Design of a Musical Synthesizer. In Proceedings of the International Computer Music Conference, Tessaloniki, 1997.

[SBHL97c]
X. Serra, J. Bonada, P. Herrera, and R. Loureiro. Integrating Complementary Spectral Models in the Design of a Musical Synthesizer. In Proc. ICMC, Tessaloniki, 1997.

[SBHL97d]
Xavier Serra, Jordi Bonada, Perfecto Herrera, and Ramon Loureiro. Integrating complementary spectral models in the design of a musical synthesizer. In Proceedings of the International Computer Music Conference, 1997.

[SCdV+98]
S. Sutton, R. Cole, J. de Villiers, J. Schalkwyk, P. Vermeulen, M. Macon, Y. Yan, E. Kaiser, B. Rundle, K. Shobaki, P. Hosom, A. Kain, J. Wouters, D. Massaro, and M. Cohen. Universal Speech Tools: the CSLU Toolkit. In Proc. of International Conference on Spoken Language Processing, November 1998. www [CSLU99].

[Sch98a]
D. Schwarz. Spectral Envelopes in Sound Analysis and Synthesis. Diplomarbeit Nr. 1622, Universität Stuttgart, Fakultät Informatik, Stuttgart, Germany, 1998.

[Sch98b]
D. Schwarz. Spectral Envelopes in Sound Analysis and Synthesis. Diplomarbeit, Universität Stuttgart, Informatik, 1998.

[Sch98c]
Diemo Schwarz. Spectral Envelopes in Sound Analysis and Synthesis. Diplomarbeit Nr. 1622, Universität Stuttgart, Fakultät Informatik, Stuttgart, Germany, June 1998.

[SCS98]
Ann K. Syrdal, Alistair Conkie, and Yannis Stylianou. Exploration of acoustic correlates in speaker selection for concatenative synthesis. In Proc. ICSLP98, 1998. www [ATT99].

[SDS97]
Yannis Stylianou, Thierry Dutoit, and Juergen Schroeter. Diphone concatenation using a harmonic plus noise model of speech. In Proc. Eurospeech '97, pages 613--616, Rhodes, Greece, September 1997. www [TCTS99]. Electronic version: tcts/hnmconc.ps.*.

[Sim80]
J. C. Simon, editor. Spoken Language Generation and Understanding. D. Reidel Publishing Company, Dordrecht, Holland, 1980.

[SK92]
Y. Sagisaka and N. Kaiki. Optimization of Intonation Control Using Statistical F0 Resetting Characteristics. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing, volume 2, pages 49--52, 1992.

[SLM95]
Y. Stylianou, J. Laroche, and E. Moulines. High Quality Speech Modification based on a Harmonic+Noise Model. In Proc. EUROSPEECH, 1995.

[SMW97]
Patrick Susini, Stephen McAdams, and Suzanne Winsberg. Caractérisation perceptive des bruits de véhicules. In Actes du 4ème Congrès Français d'Acoustique, Marseille, April 1997. Société Française d'Acoustique.

[Sof97]
Rational Software. Unified Modeling Language, version 1.1. Online documentation47, September 1997.

[Som85]
Ian Sommerville. Software engineering. International computer science series. Addison--Wesley, Wokingham [u.a.], 2nd edition, 1985.

[SR88]
F.K. Soong and A.E. Rosenberg. On the use of instantaneous and transitional spectral information in speaker recognition. In IEEE Transactions on Acoustics, Speech and Signal Processing, volume 36, pages 871--879, 1988.

[SS90]
X. Serra and J. Smith. Spectral Modeling Synthesis: a Sound Analysis/Synthesis System Based on a Deterministic plus Stochastic Decomposition. Computer Music Journal, 14(4):12--24, 1990.

[SSG+98]
Ann K Syrdal, Yannis G Stylianou, Laurie F Garrison, Alistair Conkie, and Juergen Schroeter. TD-PSOLA versus harmonic plus noise model in diphone-based speech synthesis. In Proc. ICASSP98, pages 273--276, 1998. www [ATT99].

[STTI97]
Richard Sproat, Paul Taylor, Michael Tanenblatt, and Amy Isard. A markup language for text-to-speech synthesis. In Proc. Eurospeech '97, pages 1747--1750, Rhodes, Greece, September 1997. www [CSTR99] Electronic version: cstr/Sproat_1997_a.*.

[Sty96]
Y. Stylianou. Decomposition of speech signals into a deterministic and a stochastic part. In Proc. ICSLP '96, volume 2, pages 1213--1216, Philadelphia, PA, October 1996.

[Sty98a]
Yannis Stylianou. Concatenative Speech Synthesis using a Harmonic plus Noise Model. In The 3rd ESCA/COCOSDA Workshop on Speech Synthesis, Jenolan Caves, Australia, November 1998. www [ATT99].

[Sty98b]
Yannis Stylianou. Removing Phase Mismatches in Concatenative Speech Synthesis. In The 3rd ESCA/COCOSDA Workshop on Speech Synthesis, Jenolan Caves, Australia, November 1998. www [ATT99].

[SW00]
Diemo Schwarz and Matthew Wright. Extensions and Applications of the SDIF Sound Description Interchange Format. In Proceedings of the International Computer Music Conference, Berlin, August 2000.

[Szy98]
Clemens Szyperski. Component Software: Beyond Object-Oriented Programming. ACM Press and Addison-Wesley, New York, NY, 1998.

[TAW97]
Keith A. Teague, Walter Andrews, and Buddy Walls. Enhanced Modeling of Discrete Spectral Amplitudes. In IEEE Workshop on Speech coding, Pocono Manor, September 1997.

[Tay99]
Paul Taylor. The Festival Speech Architecture. Web page, 1999. www [CSTR99].

[TCTS99]
TCTS (Circuit Theory and Signal Processing) Lab, Faculté Polytechnique de Mons. WWW page, 1999. http://tcts.fpms.ac.be.

[TPMAS98]
D. Thom, H. Purnhagen, and the MPEG Audio Subgroup. MPEG Audio FAQ Version 9. WWW page, October 1998. International Organisation for Standardisation, Organisation Internationale de Normalisation, Coding of Moving Pictures and Audio, ISO/IEC JTC1/SC29/WG11, N2431, http://www.tnt.uni-hannover.de/project/mpeg/audio/faq.

[TR]
C. Tuerk and T. Robinson. Speech synthesis using artificial neural networks trained on cepstral coefficients. In Proc. EUROSPEECH, pages 1713--1716.

[Tra92]
C. Traber. F0 Generation with a Database of Natural F0 Patterns and with a Neural Network. In G. Bailly and C. Benoît, editors, Talking Machines: Theories, Models, and Designs, pages 287--304. North Holland, 1992.

[UAE93]
Michael Unser, Akram Aldroubi, and Murray Eden. B--Spline Signal Processing: Part I---Theory. In IEEE Transactions on signal processing, volume 41, pages 821--833, 1993.

[Utt93]
Ian A. Utting. Lecture Notes in Object-Oriented Software Engineering. University of Kent at Canterbury, Canterbury, UK, 1993.

[vdVdlLvdV48]
Van van der Van, Dee de la La, and Don von der Von. The longest bibliographic reference, 1848.

[vdVOPD+97]
Olivier van der Vrecken, Nicolas Pierret, Thierry Dutoit, Vincent Pagel, and Fabrice Malfrere. A simple and efficient algorithm for the compression of MBROLA segment databases. In Proc. Eurospeech '97, pages 421--424, Rhodes, Greece, September 1997.

[vH13]
Hermann L. von Helmholtz. Die Lehre von den Tonempfindungen: als physiologische Grundlage für die Theorie der Musik. Vieweg, Braunschweig, 6th edition, 1913.

[vH54]
Hermann L. von Helmholtz. On the Sensations of Tone as a Physiological Basis for the Theory of Music. Dover, New York, 1954. Original title: [vH13].

[vH83]
Hermann L. von Helmholtz. Die Lehre von den Tonempfindungen: als physiologische Grundlage für die Theorie der Musik. Georg Olms Verlag, Hildesheim, 1983.

[Vir97]
Dominique Virolle. La Librairie CHANT: Manuel d'utilisation des fonctions en C, April 1997. Available online48.

[Vir98]
Dominique Virolle. Sound Description Interchange Format (SDIF), January 1998. Available online49.

[VMT92]
H. Valbret, E. Moulines, and J. P. Tubach. Voice transformation using PSOLA technique. speech, 11(2-3):189--194, June 1992.

[vS94]
R. von Sachs. Peak-insensitive non-parametric spectrum estimation. In Journal of time series analysis, volume 15, pages 429--452. 1994.

[vSHOS96]
J.P.H. van Santen, J. Hirschberg, J. Olive, and R. Sproat, editors. Progress in Speech Synthesis. Springer-Verlag, New York, 1996.

[W+98]
M. Wright et al. New Applications of the Sound Description Interchange Format. In Proc. ICMC, 1998.

[W+99]
M. Wright et al. Audio Applications of the Sound Description Interchange Format Standard. In AES 107th convention, 1999.

[Wak98a]
G. H. Wakefield. Time--Pitch Representations: Acoustic Signal Processing and Auditory Representations. In Proceedings of the IEEE Intl. Symp. on Time--Frequency/Time--Scale, Pittsburgh, 1998.

[Wak98b]
G. H. Wakefield. Time--Pitch Representations: Acoustic Signal Processing and Auditory Representations. In Proc. IEEE Intl. Symp. Time--Frequency/Time--Scale, Pittsburgh, 1998.

[WCF+98]
Matthew Wright, Amar Chaudhary, Adrian Freed, David Wessel, Xavier Rodet, Dominique Virolle, Rolf Woehrmann, and Xavier Serra. New Applications of the Sound Description Interchange Format. In Proceedings of the International Computer Music Conference, 1998.

[WCF+99a]
M. Wright, A. Chaudhary, A. Freed, S. Khoury, and D. Wessel. Audio Applications of the Sound Description Interchange Format Standard. In AES 107th convention, 1999.

[WCF+99b]
Matthew Wright, Amar Chaudhary, Adrian Freed, Sami Khoury, and David Wessel. Audio Applications of the Sound Description Interchange Format Standard. In AES 107th convention preprint, 1999.

[WCF+00a]
M. Wright, A. Chaudhary, A. Freed, S. Khoury, A. Momeni, D. Schwarz, and D. Wessel. An XML-based SDIF Stream Relationships Language. In Proc. ICMC, Berlin, 2000.

[WCF+00b]
Matthew Wright, Amar Chaudhary, Adrian Freed, Sami Khoury, Ali Momeni, Diemo Schwarz, and David Wessel. An XML-based SDIF Stream Relationships Language. In Proceedings of the International Computer Music Conference, Berlin, 2000.

[WCIS93]
W. J. Wang, W. N. Campbell, N. Iwahashi, and Y. Sagisaka. Tree-based unit selection for English speech synthesis. In Proc. of the Int'l Conf. on Acoustics, Speech, and Signal Processing, pages 191--194, 1993.

[WDK+99a]
M. Wright, R. Dudas, S. Khoury, R. Wang, and D. Zicarelli. Supporting the Sound Description Interchange Format in the Max/MSP Environment. In Proc. ICMC, Beijing, 1999.

[WDK+99b]
Matthew Wright, Richard Dudas, Sami Khoury, Raymond Wang, and David Zicarelli. Supporting the Sound Description Interchange Format in the Max/MSP Environment. In Proceedings of the International Computer Music Conference (ICMC), Beijing, October 1999.

[WM98]
J. Wouters and M. W. Macon. A perceptual evaluation of distance measures for concatenative speech synthesis. In Proc. of International Conference on Spoken Language Processing, November 1998. www [CSLU99].

[WRD92]
Peter Wyngaard, Chris Rogers, and Philippe Depalle. UDI 2.1---A Unified DSP Interface, 1992. Available online50.

[WS99a]
M. Wright and E. Scheirer. Cross-Coding SDIF into MPEG-4 Structured Audio. In Proc. ICMC, Beijing, 1999.

[WS99b]
Matthew Wright and Eric D. Scheirer. Cross-Coding SDIF into MPEG-4 Structured Audio. In Proceedings of the International Computer Music Conference (ICMC), Beijing, October 1999.

[WSR98]
Marcelo M. Wanderley, Norbert Schnell, and Joseph Rovan. ESCHER---Modeling and Performing composed Instruments in real-time. In IEEE Systems, Man, and Cybernetics Conference, October 1998. To be published.

[YH]
Jennifer Yuen and Andrew Horner. Hybrid Sampling-Wavetable Synthesis with Genetic Algorithms. 45(5):316--330.

[YS98]
Ping-Fai Yang and Yannis Stylianou. Real time voice alteration based on linear prediction. In Proc. ICSLP98, 1998. www [ATT99].

[Zwi82]
Eberhard Zwicker. Psychoakustik. Springer, 1982.

Index

  • European Conference on Speech Tech., 11
  • Eyrolles: Informatiques magazine, 6
  • edwards93, 10
  • escher, 11
  • et al., 10

  • Faure, A., 8
  • Fineberg, J., 7
  • Fischer, A., 2
  • Fitz, K., 9, 9, 9, 9, 11, 11
  • François, D., 11
  • Freed, A., 3, 3, 3, 3, 3, 3, 11, 11, 11, 11
  • Friedman, J., 12
  • Fukunaga, K., 12
  • farinelli, 11
  • fft-1, 11
  • fft-2, 11
  • fft-2-short, 11
  • fft-3, 11
  • fft-3-short, 11
  • fof, 11
  • fof-short, 11
  • fof2, 11
  • fts, 11
  • fts-basics, 11
  • fukunaga90, 12

  • Galas, T., 11, 11, 11, 11
  • Gales, M. J. F., 12
  • García, G., 11, 11, 11
  • Garcia, G., 11, 11
  • Garrison, L. F., 2
  • Geneve, 11
  • Georg Olms Verlag, 11
  • George, E. B., 4, 4
  • Georgia Institute of Technology, 13
  • Gerson, I., 10
  • Ghezzi, C., 6
  • Ghitza, O., 10
  • Gilman, A., 13
  • Glasgow, 11
  • Gosselin, B., 13
  • Gray, A., 11
  • Greece, 13
  • Gribonval, R., 11, 11
  • Griffin, D., 10
  • ghitza97, 10
  • grey80, 11
  • griffin88, 10

  • Haken, L., 9, 9, 9, 9, 11, 11
  • Halle, M., 10
  • Hamming, R. W., 11, 11
  • Hanappe, P., 7
  • Hansen, J. H. L., 10
  • Hanson, B. A., 1
  • Harper & Row, 10
  • Henrich, N., 11
  • Hermansky, H., 1, 1, 10, 10
  • Herrera, P., 9, 9, 11, 11
  • Hess, W., 10
  • Higuchi, N., 5
  • Hildesheim, 11
  • Hirschberg, J., 10, 10
  • Holloway, B., 11, 11
  • Holmes, J. N., 11, 11
  • Hong-Kong, 13
  • Honolulu, HI, 5
  • Horner, A., 9, 9, 9, 9
  • Hosom, P., 4
  • Howell, P., 11
  • HRMP, 11
  • HRMP2, 11
  • Huang, X. D., 10
  • Hubbard, B. B., 11
  • Human Communication Research Centre, 5, 5
  • Hunt, A. J., 5
  • hamming77, 11
  • hamming77-short, 11
  • hansen98, 10
  • helmholtz, 11
  • helmholtz-reprint, 11
  • hirschberg91, 10
  • hmm, 11
  • hmm-short, 11
  • holmes83, 11
  • holmes83-short, 11
  • horner96, 9
  • horner98, 9
  • huang96, 10

  • IBspline, 11
  • ICS94, 11
  • ICSPAT, 11, 11
  • IEEE ASSP Workshop on App. of Sig. Proc. to Audio and Acoust., 11
  • IEEE Computer Society Press, 6
  • IEEE Signal Processing Letters, 11
  • IEEE Systems, Man, and Cybernetics Conference, 11
  • IEEE TASSP, 11
  • IEEE Trans., 11, 11
  • IEEE Trans. on Speech and Audio Processing, 10
  • IEEE Trans. Pattern Anal. Machine Intell., 11
  • IEEE Transactions on Acoustics, Speech and Signal Processing, 10, 10
  • IEEE Transactions on Neural Networks, 12
  • IEEE Transactions on Speech and Acoustics, 10
  • IEEE Transactions on Speech and Audio Processing, 4, 4
  • IEEE Transactions on signal processing, 11
  • IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 11
  • IEEE Workshop on Speech coding, 11, 11
  • Important, 13
  • Institut für Maschinelle Sprachverarbeitung, 11
  • International Computer Music Association, 11
  • International computer science series, 11
  • IRCAM, 11
  • ISMA, 11
  • Isard, A., 5
  • Iwahashi, N., 10
  • iau, 11
  • instrument-character, 11
  • ivar, 6

  • J. Acoust. Soc. Am., 9, 9
  • J. of the Acoustical Society of America, 4
  • Jackson, M., 11
  • Jackson, M. A., 11
  • Jacobson, I., 6
  • Jazayeri, M., 6
  • Jenolan Caves, Australia, 2, 2, 2
  • Jensen-Link, L., 4, 4
  • Joint Meeting of ASA, EAA, and DAGA, 2
  • Journal of New Music Research, 12
  • Journal of Phonetics, 10
  • Journal of the Acoustical Society of America, 10
  • Journal of the Audio Engineering Society, 9, 9
  • Journal of time series analysis, 11
  • Junqua, J. C., 10
  • jackson1, 11
  • jackson2, 11
  • jmax2000, 11
  • jmax2000-short, 11
  • jmax99, 11
  • jmax99-short, 11

  • Kaiki, N., 10
  • Kain, A., 4, 4, 4, 4, 4, 4
  • Kaiser, E., 4
  • Karaali, O., 10
  • Karel Pala, 1, 13
  • Khoury, S., 3, 3, 3, 3, 3, 3
  • Klatt, D. H., 10
  • Kluwer Academic Publ., 11
  • Kopecek, I., 1, 13
  • Kossentini, F., 4
  • karaali96, 10

  • Laboissière, R., 10
  • Lai, W. M., 4
  • Laroche, J., 13
  • Laurson, M., 7, 7
  • Lefèvre, A., 11, 11, 11
  • Levine, S. N., 14
  • Levy, G., 11
  • Lim, J., 10
  • Lindsay, A., 14
  • London, 11, 11
  • Lorensen, W., 6
  • Loureiro, R., 9, 9, 11, 11
  • lemur95, 11
  • lemur95-short, 11
  • levine:thesis, 14
  • loris2000a, 9
  • loris2000a-short, 9
  • loris2000b, 9
  • loris2000b-short, 9

  • Möbius, B., 10
  • Macon, M., 4, 4, 4
  • Macon, M. W., 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 13
  • Madrid, Spain, 5
  • Maggi, E., 11, 11
  • Malfrère, F., 13
  • Malfrere, F., 13, 13, 13, 13, 13
  • Malfrere_HighQual_EURO97, 13
  • Mallat, S., 11, 11, 11
  • Mandrioli, D., 6
  • Markel, J., 11
  • Marseille, 8, 8
  • Masaryk University Press, 1, 13
  • Massaro, D., 4
  • Mathews, M., 11
  • Matousek, V., 1, 13
  • McAdams, S., 8, 8, 9
  • McCree, A., 4
  • Meldrum, J., 11
  • Merchant, G. A., 11
  • Mertens, M. B. P., 13
  • Mertens, P., 10
  • Meyer, H., 4
  • MIT Press, 11
  • Mohonk, 11
  • Momeni, A., 3, 3
  • Monterey, CA, 12
  • Moore, B. C. J., 8
  • Morgan, N., 10
  • Moulines, E., 10, 11, 11, 11, 13
  • MPEG Audio Subgroup, the., 14
  • MPEG7:audio-faq, 14
  • MPEG7:www, 14
  • MPEG:audio-faq, 14
  • Mueller, K., 4
  • MultiscaleEdges, 11
  • macon-thesis96, 13
  • mallat, 11
  • marine-thesis, 11
  • marine-thesis-short, 11
  • marine1, 11
  • marine97, 11
  • max, 11
  • moebius93, 10
  • moore89, 8

  • N. Delprat, 11
  • Nagl, M., 11
  • Nakajima, S., 10
  • New Paltz, New York, 11
  • New York, 4, 10, 11, 12
  • New York, NY, 6, 10
  • Nocerino, N., 10
  • Nock, H. J., 12
  • North Holland, 10
  • nagl, 11
  • nakajima94, 10
  • nat, 11
  • newposs, 11
  • nlp:tsdproc213-218, 1
  • nocerino85, 10
  • nock97, 12

  • OASIS, Organization for the Advancement of Structured Information Standards, 6
  • Olive, J., 10
  • Oliverio, J., 4, 4
  • Olivier_SimpAnd_EURO97, 13
  • Olshen, R., 12
  • OM2000, 7
  • OM2000-short, 7
  • OM2000-sshort, 7
  • OM97, 7
  • OM98, 7
  • OM99, 7
  • OM99-short, 7
  • Oppenheim, A. V., 11, 11
  • Orio, N., 11, 11
  • Orlando, 7
  • Ostermann, J., 2
  • Oswald, J., 14, 14
  • Oudot, M., 11, 11
  • Oudot, M. C., 11, 11
  • Oxford, 11
  • omt, 6
  • others, 3, 3, 12

  • P. Guillemain, 11
  • Pätzold, M., 10
  • Pagel, V., 13, 13, 13
  • Paris, 11, 11
  • Paris, France, 6, 11, 11
  • Parks, T. W., 11
  • PEET981, 7
  • PEET983, 7
  • PEET991, 7
  • PEET992, 7
  • Peeters, G., 7, 7, 7, 7
  • Petr Sojka, 1, 13
  • Ph. Guillemain, 11
  • Ph. Tchamitchian, 11
  • Ph.D. Dissertation, 14
  • Phanouriou, C., 6
  • PhD thesis, 10, 13
  • Philadelphia, PA, 13, 16
  • Physics Today, 11
  • Pielemeier, W. J., 9
  • Pierret, N., 13, 13
  • Pittsburgh, 9, 9
  • Pocono Manor, 11, 11
  • Poirot, G., 11, 11
  • Portland, OR, 4, 4
  • Potard, Y., 11, 11, 11
  • Premerlani, W., 6
  • Prentice-Hall PTR, 6
  • Prentice--Hall, 6, 6, 11, 11, 11, 11
  • Prentice--Hall Intern., 11
  • Prentice--Hall International series in computer science, 11
  • Prentice-Hall, 10
  • Proc. AES, 9
  • Proc. EUROSPEECH, 10, 13
  • Proc. EUROSPEECH 97, 16
  • Proc. European Conference on Signal Processing (EUSIPCO'98), 13
  • Proc. Eurospeech, 11, 11
  • Proc. Eurospeech '95, 5
  • Proc. Eurospeech '97, 5, 5, 5, 10, 12, 13, 13, 13
  • Proc. ICASSP '94, 13
  • Proc. ICASSP '96, 5
  • Proc. ICMC, 3, 3, 3, 3, 3, 7, 7, 9, 9, 9, 9, 11, 11, 11, 11, 11, 11
  • Proc. ICSLP '96, 13, 16
  • Proc. IEEE Intl. Symp. Time--Frequency/Time--Scale, 9
  • Proc. IEEE Time--Frequency/Time--Scale Workshop, 11
  • Proc. International Conference on Speech and Language Processing, 13
  • Proc. ISCAS 97, 13
  • Proc. of International Conference on Spoken Language Processing, 4, 4, 4, 4
  • Proc. of the Int'l Conf. on Acoustics, Speech, and Signal Processing, 10, 10
  • Proc. of the Int'l Conf. on Spoken Language Processing, 10
  • Proc. of World Congress on Neural Networks, 10
  • Proc. ICASSP98, 2
  • Proc. ICSLP98, 2, 2, 2
  • Proceedings of Eurospeech, 10
  • Proceedings of the 19th International Computer Music Conference, 11
  • Proceedings of the 3rd ESCA/COCOSDA International Speech Synthesis Workshop, 4, 4
  • Proceedings of the Audio Engineering Society, 9
  • Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 1
  • Proceedings of the IEEE Intl. Symp. on Time--Frequency/Time--Scale, 9
  • Proceedings of the IEEE Time--Frequency and Time--Scale Workshop (TFTS), 11, 11
  • Proceedings of the International Computer Music Conference, 3, 3, 7, 9, 9, 11, 11, 11, 11
  • Proceedings of the International Computer Music Conference (ICMC), 3, 3, 7, 7, 7, 7, 11, 11, 11, 11, 11, 11, 11, 11, 12
  • Proceedings of the International Conference on Acoustics, 10
  • Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, 4, 10
  • Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP'96), 4
  • Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP'97), 4
  • Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP'98), 4
  • Program for SMPC95, Society for Music Perception and Cognition, 9
  • Pt. 2, 4
  • Puckette, M., 11, 11, 11
  • Purnhagen, H., 14
  • plexure, 14
  • plunderphonics, 14
  • pm, 11
  • prosody-tilt, 11
  • psola92, 10
  • psy:faure97, 8
  • psy:susini97, 8
  • psycho, 11

  • Quackenbush, S. R., 10
  • quackenbush88, 10

  • R. Kronland-Martinet, 11, 11
  • Rabiner, L. R., 10
  • Redwood City, Calif., 6
  • Rhodes, Greece, 5, 5, 5, 10, 12, 13, 13, 13, 16
  • Ridges, 11
  • Ridges2, 11
  • Risset, J., 11
  • Roads, C., 11
  • Robinson, T., 10, 11
  • Rochebois, T., 12
  • Rodet, X., 3, 7, 7, 7, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 12
  • Rogers, C., 11
  • Rosen, S., 11
  • Rosenberg, A., 10
  • Rovan, J., 11
  • Rueda, C., 7, 7, 7
  • Ruelle, A., 13
  • Rumbaugh, J., 6
  • Rundle, B., 4
  • roads, 11
  • rochebois97, 12

  • S. Mallat, 11
  • S. Zhong, 11
  • Saeuberlich, B., 4
  • Sagisaka, Y., 10, 10, 10
  • San Francisco, 10
  • Schafer, R. W., 11
  • Schalkwyk, J., 4
  • Scheirer, E., 3
  • Scheirer, E. D., 3
  • Schnell, N., 11, 11, 11, 11, 11
  • School of Music, University of Illinois, 9
  • Schroeter, J., 2, 2, 13
  • Schwartz, J. L., 10
  • Schwarz, D., 3, 3, 7, 11, 11, 11, 11, 11
  • SEAMUS'98, 9
  • Serra, X., 3, 9, 9, 9, 11, 11
  • Shobaki, K., 4
  • Shuster, J. E., 6
  • Sidney, Australia, 13
  • Signal Processing Series, 11, 11
  • Simon, J. C., 11
  • Smith, J., 9
  • Smith, M., 4
  • Société Française d'Acoustique, 8, 8
  • Software, R., 6
  • Sommerville, I., 11
  • Sondhi, M. M., 10
  • Soong, F., 10
  • Soong, F. K., 10
  • Speech, 4
  • Speech and Signal Processing, 10
  • Speech Communication, 10, 10, 10, 13
  • Springer, 11, 11, 11
  • Springer compass, 11
  • Springer-Verlag, 10
  • Sproat, R., 5, 10
  • Statistics/Probability Series, 12
  • Stone, C., 12
  • Stroppa, M., 7
  • Stuttgart, Germany, 11
  • Stylianou, Y., 2, 13, 16
  • Stylianou, Y. G., 2
  • Stylianou_DecoOf_ICSLP96, 16
  • Susini, P., 8
  • Sutton, S., 4
  • Syrdal, A., 2
  • Syrdal, A. K., 2
  • Szyperski, C., 6
  • Szyperski98, 6
  • sagisaka88, 10
  • sagisaka92, 10
  • sdif-ext2000, 7
  • sdif-manual, 11
  • sms90, 9
  • sms97, 9
  • sms97-short, 9
  • softeng, 6
  • sommerville, 11
  • soong88, 10
  • specenv-rod, 11
  • specenv-rod-short, 11
  • speech, 10
  • speechana, 11
  • speechsyn96, 10
  • splinelpc, 11
  • splines, 11
  • stylianou:eurospeech95, 13

  • Talking Machines: Theories, Models, and Designs, 10
  • Tanenblatt, M., 5
  • Taylor, P., 5
  • Teague, K. A., 11
  • Tech. Rep. CSE-97-007, 4
  • Tech. Rep. CSE-98-015, 4
  • Technical Report, 5
  • Thèse, 11
  • The 3rd ESCA/COCOSDA Workshop on Speech Synthesis, 2
  • The Charles F. Goldfarb series on open information management, 6
  • Thessaloniki, 9, 11
  • Thessaloniki, Greece, 7, 11, 12
  • Thom, D., 14
  • Tishby, N., 12
  • TO BE FOUND, 5, 6, 10, 12, 13
  • Traber, C., 10
  • Tubach, J. P., 10
  • Tuerk, C., 10
  • tcts:eurosp97, 13
  • tcts:euspico98, 13
  • tcts:icslp98-fmodtd, 13
  • tcts:iscas97, 13
  • tcts:speechcomm96, 13
  • tcts:tsd98, 13
  • tcts:www, 13
  • thelongestandmostharmlessentry, 14
  • toeplitz, 11
  • traber92, 10
  • tuerk93, 10

  • Univ. Calif. Berkeley, 9
  • Universität Stuttgart, Fakultät Informatik, 11
  • Universität Stuttgart, Informatik, 11
  • Université Paris XI, 12
  • University of Kent at Canterbury, 11
  • Unser, M., 11
  • Upper Saddle River, NJ 07458, USA, 6
  • Urbana, IL, 9
  • Utting, I. A., 11
  • udi, 11
  • uml-www, 6

  • Valbret, H., 10
  • Vermeulen, P., 4
  • Vieweg, 11
  • Virolle, D., 3, 11
  • Viswanathan, V., 4
  • van Santen, J., 10
  • van der Van, V., 14
  • van der Vrecken, O., 13
  • von Helmholtz, H. L., 11
  • von Sachs, R., 11
  • von der Von, D., 14

  • W. H. Freeman and Co, 10
  • Wadsworth and Brooks, 12
  • Wadsworth Publishing Company, 12
  • Wakefield, G. H., 9
  • Wakita, H., 1
  • Walls, B., 11
  • Wanderley, M. M., 11
  • Wang, R., 3
  • Wang, W. J., 10
  • Wang, Y., 2
  • Waseda University Center for Scholarly Information, 11
  • Washington, 6
  • Web Page, 5
  • Wessel, D., 3
  • Williams, S. M., 6
  • Winsberg, S., 8
  • Woehrmann, R., 3
  • Wokingham, 11
  • Wokingham [u.a.], 11
  • Wokingham, England, 6
  • Wouters, J., 4
  • Wright, M., 3, 7
  • Wyngaard, P., 11
  • wakefield96, 9
  • wakefield98, 9
  • wakefield98-short, 9
  • wang93, 10
  • wavelets, 11

  • XML, 6
  • xspect, 11
  • xspect-manual, 11

  • Yallop, C., 11
  • Yan, Y., 4
  • Yang, P., 2
  • Yoshiharu, I., 5
  • Young, S., 12
  • Yuen, J., 9

  • Zicarelli, D., 3, 11
  • Zwicker, E., 11
  • z, 11



This document was translated from LaTeX by HEVEA.