MISC | asp:www [ASP99] |
Key | ASP |
Title | Anthropic Signal Processing Group, Oregon Graduate Institute of Science and Technology |
Howpublished | WWW page |
Year | 1999 |
url | http://ece.ogi.edu/asp |
pub-url | http://ece.ogi.edu/asp/publicat.html |
Note | http://ece.ogi.edu/asp |
INPROC. | asp:plp85 [HHW85] |
Author | |
Title | Perceptually based linear predictive analysis of speech |
Booktitle | Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing |
Year | 1985 |
Pages | 509--512 |
INPROC. | nlp:tsdproc213-218 [Her98] |
Title | Data-Driven Speech Analysis For ASR |
Pages | 213--218 |
Author | |
Booktitle | Proceedings of the First Workshop on Text, Speech, Dialogue --- TSD'98 |
Year | 1998 |
Editor | |
Address | Brno, Czech Republic |
Month | September |
Publisher | Masaryk University Press |
MISC | att:www [ATT99] |
Key | ATT |
Title | AT&T Labs -- Research |
Howpublished | WWW page |
Year | 1999 |
url | http://www.research.att.com/projects/tts/ |
Note | http://www.research.att.com/projects/tts/ |
INPROC. | att:nextgen99 [BCS+99] |
Author | |
Title | The AT&T Next-Gen TTS System |
Booktitle | Joint Meeting of ASA, EAA, and DAGA |
Address | Berlin, Germany |
Month | March |
Year | 1999 |
Note | www [ATT99] |
Abstract | The new AT&T Text-To-Speech (TTS) system for general U.S. English text is based on best-choice components of the AT&T Flextalk TTS, the Festival System from the University of Edinburgh, and ATR's CHATR system. From Flextalk, it employs text normalization, letter-to-sound, and prosody generation. Festival provides a flexible and modular architecture for easy experimentation and competitive evaluation of different algorithms or modules. In addition, we adopted CHATR's unit selection algorithms and modified them in an attempt to guarantee high intelligibility under all circumstances. Finally, we have added our own Harmonic plus Noise Model (HNM) backend for synthesizing the output speech. Most decisions made during the research and development phase of this system were based on formal subjective evaluations. We feel that the new system goes a long way toward delivering on the long-standing promise of truly natural-sounding, as well as highly intelligible, synthesis. |
INPROC. | att:diph-select98 [BCS98] |
Author | |
Title | Diphone Synthesis using Unit Selection |
Booktitle | The 3rd ESCA/COCOSDA Workshop on Speech Synthesis |
Address | Jenolan Caves, Australia |
Month | November |
Year | 1998 |
Note | www [ATT99] |
Remarks | Summary: CHATR unit selection (using phone units) extended to diphones. Open synthesis backend: PSOLA, HNM, wave concat. Uses standard Festival. Careful listening test examining influence on quality of synthesis/unit type/pruning. Base for Next-Gen TTS [BCS+99]? |
Abstract | This paper describes an experimental AT&T concatenative synthesis system using unit selection, for which the basic synthesis units are diphones. The synthesizer may use any of the data from a large database of utterances. Since there are in general multiple instances of each concatenative unit, the system performs dynamic unit selection. Selection among candidates is done dynamically at synthesis, in a manner that is based on and extends unit selection implemented in the CHATR synthesis system [1][4]. Selected units may be either phones or diphones, and they can be synthesized by a variety of methods, including PSOLA [5], HNM [11], and simple unit concatenation. The AT&T system, with CHATR unit selection, was implemented within the framework of the Festival Speech Synthesis System [2]. The voice database amounted to approximately one and one-half hours of speech and was constructed from read text taken from three sources. The first source was a portion of the 1989 Wall Street Journal material from the Penn Treebank Project, so that the most frequent diphones were well represented. Complete diphone coverage was assured by the second text, which was designed for diphone databases [12]. A third set of data consisted of recorded prompts for telephone service applications. Subjective formal listening tests were conducted to compare speech quality for several options that exist in the AT&T synthesizer, including synthesis methods and choices of fundamental units. These tests showed that unit selection techniques can be successfully applied to diphone synthesis. |
INPROC. | att:HNM98 [Sty98a] |
Author | |
Title | Concatenative Speech Synthesis using a Harmonic plus Noise Model |
Booktitle | The 3rd ESCA/COCOSDA Workshop on Speech Synthesis |
Address | Jenolan Caves, Australia |
Month | November |
Year | 1998 |
Note | www [ATT99] |
Abstract | This paper describes the application of the Harmonic plus Noise Model, HNM, for concatenative Text-to-Speech (TTS) synthesis. In the context of HNM, speech signals are represented as a time-varying harmonic component plus a modulated noise component. The decomposition of the speech signal into these two components allows for more natural-sounding modifications (e.g., source and filter modifications) of the signal. The parametric representation of speech using HNM provides a straightforward way of smoothing discontinuities of acoustic units around concatenation points. Formal listening tests have shown that HNM provides high-quality speech synthesis while outperforming other models for synthesis (e.g., TD-PSOLA) in intelligibility, naturalness and pleasantness. |
INPROC. | att:ph98 [Sty98b] |
Author | |
Title | Removing Phase Mismatches in Concatenative Speech Synthesis |
Booktitle | The 3rd ESCA/COCOSDA Workshop on Speech Synthesis |
Address | Jenolan Caves, Australia |
Month | November |
Year | 1998 |
Note | www [ATT99] |
Abstract | Concatenation of acoustic units is widely used in most of the currently available text-to-speech systems. While this approach leads to higher intelligibility and naturalness than synthesis-by-rule, it has to cope with the issues of concatenating acoustic units that have been recorded in a different order. One important issue in concatenation is that of synchronization of speech frames or, in other words, inter-frame coherence. This paper presents a novel method for synchronization of signals with applications to speech synthesis. The method is based on the notion of center of gravity applied to speech signals. It is an off-line approach as this can be done during analysis with no computational burden on synthesis. The method has been tested with the Harmonic plus Noise Model, HNM, on many large speech databases. The resulting synthetic speech is free of phase mismatch (inter-frame incoherence) problems. |
INPROC. | att:Yang98 [YS98] |
Author | |
Title | Real Time Voice Alteration Based on Linear Prediction |
Year | 1998 |
Booktitle | Proc. ICSLP98 |
Note | www [ATT99] |
INPROC. | att:Syrdal98 [SCS98] |
Author | |
Title | Exploration of Acoustic Correlates in Speaker Selection for Concatenative Synthesis |
Year | 1998 |
Booktitle | Proc. ICSLP98 |
Note | www [ATT99] |
INPROC. | att:Ostermann98 [OBFW98] |
Author | |
Title | Integration Of Talking Heads And Text-To-Speech Synthesizers For Visual TTS |
Year | 1998 |
Booktitle | Proc. ICSLP98 |
Note | www [ATT99] |
INPROC. | att:paperSYN98 [SSG+98] |
Author | |
Title | TD-PSOLA versus Harmonic Plus Noise Model in Diphone Based Speech Synthesis |
Year | 1998 |
Booktitle | Proc. ICASSP98 |
Pages | 273--276 |
Note | www [ATT99] |
Abstract | In an effort to select a speech representation for our next generation concatenative text-to-speech synthesizer, the use of two candidates is investigated: TD-PSOLA and the Harmonic plus Noise Model, HNM. A formal listening test has been conducted and the two candidates have been rated regarding intelligibility, naturalness and pleasantness. Ability for database compression and computational load is also discussed. The results show that HNM consistently outperforms TD-PSOLA in all the above features except for computational load. HNM allows for high-quality speech synthesis without smoothing problems at the segmental boundaries and without buzziness or other oddities observed with TD-PSOLA. |
INPROC. | cnmat:sdif98 [WCF+98] |
Author | |
Title | New Applications of the Sound Description Interchange Format |
Booktitle | Proceedings of the International Computer Music Conference |
Year | 1998 |
INPROC. | cnmat:sdif98-short [W+98] |
Author | |
Title | New Applications of the Sound Description Interchange Format |
Booktitle | Proc. ICMC |
Year | 1998 |
INPROC. | cnmat:sdif99 [WCF+99b] |
Author | |
Title | Audio Applications of the Sound Description Interchange Format Standard |
Booktitle | AES 107th convention preprint |
Year | 1999 |
INPROC. | cnmat:sdif99-short [WCF+99a] |
Author | |
Title | Audio Applications of the Sound Description Interchange Format Standard |
Booktitle | AES 107th convention |
Year | 1999 |
INPROC. | cnmat:sdif99-sshort [W+99] |
Author | |
Title | Audio Applications of the Sound Description Interchange Format Standard |
Booktitle | AES 107th convention |
Year | 1999 |
INPROC. | cnmat:sdif-mpeg4 [WS99b] |
Author | |
Title | Cross-Coding SDIF into MPEG-4 Structured Audio |
Booktitle | Proceedings of the International Computer Music Conference (ICMC) |
Year | 1999 |
Address | Beijing |
Month | October |
url | http://cnmat.CNMAT.Berkeley.EDU/ICMC1999/papers/saol+sdif/icmc99-saol+sdif.html |
abstract-url | http://cnmat.CNMAT.Berkeley.EDU/ICMC1999/abstracts/sdif+mpeg4.html |
bib-url | http://cnmat.CNMAT.Berkeley.EDU/ICMC1999 |
Abstract | With the completion of the MPEG-4 international standard in October 1998, considerable industry and academic resources will be devoted to building implementations of the MPEG-4 Structured Audio tools. Among these tools is the Structured Audio Orchestra Language (``SAOL''), a general-purpose sound processing and synthesis language. The standardization of MPEG-4 and SAOL is an important development for the computer music community, because compositions written in SAOL will be able to be synthesized by any compliant MPEG-4 decoder. At the same time, the sound analysis and synthesis community has developed and embraced the Sound Description Interface Format (``SDIF''), a general-purpose framework for representing various high-level sound descriptions such as sum-of-sinusoids, noise bands, time-domain samples, and formants. Many tools for composing and manipulating sound in the SDIF format have been created. Composers, sound designers, and analysis/synthesis researchers can benefit from the combined strengths of MPEG-4 and SDIF by using the MPEG-4 Structured Audio decoder as an SDIF synthesizer. This allows the use of sophisticated SDIF tools to create musical works, while leveraging the anticipated wide penetration of MPEG-4 playback devices. Cross-coding SDIF into the Structured Audio format is an example of ``Generalized Audio Coding,'' a new paradigm in which an MPEG-4 Structured Audio decoder is used to flexibly understand and play sound stored in any format. We cross-code SDIF into Structured Audio by writing a SAOL instrument for each type of SDIF sound representation and a translator that maps SDIF data into a Structured Audio score. Rather than use many notes to represent the frames of SDIF data, we use the ``streaming wavetable'' functions of SAOL to create instruments that dynamically interpret spectral, sinusoidal, or other constantly changing data. 
These SAOL instruments retrieve SDIF data from streaming wavetables via custom unit generators that can be reused to build SAOL synthesizers for other SDIF sound representations. We demonstrate the construction of several different SDIF object types within the Structured Audio framework; the resulting bitstreams are very compact and follow the MPEG-4 specification exactly. Any conforming MPEG-4 decoder can play them back and produce the sound desired by the composer. Our paper will discuss in depth the features of SAOL that make these sorts of instruments possible. By building a link between the MPEG-4 community and the SDIF community, our work contributes to both: The MPEG-4 community benefits by receiving support for synthesis from a large and extensible collection of sound descriptions, each with unique properties of data compression and mutability. The SDIF community gets a stable SDIF synthesis platform that is likely to be supported on a variety of inexpensive, high performance hardware platforms. MPEG-4 also provides the potential to integrate SDIF with other formats, e.g., streaming SDIF data synchronized with video and compressed speech. Finally, each standardization effort benefits from an expanded user base: SDIF users become MPEG-4 users without giving up their familiar tools, while MPEG-4 users outside the small community of sound analysis/synthesis researchers can discover SDIF and the high-level sound descriptions it supports. We have made the cross-coding tools and SDIF object instruments freely available to the computer music community in order to promote the continuing interoperability of these important specifications. |
INPROC. | cnmat:sdif-mpeg4-short [WS99a] |
Author | |
Title | Cross-Coding SDIF into MPEG-4 Structured Audio |
Booktitle | Proc. ICMC |
Year | 1999 |
Address | Beijing |
url | http://cnmat.CNMAT.Berkeley.EDU/ICMC1999/papers/saol+sdif/icmc99-saol+sdif.html |
abstract-url | http://cnmat.CNMAT.Berkeley.EDU/ICMC1999/abstracts/sdif+mpeg4.html |
bib-url | http://cnmat.CNMAT.Berkeley.EDU/ICMC1999 |
INPROC. | cnmat:sdif-msp [WDK+99b] |
Author | |
Title | Supporting the Sound Description Interchange Format in the Max/MSP Environment |
Booktitle | Proceedings of the International Computer Music Conference (ICMC) |
Year | 1999 |
Address | Beijing |
Month | October |
url | http://cnmat.CNMAT.Berkeley.EDU/ICMC1999/papers/msp+sdif/ICMC99-MSP+SDIF-short.html |
abstract-url | http://cnmat.CNMAT.Berkeley.EDU/ICMC1999/abstracts/sdif+msp.html |
bib-url | http://www.ircam.fr/equipes/repmus/RMPapers/ |
Abstract | The Sound Description Interchange Format (``SDIF'') is an extensible, general-purpose framework for representing high-level sound descriptions such as sum-of-sinusoids, noise bands, time-domain samples, and formants, and is used in many interesting sound analysis and synthesis applications. SDIF data consists of time-tagged ``frames,'' each containing one or more 2D ``matrices''. For example, in an SDIF file representing additive synthesis data, the matrix rows represent individual sinusoids and the columns represent parameters such as frequency, amplitude, and phase. Because of Max/MSP's many attractive features for developing real-time computer music applications, it makes a fine environment for developing applications that manipulate SDIF data. These features include active support and development, a large library of primitive computational objects, and a rich history and repertoire. Unfortunately, Max/MSP's limited language of data structures does not support the structure required by SDIF. Although it is straightforward to extend Max/MSP with an object to read SDIF, there is no Max/MSP data type that could be used to output SDIF data to the rest of a Max/MSP application. We circumvent these problems with a novel technique to manipulate SDIF data within Max/MSP. We have created an object called ``SDIF-buffer'' that represents a collection of SDIF data in memory, analogous to MSP's ``buffer '' object that represents audio samples in memory. This allows SDIF data to be represented with C data structures. Max/MSP has objects that provide various control structures to read data from a ``buffer '' and output signals or events usable by other Max/MSP objects. Similarly, we have created a variety of ``SDIF selector'' objects that select a piece of SDIF data from an SDIF-buffer and shoehorn it into a standard Max/MSP data type. The simplest SDIF selector outputs the main matrix from the SDIF frame whose time tag is closest to a given input time. 
Arguments specify which columns should be output and whether each row should appear as an individual list or all the rows should be concatenated into a single list. More sophisticated SDIF selectors hide the discrete time sampling of SDIF frames, using interpolation along the time axis to synthesize SDIF data. This provides the abstraction of continuous time, with a virtual SDIF frame corresponding to any point along the time axis. We provide linear and a variety of polynomial interpolators. This abstraction of continuously-sampled SDIF data gives rise to sophisticated ways of moving through the time axis of an SDIF-buffer. We introduce the notion of a ``time machine'', a control structure for controlling position in an SDIF time axis in real time, and demonstrate time machines with musically useful features. ``SDIF mutator'' objects have been created that can manipulate data in an SDIF-buffer in response to Max messages. This allows us to write real-time sound analysis software to generate an SDIF model of an audio signal. We implement control structures such as transposition, filtering, and inharmonicity as normal Max/MSP patches that mutate a ``working'' SDIF-buffer; these are cascaded when they share the same SDIF-buffer. These control structures communicate via symbolic references to SDIF-buffers represented as normal Max messages. This system also supports network streaming of SDIF data. As research continues towards more efficient and musically interesting streaming protocols, Max/MSP interfaces will be implemented in C as SDIF mutators that access a given SDIF buffer via a struct definition in the exposed SDIF-buffer header file. One promising approach is to begin transmission with a low-resolution representation and then fill it in with increasing detail. Time machines communicate with streaming interfaces via Max messages to request or predict ranges of time that will need to be available in the near future. |
INPROC. | cnmat:sdif-msp-short [WDK+99a] |
Author | |
Title | Supporting the Sound Description Interchange Format in the Max/MSP Environment |
Booktitle | Proc. ICMC |
Year | 1999 |
Address | Beijing |
url | http://cnmat.CNMAT.Berkeley.EDU/ICMC1999/papers/msp+sdif/ICMC99-MSP+SDIF-short.html |
abstract-url | http://cnmat.CNMAT.Berkeley.EDU/ICMC1999/abstracts/sdif+msp.html |
bib-url | http://cnmat.CNMAT.Berkeley.EDU/ICMC1999 |
INPROC. | cnmat:sdif-srl [WCF+00b] |
Author | |
Title | An XML-based SDIF Stream Relationships Language |
Booktitle | Proceedings of the International Computer Music Conference |
Year | 2000 |
Address | Berlin |
abstract-url | http://cnmat.CNMAT.Berkeley.EDU/ICMC2000/abstracts/xml-sdif |
bib-url | http://cnmat.CNMAT.Berkeley.EDU/ICMC2000/ |
INPROC. | cnmat:sdif-srl-short [WCF+00a] |
Author | |
Title | An XML-based SDIF Stream Relationships Language |
Booktitle | Proc. ICMC |
Year | 2000 |
Address | Berlin |
abstract-url | http://cnmat.CNMAT.Berkeley.EDU/ICMC2000/abstracts/xml-sdif |
bib-url | http://cnmat.CNMAT.Berkeley.EDU/ICMC2000/ |
INPROC. | cnmat:osw2000-short [CFW00] |
Author | |
Title | An Open Architecture for Real-time Music Software |
Booktitle | Proc. ICMC |
Year | 2000 |
Address | Berlin |
MISC | cslu:www [CSLU99] |
Key | CSLU |
Title | CSLU Speech Synthesis Research Group, Oregon Graduate Institute of Science and Technology |
Howpublished | WWW page |
Year | 1999 |
url | http://cslu.cse.ogi.edu/tts |
pub-url | http://cslu.cse.ogi.edu/tts/publications |
Note | http://cslu.cse.ogi.edu/tts |
ARTICLE | cslu:ieeetsap98 [KMS98] |
Author | |
Title | Audio coding using variable-depth multistage quantization |
Journal | IEEE Transactions on Speech and Audio Processing |
Volume | 6 |
Year | 1998 |
Note | www [CSLU99] |
INPROC. | cslu:esca98mm [MCW98] |
Author | |
Title | Generalization and Discrimination in tree-structured unit selection |
Booktitle | Proceedings of the 3rd ESCA/COCOSDA International Speech Synthesis Workshop |
Month | November |
Year | 1998 |
Note | www [CSLU99] |
Remarks | Great overview of several unit selection methods, comprehensive bibliography: origin of unit selection? [Sag88]. Festival unit selection [HB96, BC95]. classification and regression trees [BFOS84a]. clustering and decision trees [BT97b, WCIS93, Nak94]. Mahalanobis distance [Don96]. decision trees for: speech recognition [NGY97], speech synthesis [HAea96]. data-driven direct mapping with ANN [KCG96, TR]. distance measures for: coding [QBC88], ASR [NSRK85, HJ88], in general [GS97], concatenative speech synthesis [HC98, WM98]. PLP: [HM94]. Linear regression and correlation, Fisher transform: [Edw93]. Tree pruning: [CM98]. Masking effects: [Moo89]. |
Abstract | Concatenative ``selection-based'' synthesis from large databases has emerged as a viable framework for TTS waveform generation. Unit selection algorithms attempt to predict the appropriateness of a particular database speech segment using only linguistic features output by text analysis and prosody prediction components of a synthesizer. All of these algorithms have in common a training or ``learning'' phase in which parameters are trained to select appropriate waveform segments for a given feature vector input. One approach to this step is to partition available data into clusters that can be indexed by linguistic features available at runtime. This method relies critically on two important principles: discrimination of fine phonetic details using a perceptually-motivated distance measure in training and generalization to unseen cases in selection. In this paper, we describe efforts to systematically investigate and improve these parts of the process. |
INPROC. | cslu:esca98kain [KM98a] |
Author | |
Title | Personalizing a speech synthesizer by voice adaptation |
Booktitle | Proceedings of the 3rd ESCA/COCOSDA International Speech Synthesis Workshop |
Month | November |
Year | 1998 |
Pages | 225--230 |
Note | www [CSLU99] |
Abstract | A voice adaptation system enables users to quickly create new voices for a text-to-speech system, allowing for the personalization of the synthesis output. The system adapts to the pitch and spectrum of the target speaker, using a probabilistic, locally linear conversion function based on a Gaussian Mixture Model. Numerical and perceptual evaluations reveal insights into the correlation between adaptation quality and the amount of training data, the number of free parameters. A new joint density estimation algorithm is compared to a previous approach. Numerical errors are studied on the basis of broad phonetic categories. A data augmentation method for training data with incomplete phonetic coverage is investigated and found to maintain high speech quality while partially adapting to the target voice. |
INPROC. | cslu:icslp98cronk [CM98] |
Author | |
Title | Optimized Stopping Criteria for Tree-Based Unit Selection in Concatenative Synthesis |
Oldtitle | Optimization of stopping criteria for tree-structured unit selection |
Booktitle | Proc. of International Conference on Spoken Language Processing |
Volume | 5 |
Month | November |
Year | 1998 |
Pages | 1951--1955 |
Note | www [CSLU99] |
Remarks | Summary: Method for growing optimal clustering tree (CART, as in [BFOS84a]). Not stopping with thresholds, but growing the tree completely (until no splittable clusters are left), and then pruning by recombining clusters by a greedy algorithm. Gives evaluation measure V-fold cross validation for tree quality. Clusters represent units with equivalent target cost. A best split of a cluster maximizes the decrease in data impurity (lower within-cluster variance of acoustic features). N.B.: Clustering of units is not classification, as the classes are not known in advance, and the method is unsupervised! Weighting in distortion measure using Mahalanobis distance as the inverse of the variance. References: [BC95], [BT97b], [BFOS84a], [Don96], [Fuk90] (CART tree evaluation criterion), [NGY97], [Nak94], [WCIS93]. |
INPROC. | cslu:icslp98kain [KM98b] |
Author | |
Title | Text-to-speech voice adaptation from sparse training data |
Booktitle | Proc. of International Conference on Spoken Language Processing |
Month | November |
Year | 1998 |
Pages | 2847--2850 |
Note | www [CSLU99] |
INPROC. | cslu:icslp98-paper [WM98] |
Author | |
Title | A Perceptual Evaluation of Distance Measures for Concatenative Speech Synthesis |
Booktitle | Proc. of International Conference on Spoken Language Processing |
Month | November |
Year | 1998 |
Note | www [CSLU99] |
Abstract | In concatenative synthesis, new utterances are created by concatenating segments (units) of recorded speech. When the segments are extracted from a large speech corpus, a key issue is to select segments that will sound natural in a given phonetic context. Distance measures are often used for this task. However, little is known about the perceptual relevance of these measures. More insight into the relationship between computed distances and perceptual differences is needed to develop accurate unit selection algorithms, and to improve the quality of the resulting computer speech. In this paper, we develop a perceptual test to measure subtle phonetic differences between speech units. We use the perceptual data to evaluate several popular distance measures. The results show that distance measures that use frequency warping perform better than those that do not, and minimal extra advantage is gained by using weighted distances or delta features. |
INPROC. | cslu:cslutoolkit [SCdV+98] |
Author | |
Title | Universal Speech Tools: the CSLU Toolkit |
Booktitle | Proc. of International Conference on Spoken Language Processing |
Month | November |
Year | 1998 |
Note | www [CSLU99] |
INCOLL. | cslu:german98 [MKC+98] |
Author | |
Title | Rapid Prototyping of a German TTS System |
Booktitle | Tech. Rep. CSE-98-015 |
Publisher | Department of Computer Science, Oregon Graduate Institute of Science and Technology |
Address | Portland, OR |
Month | September |
Year | 1998 |
Note | www [CSLU99] |
INPROC. | cslu:icassp98mm [MMLV98] |
Author | |
Title | Efficient Analysis/Synthesis of Percussion Musical Instrument Sounds Using an All-Pole Model |
Booktitle | Proceedings of the International Conference on Acoustics, Speech, and Signal Processing |
Volume | 6 |
Publisher | Speech |
Month | May |
Year | 1998 |
Pages | 3589--3592 |
Note | www [CSLU99] |
Abstract | It is well-known that an impulse-excited, all-pole filter is capable of representing many physical phenomena, including the oscillatory modes of percussion musical instruments like woodblocks, xylophones, or chimes. In contrast to the more common application of all-pole models to speech, however, practical problems arise in music synthesis due to the location of poles very close to the unit circle. The objective of this work was to develop algorithms to find excitation and filter parameters for synthesis of percussion instrument sounds using only an inexpensive all-pole filter chip (TI TSP50C1x). The paper describes analysis methods for dealing with pole locations near the unit circle, as well as a general method for modeling the transient attack characteristics of a particular sound while independently controlling the amplitudes of each oscillatory mode. |
INPROC. | cslu:icassp98kain [KM98c] |
Author | |
Title | Spectral Voice Conversion for Text-to-Speech Synthesis |
Year | 1998 |
Booktitle | Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP'98) |
Pages | 285--288 |
Note | www [CSLU99] |
Abstract | A new voice conversion algorithm that modifies a source speaker's speech to sound as if produced by a target speaker is presented. It is applied to a residual-excited LPC text-to-speech diphone synthesizer. Spectral parameters are mapped using a locally linear transformation based on Gaussian mixture models whose parameters are trained by joint density estimation. The LPC residuals are adjusted to match the target speaker's average pitch. To study effects of the amount of training on performance, data sets of varying sizes are created by automatically selecting subsets of all available diphones by a vector quantization method. In an objective evaluation, the proposed method is found to perform more reliably for small training sets than a previous approach. In perceptual tests, it was shown that nearly optimal spectral conversion performance was achieved, even with a small amount of training data. However, speech quality improved with an increase in training set size. |
INCOLL. | cslu:ogireslpc97 [MCWK97] |
Author | |
Title | OGIresLPC: Diphone synthesizer using residual-excited linear prediction |
Booktitle | Tech. Rep. CSE-97-007 |
Publisher | Department of Computer Science, Oregon Graduate Institute of Science and Technology |
Month | September |
Year | 1997 |
Address | Portland, OR |
Note | www [CSLU99] |
INPROC. | cslu:aes97 [MJLO+97a] |
Author | |
Title | Concatenation-based MIDI-to-singing voice synthesis |
Booktitle | 103rd Meeting of the Audio Engineering Society |
Publisher | New York |
Year | 1997 |
Note | www [CSLU99] |
Abstract | In this paper, we propose a system for synthesizing the human singing voice and the musical subtleties that accompany it. The system, Lyricos, employs a concatenation-based text-to-speech method to synthesize arbitrary lyrics in a given language. Using information contained in a regular MIDI file, the system chooses units, represented as sinusoidal waveform model parameters, from an inventory of data collected from a professional singer, and concatenates these to form arbitrary lyrical phrases. Standard MIDI messages control parameters for the addition of vibrato, spectral tilt, and dynamic musical expression, resulting in a very natural-sounding singing voice. |
ARTICLE | cslu:trsap97 [MC97] |
Author | |
Title | Sinusoidal modeling and modification of unvoiced speech |
Journal | IEEE Transactions on Speech and Audio Processing |
Volume | 5 |
Month | November |
Year | 1997 |
Pages | 557--560 |
Number | 6 |
Note | www [CSLU99] |
Abstract | Although sinusoidal models have been shown to be useful for time-scale and pitch modification of voiced speech, objectionable artifacts often arise when such models are applied to unvoiced speech. This correspondence presents a sinusoidal model-based speech modification algorithm that preserves the natural character of unvoiced speech sounds after pitch and time-scale modification, eliminating commonly-encountered artifacts. This advance is accomplished via a perceptually-motivated modulation of the sinusoidal component phases that mitigates artifacts in the reconstructed signal after time-scale and pitch modification. |
INPROC. | cslu:icassp97 [MJLO+97b] |
Author | |
Title | A Singing Voice Synthesis System Based on Sinusoidal Modeling |
Year | 1997 |
Booktitle | Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP'97) |
Pages | 435--438 |
Note | www [CSLU99] |
Abstract | Although sinusoidal models have been demonstrated to be capable of high-quality musical instrument synthesis, speech modification, and speech synthesis, little exploration of the application of these models to the synthesis of singing voice has been undertaken. In this paper, we propose a system framework similar to that employed in concatenation-based text-to-speech synthesizers, and describe its extension to the synthesis of singing voice. The power and flexibility of the sinusoidal model used in the waveform synthesis portion of the system enables high-quality, computationally-efficient synthesis and the incorporation of musical qualities such as vibrato and spectral tilt variation. Modeling of segmental phonetic characteristics is achieved by employing a ``unit selection'' procedure that selects sinusoidally-modeled segments from an inventory of singing voice data collected from a human vocalist. The system, called Lyricos, is capable of synthesizing very natural-sounding singing that maintains the characteristics and perceived identity of the analyzed vocalist. |
INPROC. | cslu:icassp96 [MC96] |
Address | Atlanta, USA |
Author | |
Booktitle | Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP'96) |
Title | Speech Concatenation and Synthesis Using an Overlap--Add Sinusoidal Model |
Year | 1996 |
Volume | 1 |
Pages | 361--364 |
Note | www [CSLU99] |
Abstract | In this paper, an algorithm for the concatenation of speech signal segments taken from disjoint utterances is presented. The algorithm is based on the Analysis-by-Synthesis/Overlap-Add (ABS/OLA) sinusoidal model, which is capable of performing high quality pitch- and time-scale modification of both speech and music signals. With the incorporation of concatenation and smoothing techniques, the model is capable of smoothing the transitions between separately-analyzed speech segments by matching the time- and frequency-domain characteristics of the signals at their boundaries. The application of these techniques in a text-to-speech system based on concatenation of diphone sinusoidal models is also presented. |
INPROC. | cslu:jasa95 [MC95] |
Author | |
Title | Speech synthesis based on an overlap-add sinusoidal model |
Booktitle | J. of the Acoustical Society of America |
Volume | 97 |
Publisher | Pt. 2 |
Month | May |
Year | 1995 |
Pages | 3246 |
Number | 5 |
Note | www [CSLU99] |
MISC | cstr:www [CSTR99] |
Key | CSTR |
Title | Centre for Speech Technology Research, University of Edinburgh |
Howpublished | WWW page |
Year | 1999 |
url | http://www.cstr.ed.ac.uk/ |
pub-url | http://www.cstr.ed.ac.uk/projects/festival/papers.html |
Note | http://www.cstr.ed.ac.uk/ |
INPROC. | cstr:unitsel96 [HB96] |
Author | |
Title | Unit Selection in a Concatenative Speech Synthesis System using a Large Speech Database |
Booktitle | Proc. ICASSP '96 |
Address | Atlanta, GA |
Month | May |
Year | 1996 |
Pages | 373--376 |
Note | www [CSTR99] Electronic version: cstr/Black1996a.s.* |
Remarks | cited in [MCW98] |
Abstract | One approach to the generation of natural-sounding synthesized speech waveforms is to select and concatenate units from a large speech database. Units (in the current work, phonemes) are selected to produce a natural realisation of a target phoneme sequence predicted from text which is annotated with prosodic and phonetic context information. We propose that the units in a synthesis database can be considered as a state transition network in which the state occupancy cost is the distance between a database unit and a target, and the transition cost is an estimate of the quality of concatenation of two consecutive units. This framework has many similarities to HMM-based speech recognition. A pruned Viterbi search is used to select the best units for synthesis from the database. This approach to waveform synthesis permits training from natural speech: two methods for training from speech are presented, which provide weights that produce more natural speech than can be obtained by hand-tuning. |
INPROC. | cstr:unitsel97 [BT97b] |
Author | |
Title | Automatically Clustering Similar Units for Unit Selection in Speech Synthesis |
Booktitle | Proc. Eurospeech '97 |
Address | Rhodes, Greece |
Month | September |
Year | 1997 |
Pages | 601--604 |
Note | www [CSTR99] Electronic version: cstr/Black1997b.* |
Remarks | cited in [MCW98]: clustering and decision trees |
Abstract | This paper describes a new method for synthesizing speech by concatenating sub-word units from a database of labelled speech. A large unit inventory is created by automatically clustering units of the same phone class based on their phonetic and prosodic context. The appropriate cluster is then selected for a target unit, offering a small set of candidate units. An optimal path is found through the candidate units based on their distance from the cluster center and an acoustically based join cost. Details of the method and its justification are presented. The results of experiments using two different databases are given, optimising various parameters within the system. A comparison with other existing selection-based synthesis techniques is also given, showing the advantages this method has over existing ones. The method is implemented within a full text-to-speech system offering efficient, natural-sounding speech synthesis. |
INPROC. | cstr:eursp95 [BC95] |
Author | |
Title | Optimising selection of units from speech databases for concatenative synthesis |
Booktitle | Proc. Eurospeech '95 |
Volume | 1 |
Address | Madrid, Spain |
Month | September |
Year | 1995 |
Pages | 581--584 |
Remarks | Summary: Detailed description of unit selection model, used features and context, concatenation join point optimisation. Description of weight optimising procedure: Euclidean cepstral distance (very limited first attempt) on real-speech test sentences. Unit selection as used in CHATR. cited in [MCW98] |
INPROC. | cstr:ssml97 [STTI97] |
Author | |
Title | A Markup Language for Text-To-Speech Synthesis |
Booktitle | Proc. Eurospeech '97 |
Address | Rhodes, Greece |
Month | September |
Year | 1997 |
Pages | 1747--1750 |
Note | www [CSTR99] Electronic version: cstr/Sproat1997a.* |
Abstract | Text-to-speech synthesizers must process text, and therefore require some knowledge of text structure. While many TTS systems allow for user control by means of ad hoc `escape sequences', there remains to date no adequate and generally agreed upon system-independent standard for marking up text for the purposes of synthesis. The present paper is a collaborative effort between two speech groups aimed at producing such a standard, in the form of an SGML-based markup language that we call STML --- Spoken Text Markup Language. The primary purpose of this paper is not to present STML as a fait accompli, but rather to interest other TTS research groups to collaborate and contribute to the development of this standard. |
TECHREP. | cstr:festival97 [BT97a] |
Author | |
Title | The Festival Speech Synthesis System: System Documentation (1.1.1) |
Institution | Human Communication Research Centre |
Type | Technical Report |
Number | HCRC/TR-83 |
Month | January |
Year | 1997 |
Pages | 154 |
Note | www [CSTR99] |
url | http://www.cstr.ed.ac.uk/projects/festival/manual-1.1.1/festival-1.1.1.ps.gz |
Remarks | new version [BTC98] |
TECHREP. | cstr:festival98 [BTC98] |
Author | |
Title | The Festival Speech Synthesis System: System Documentation (1.3.1) |
Institution | Human Communication Research Centre |
Type | Technical Report |
Number | HCRC/TR-83 |
Month | December |
Year | 1998 |
Pages | 202 |
Note | www [CSTR99] |
url | http://www.cstr.ed.ac.uk/projects/festival/manual-1.3.1/festival_toc.html |
Remarks | updated version of [BT97a], new utterance structure as in [Tay99], multiple synthesizers |
TECHREP. | cstr:festivalarch98 [Tay99] |
Author | |
Title | The Festival Speech Architecture |
Type | Web Page |
Year | 1999 |
Note | www [CSTR99] |
url | http://www.cstr.ed.ac.uk/projects/festival/arch.html |
Abstract | This is a short document describing the way we represent speech and linguistic structures in Festival. There are three main types of structure. |
INPROC. | Campbell_FactAffe_EURO97 [CYDH97] |
Author | |
Title | Factors Affecting Perceived Quality and Intelligibility in the CHATR Concatenative Speech Synthesiser |
Booktitle | Proc. Eurospeech '97 |
Address | Rhodes, Greece |
Month | September |
Year | 1997 |
Pages | 2635--2638 |
Remarks | TO BE FOUND |
ARTICLE | Campbell_CHATR [Cam96] |
Author | |
Title | CHATR: A High-Definition Speech Re-Sequencing System |
Journal | Acoustical Society of America and Acoustical Society of Japan, Third Joint Meeting |
Address | Honolulu, HI |
Month | December |
Year | 1996 |
Remarks | TO BE FOUND |
BOOK | softeng [GJM91] |
Author | |
Title | Fundamentals of Software Engineering |
Publisher | Prentice--Hall |
Address | Englewood Cliffs, NJ |
Year | 1991 |
BOOK | boehm [Boe89] |
Author | |
Title | Software risk management |
Publisher | IEEE Computer Society Press |
Address | Washington |
Year | 1989 |
BOOK | Szyperski98 [Szy98] |
Key | Szyperski |
Author | |
Title | Component Software: Beyond Object-Oriented Programming |
Publisher | ACM Press and Addison-Wesley |
Year | 1998 |
Address | New York, NY |
Annotate | An excellent overview of component-based programming. Many references. |
BOOK | booch [Boo94] |
Author | |
Title | Object-Oriented Analysis and Design with Applications |
Edition | 2nd |
Publisher | Benjamin--Cummings |
Address | Redwood City, Calif. |
Year | 1994 |
BOOK | omt [RBP+91] |
Author | |
Title | Object-Oriented Modeling and Design |
Publisher | Prentice--Hall |
Address | Englewood Cliffs, NJ |
Year | 1991 |
BOOK | ivar [Jac95b] |
Author | |
Title | Object-Oriented Software Engineering: a Use Case driven Approach |
Publisher | Addison--Wesley |
Address | Wokingham, England |
Year | 1995 |
UNPUBLISHED | uml-www [Sof97] |
Key | Rational |
Author | |
Title | Unified Modeling Language, version 1.1 |
Month | September |
Year | 1997 |
Note | Online documentation |
BOOK | DuCharme99 [DuC99] |
Author | |
Title | XML: the annotated specification |
Publisher | Prentice-Hall PTR |
Address | Upper Saddle River, NJ 07458, USA |
Pages | xix + 339 |
Year | 1999 |
Isbn | 0-13-082676-6 |
Series | The Charles F. Goldfarb series on open information management |
Keywords | XML (Document markup language); Database management. |
MISC | XML [Cov00] |
Key | XML |
Title | The XML Cover Pages |
Author | |
Publisher | OASIS, Organization for the Advancement of Structured Information Standards |
Howpublished | WWW page |
Year | 2000 |
url | http://www.oasis-open.org/cover/xml.html |
Note | http://www.oasis-open.org/cover/xml.html |
Abstract | Extensible Markup Language (XML) is descriptively identified as "an extremely simple dialect [or 'subset'] of SGML" the goal of which "is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML," for which reason "XML has been designed for ease of implementation, and for interoperability with both SGML and HTML." |
Remarks | Interesting links (among a wealth of introductory as well as detailed information): XML Metadata Interchange Format (XMI) - Object Management Group (OMG) http://www.oasis-open.org/cover/xmi.html. The design of the XML Metadata Interchange Format (XMI) represents an extremely important initiative. It has a goal of unifying XML and related W3C specifications with several object/component modeling standards, as well as with STEP schemas, and more. Particularly, it would "combine the benefits of the web-based XML standard for defining, validating, and sharing document formats on the web with the benefits of the object-oriented Unified Modeling Language (UML), a specification of the Object Management Group (OMG) that provides application developers a common language for specifying, visualizing, constructing, and documenting distributed objects and business models." Extensible User Interface Language (XUL) http://www.oasis-open.org/cover/xul.html "XUL stands for 'extensible user interface language'. It is an XML-based language for describing the contents of windows and dialogs. XUL has language constructs for all of the typical dialog controls, as well as for widgets like toolbars, trees, progress bars, and menus." User Interface Markup Language (UIML) http://www.oasis-open.org/cover/uiml.html The User Interface Markup Language (UIML) "allows designers to describe the user interface in generic terms, and then use a style description to map the interface to various operating systems (OSs) and appliances. Thus, the universality of UIML makes it possible to describe a rich set of interfaces and reduces the work in porting the user interface to another platform (e.g., from a graphical windowing system to a hand-held appliance) to changing the style description." See the separate document. XML Application Environments, Development Toolkits, Conversion http://www.oasis-open.org/cover/publicSW.htm\#xmlTestbed XML Testbed. 
An XML application environment written in Java. From Steve Withall. ..."uses an XML configuration file to define the (Swing-based) user interface; includes its own non-validating XML parser (though it can use any SAX parser instead), a nascent XSL engine (to the old submission standard - just in time to be out of date), and a few other odds and ends." http://www.w3.org/XML/1998/08withall/ http://www.w3.org/XML/1998/08withall/xt-beta-1-980816.zip http://www.w3.org/XML/1998/08withall/MontrealSlides/XXXIntroduction.html |
ARTICLE | Abrams:1999:UAI [APB+99] |
Author | |
Title | UIML: an appliance-independent XML user interface language |
Journal | Computer Networks (Amsterdam, Netherlands: 1999) |
Volume | 31 |
Number | 11--16 |
Pages | 1695--1708 |
Day | 17 |
Month | May |
Year | 1999 |
Coden | ???? |
Issn | 1389-1286 |
Bibdate | Fri Sep 24 19:43:29 MDT 1999 |
url | http://www.elsevier.com/cas/tree/store/comnet/sub/1999/31/11-16/2170.pdf |
Remarks | TO BE FOUND |
BOOK | Chauvet:1999:CTC [Cha99] |
Author | |
Title | Composants et transactions: COMMTS, CorbaOTS, JavaEJB, XML |
Publisher | Eyrolles: Informatiques magazine |
Address | Paris, France |
Pages | v + 274 |
Year | 1999 |
Isbn | 2-212-09075-7 |
Lccn | ???? |
Bibdate | Tue Sep 21 10:27:35 MDT 1999 |
Series | Collection dirigée par Guy Hervier |
Alttitle | Composants et transactions: Corba/OTS, EJB/JTS, COM/MTS: comprendre l'architecture des serveurs d'applications |
Annote | Cover title: ``Composants et transactions: Corba/OTS, comprendre l'architecture des serveurs d'applications''. Bibliography: pp. 267-269. |
Keywords | Object-oriented design (computing); component object models; JavaBeans. |
Remarks | TO BE FOUND |
MISC | anasyn:www [AS00] |
Key | AS |
Title | Analysis--Synthesis Team / Équipe Analyse--Synthèse, IRCAM---Centre Georges Pompidou |
Howpublished | WWW page |
Year | 2000 |
url | http://www.ircam.fr/anasyn/ |
pub-url | http://www.ircam.fr/anasyn/listePublications/index.html |
Note | http://www.ircam.fr/anasyn/ |
MISC | anasyn:oldwww [AS99] |
Key | AS |
Title | Analysis--Synthesis Team / Équipe Analyse--Synthèse, IRCAM---Centre Georges Pompidou |
Howpublished | WWW page |
Year | 1999 |
url | http://www.ircam.fr/equipes/analyse-synthese/ |
pub-url | http://www.ircam.fr/equipes/analyse-synthese/listePublications/index.html |
Note | http://www.ircam.fr/equipes/analyse-synthese/ |
INPROC. | PEET981 [Pee98] |
Author | |
Title | Analyse-Synthèse des sons musicaux par la méthode PSOLA |
Year | 1998 |
Address | Agelonde (France) |
Month | May |
INPROC. | PEET983 [PR98] |
Author | |
Title | Sinusoidal versus Non-Sinusoidal Signal Characterisation |
Year | 1998 |
Address | Barcelona |
Month | November |
Annote | (Workshop on Digital Audio Effects) |
INPROC. | PEET991 [PR99b] |
Author | |
Title | SINOLA: A New Analysis/Synthesis Method using Spectrum Peak Shape Distortion, Phase and Reassigned Spectrum |
Booktitle | Proceedings of the International Computer Music Conference (ICMC) |
Year | 1999 |
Address | Beijing |
Month | October |
INPROC. | PEET992 [PR99a] |
Author | |
Title | Non-Stationary Analysis/Synthesis using Spectrum Peak Shape Distortion, Phase and Reassigned Spectrum |
Year | 1999 |
Address | Orlando |
Month | November |
INPROC. | OM97 [AAFH97] |
Author | |
Title | An Object Oriented Visual Environment For Musical Composition |
Booktitle | Proceedings of the International Computer Music Conference (ICMC) |
Year | 1997 |
Address | Thessaloniki, Greece |
url | http://www.ircam.fr/equipes/repmus/RMPapers/Assayag97/index.html |
bib-url | http://www.ircam.fr/equipes/repmus/RMPapers/ |
INPROC. | OM98 [AADR98] |
Author | |
Title | Objects, Time and Constraints in OpenMusic |
Booktitle | Proceedings of the International Computer Music Conference (ICMC) |
Year | 1998 |
Address | Ann Arbor, Michigan |
Month | October |
url | http://www.ircam.fr/equipes/repmus/RMPapers/ICMC98a/OMICMC98.html |
bib-url | http://www.ircam.fr/equipes/repmus/RMPapers/ |
ARTICLE | OM99 [ARL+99b] |
Author | |
Title | Computer Assisted Composition at Ircam: PatchWork & OpenMusic |
Journal | Computer Music Journal |
Year | 1999 |
Volume | 23 |
Number | 3 |
url | http://www.ircam.fr/equipes/repmus/RMPapers/CMJ98/index.html |
bib-url | http://www.ircam.fr/equipes/repmus/RMPapers |
ARTICLE | OM99-short [ARL+99a] |
Author | |
Title | Computer Assisted Composition at Ircam: PatchWork & OpenMusic |
Journal | Computer Music Journal |
Month | Fall |
Year | 1999 |
Volume | 23 |
Number | 3 |
url | http://www.ircam.fr/equipes/repmus/RMPapers/CMJ98/index.html |
bib-url | http://www.ircam.fr/equipes/repmus/RMPapers |
INPROC. | OM2000 [AAS00c] |
Author | |
Title | High Level Musical Control of Sound Synthesis in OpenMusic |
Booktitle | Proceedings of the International Computer Music Conference (ICMC) |
Year | 2000 |
Address | Berlin |
Month | August |
INPROC. | OM2000-short [AAS00a] |
Author | |
Title | High Level Musical Control of Sound Synthesis in OpenMusic |
Booktitle | Proc. ICMC |
Address | Berlin |
Year | 2000 |
INPROC. | OM2000-sshort [AAS00b] |
Author | |
Title | High Level Musical Control of Sound Synthesis in OpenMusic |
Booktitle | Proc. ICMC |
Year | 2000 |
INPROC. | sdif-ext2000 [SW00] |
Author | |
Title | Extensions and Applications of the SDIF Sound Description Interchange Format |
Booktitle | Proceedings of the International Computer Music Conference |
Month | August |
Year | 2000 |
Address | Berlin |
BOOK | moore89 [Moo89] |
Author | |
Title | An Introduction to the Psychology of Hearing |
Publisher | Academic Press Limited |
Edition | 3rd |
Year | 1989 |
Remarks | cited in [MCW98]: masking effects |
INPROC. | psy:susini97 [SMW97] |
Author | |
Title | Caractérisation perceptive des bruits de véhicules |
Booktitle | Actes du 4ème Congrès Français d'Acoustique |
Publisher | Société Française d'Acoustique |
Month | April |
Year | 1997 |
Address | Marseille |
INPROC. | psy:faure97 [FM97] |
Author | |
Title | Comparaison de profils sémantiques et de l'espace perceptif de timbres musicaux |
Booktitle | Actes du 4ème Congrès Français d'Acoustique |
Publisher | Société Française d'Acoustique |
Month | April |
Year | 1997 |
Address | Marseille |
url | http://mediatheque.ircam.fr/articles/textes/Faure97a/ |
Remarks | Mapping of semantic profiles (letting subjects choose descriptive words for timbre) to perceptual dimensions. Some references: Faure96, Grey77, Krimphoff94, Krumhansl89, McAdams95, Tversky77 |
Abstract | The purpose of this study is to compare semantic profiles and perceptual dimensions of musical timbre. In a previous experiment, we extracted the 23 most often used verbal attributes from spontaneous verbalizations describing similarities and differences between pairs of timbres, and we tried to compare their use with the relative positions of timbres along each perceptual dimension. In this experiment, we used a VAME paradigm to test these verbal attributes more quantitatively. 12 synthetic sounds were presented and rated on each of the 23 unipolar semantic scales. Several distances (either Euclidean or from Tversky's model of similarity) between timbres were then calculated, and the MDS semantic models obtained were compared to the perceptual one. The structure of the semantic and perceptual models differed considerably, and the correlations with the semantic scales led us to prefer a two-dimensional model without specificities, derived from a distance directly obtained from Tversky's model. |
INPROC. | beauchamp95 [BHM95] |
Author | |
Title | Musical Sounds, Data Reduction, and Perceptual Control Parameters |
Booktitle | Program for SMPC95, Society for Music Perception and Cognition |
Publisher | Center for New Music and Audio Technologies (CNMAT) |
Address | Univ. Calif. Berkeley |
Pages | 8--9 |
Year | 1995 |
bib-url | http://cmp-rs.music.uiuc.edu/people/beauchamp/publist.html |
Remarks | TO BE FOUND! |
ARTICLE | beauchamp98 [Bea98] |
Author | |
Title | Methods for measurement and manipulation of timbral physical correlates |
Journal | J. Acoust. Soc. Am. |
Year | 1998 |
Volume | 103 |
Part | Pt. 2 |
Pages | 2966 |
Number | 5 |
bib-url | http://cmp-rs.music.uiuc.edu/people/beauchamp/publist.html |
Remarks | TO BE FOUND! |
ARTICLE | horner98 [YH] |
Author | |
Title | Hybrid Sampling-Wavetable Synthesis with Genetic Algorithms |
Journal | Journal of the Audio Engineering Society |
Volume | 45 |
Pages | 316--330 |
Number | 5 |
bib-url | http://www.cs.ust.hk/faculty/horner/subpage/pubs.html |
journal-url | http://www.aes.org/journal/toc/may97.html |
Remarks | TO BE FOUND! High-quality sort-of-concatenative instrument synthesis? |
Abstract | A combination of hybrid sampling and wavetable synthesis for matching acoustic instruments is demonstrated using genetic algorithm optimization. Tone sampling is used for the critical attack portion and wavetable synthesis is used to match the more gradually changing sustain and decay. A hybrid sampling wavetable performs a smooth crossfade transition. This method has been used to synthesize piano, harp, glockenspiel, and temple block tones. |
ARTICLE | horner96 [CH] |
Author | |
Title | Group Synthesis with Genetic Algorithms |
Journal | Journal of the Audio Engineering Society |
Volume | 44 |
Number | 3 |
Pages | 130--147 |
bib-url | http://www.cs.ust.hk/faculty/horner/subpage/pubs.html |
journal-url | http://www.aes.org/journal/toc/march.html |
Abstract | Musical sounds can be efficiently synthesized using an automatic genetic algorithm to decompose musical instrument tones into group synthesis parameters. By separating the data into individual matrices, a high degree of data compression with low computational cost is achieved. |
INPROC. | chandra98 [Cha98] |
Author | |
Title | Compositional experiments with concatenating distinct waveform periods while changing their structural properties |
Booktitle | SEAMUS'98 |
Publisher | School of Music, University of Illinois |
Address | Urbana, IL |
Month | April |
Year | 1998 |
url | http://cmp-rs.music.uiuc.edu/people/arunc/miranda/seamus98/index.htm |
ps-url | http://cmp-rs.music.uiuc.edu/people/arunc/miranda/seamus98/pre.ps |
Note | Available online |
Abstract | wigout is a sound-synthesis program, written in C and running under Unix and 32-bit Intel systems. The premise of the program is to allow the composer to compose the waveform with which she composes. Thus, sound is not a building-block with which one composes, but the subject matter of composition. The composer defines a waveform state, consisting of an arbitrary number of segments. Each segment is similar to (but not identical with) 1) a sine wave; 2) a square wave; 3) a triangle wave; or 4) a sawtooth wave. The composer stipulates the duration for which the sound is to last, and then the waveform state (which is on the order of a few milliseconds long) is iterated until the desired duration is reached. Upon each iteration, each segment changes itself by a specified amount. The resulting sound is the result of many independent changes in the waveform's segments. Up till now, five compositions have been written using wigout, for tape alone, and for tape and performers. |
ARTICLE | beauchamp96 [BH] |
Author | |
Title | Piecewise Linear Approximation of Additive Synthesis Envelopes: A Comparison of Various Methods |
Journal | Computer Music Journal |
Volume | 20 |
Pages | 72--95 |
Number | 2 |
bib-url | http://cmp-rs.music.uiuc.edu/people/beauchamp/publist.html |
ARTICLE | wakefield96 [PW96] |
Author | |
Title | A High Resolution Time--Frequency Representation for Musical Instrument Signals |
Journal | J. Acoust. Soc. Am. |
Volume | 99 |
Number | 4 |
Pages | 2382--2396 |
Year | 1996 |
INPROC. | wakefield98 [Wak98a] |
Author | |
Title | Time--Pitch Representations: Acoustic Signal Processing and Auditory Representations |
Booktitle | Proceedings of the IEEE Intl. Symp. on Time--Frequency/Time--Scale |
Year | 1998 |
Address | Pittsburgh |
INPROC. | wakefield98-short [Wak98b] |
Author | |
Title | Time--Pitch Representations: Acoustic Signal Processing and Auditory Representations |
Booktitle | Proc. IEEE Intl. Symp. Time--Frequency/Time--Scale |
Year | 1998 |
Address | Pittsburgh |
INPROC. | loris2000a [FHC00d] |
Author | |
Title | Transient Preservation under Transformation in an Additive Sound Model |
Booktitle | Proceedings of the International Computer Music Conference |
Address | Berlin |
Year | 2000 |
INPROC. | loris2000a-short [FHC00b] |
Author | |
Title | Transient Preservation under Transformation in an Additive Sound Model |
Booktitle | Proc. ICMC |
Address | Berlin |
Year | 2000 |
INPROC. | loris2000b [FHC00c] |
Author | |
Title | A New Algorithm for Bandwidth Association in Bandwidth-Enhanced Additive Sound Modeling |
Booktitle | Proc. ICMC |
Address | Berlin |
Year | 2000 |
INPROC. | loris2000b-short [FHC00a] |
Author | |
Title | A New Algorithm for Bandwidth Association in Bandwidth-Enhanced Additive Sound Modeling |
Booktitle | Proc. ICMC |
Address | Berlin |
Year | 2000 |
INPROC. | sms97 [SBHL97b] |
Author | |
Title | Integrating Complementary Spectral Models in the Design of a Musical Synthesizer |
Booktitle | Proceedings of the International Computer Music Conference |
Year | 1997 |
Address | Thessaloniki |
INPROC. | sms97-short [SBHL97c] |
Author | |
Title | Integrating Complementary Spectral Models in the Design of a Musical Synthesizer |
Booktitle | Proc. ICMC |
Year | 1997 |
Address | Thessaloniki |
ARTICLE | sms90 [SS90] |
Author | |
Title | Spectral Modeling Synthesis: a Sound Analysis/Synthesis System Based on a Deterministic plus Stochastic Decomposition |
Journal | Computer Music Journal |
Year | 1990 |
Volume | 14 |
Number | 4 |
Pages | 12--24 |
ARTICLE | beauchamp93 [Bea93a] |
Author | |
Title | Unix Workstation Software for Analysis, Graphics, Modification, and Synthesis of Musical Sounds |
Journal | Proceedings of the Audio Engineering Society |
Year | 1993 |
INPROC. | beauchamp93-short [Bea93b] |
Author | |
Title | Unix Workstation Software for Analysis, Graphics, Modification, and Synthesis of Musical Sounds |
Booktitle | Proc. AES |
Year | 1993 |
BOOK | speechsyn96 [vSHOS96] |
Editor | |
Title | Progress in Speech Synthesis |
Publisher | Springer-Verlag |
Address | New York |
Year | 1996 |
Isbn | 0-387-94701-9 |
amazon-url | http://www.amazon.de/exec/obidos/ASIN/0387947019 |
Remarks | van Santen Author Links: http://www.bell-labs.com/project/tts/BOOK.html, Springer Heidelberg: http://www.springer.de/cgi-bin/search-book.pl?isbn=0-387-94701-9, Springer New-York: http://www.springer-ny.com/catalog/np/may96np/DATA/0-387-94701-9.html |
ARTICLE | psola92 [VMT92] |
Key | synthesis |
Author | |
Title | Voice transformation using PSOLA technique |
Journal | Speech Communication |
Year | 1992 |
Month | June |
Volume | 11 |
Number | 2-3 |
Pages | 189--194 |
BOOK | chomsky68sound [CH68] |
Author | |
Title | The Sound Pattern of English |
Publisher | Harper & Row |
Address | New York, NY |
Year | 1968 |
ARTICLE | bailly1991 [BLS91] |
Author | |
Title | Formant trajectories as audible gestures: an alternative for speech synthesis. |
Journal | Journal of Phonetics |
Year | 1991 |
Volume | 19 |
Pages | 9--23 |
INPROC. | soong88 [SR88] |
Author | |
Title | On the use of Instantaneous and Transitional Spectral Information in Speaker Recognition |
Booktitle | IEEE Transactions on Acoustics, Speech and Signal Processing |
Volume | 36 |
Year | 1988 |
Pages | 871--879 |
Keywords | derivative of cepstrum |
Remarks | cited in [MD97a] |
INPROC. | griffin88 [GL88] |
Author | |
Title | Multiband Excitation Vocoder |
Booktitle | IEEE Transactions on Acoustics, Speech and Signal Processing |
Volume | 36 |
Year | 1988 |
Pages | 1223--1235 |
Keywords | robust cepstrum by sinusoidal weighting |
Remarks | cited in [MD97a] |
INPROC. | allessandro95 [dM95] |
Author | |
Title | Automatic pitch contour stylization using a model of tonal perception |
Booktitle | Computer Speech and Language |
Year | 1995 |
Pages | 257--288 |
Keywords | perceptual stylization, based on a model of tonal perception |
Remarks | cited in [MD97a] |
INPROC. | traber92 [Tra92] |
Author | |
Title | F0 Generation with a Database of Natural F0 Patterns and with a Neural Network |
Booktitle | Talking Machines: Theories, Models, and Designs |
Editor | |
Publisher | North Holland |
Year | 1992 |
Pages | 287--304 |
Remarks | cited in [MD97a]: machine learning techniques: multilayer perceptrons |
INPROC. | sagisaka92 [SK92] |
Author | |
Title | Optimization of Intonation Control Using Statistical F0 Resetting Characteristics |
Booktitle | Proceedings of the International Conference on Acoustics, Speech, and Signal Processing |
Volume | 2 |
Year | 1992 |
Pages | 49--52 |
Remarks | cited in [MD97a]: machine learning techniques: linear regression |
INPROC. | hirschberg91 [Hir91] |
Author | |
Title | Using Text Analysis to Predict Intonational Boundaries |
Booktitle | Proceedings of Eurospeech |
Location | Genova |
Year | 1991 |
Pages | 1275--1278 |
INPROC. | moebius93 [MPH93] |
Author | |
Title | Analysis and Synthesis of German F0 Contours by Means of Fujisaki's Model |
Booktitle | Speech Communication |
Volume | 13 |
Year | 1993 |
Pages | 53--61 |
INPROC. | sagisaka88 [Sag88] |
Author | |
Title | Speech synthesis by rule using an optimal selection of non-uniform synthesis units |
Booktitle | Proc. of the Int'l Conf. on Acoustics, Speech, and Signal Processing |
Year | 1988 |
Pages | 679 |
Remarks | (origin of unit selection?), cited in [MCW98]: since the late 1980's, selection-based concatenative synthesis from large databases has received increased interest as a potential improvement upon fixed diphone inventories. TO BE FOUND |
INPROC. | wang93 [WCIS93] |
Author | |
Title | Tree-based unit selection for English speech synthesis |
Booktitle | Proc. of the Int'l Conf. on Acoustics, Speech, and Signal Processing |
Year | 1993 |
Pages | 191--194 |
Remarks | cited in [MCW98, CM98]: clustering and decision trees. TO BE FOUND |
INPROC. | nakajima94 [Nak94] |
Author | |
Title | Automatic synthesis unit generation for English speech synthesis based on multi-layered context oriented clustering |
Booktitle | Speech Communication |
Volume | 14 |
Month | September |
Year | 1994 |
Pages | 313 |
Remarks | cited in [MCW98, CM98]: clustering and decision trees. TO BE FOUND |
PHDTHESIS | donovan96 [Don96] |
Author | |
Title | Trainable Speech Synthesis |
Type | PhD thesis |
School | Cambridge University |
Year | 1996 |
Remarks | cited in [MCW98]: Mahalanobis distance |
INPROC. | huang96 [HAea96] |
Author | |
Title | Whistler: A trainable text-to-speech system |
Booktitle | Proc. of the Int'l Conf. on Spoken Language Processing |
Year | 1996 |
Pages | 2387--2390 |
Remarks | cited in [MCW98]: decision trees for speech synthesis |
INPROC. | karaali96 [KCG96] |
Author | |
Title | Speech Synthesis with Neural Networks |
Booktitle | Proc. of World Congress on Neural Networks |
Month | September |
Year | 1996 |
Pages | 45--50 |
Remarks | cited in [MCW98]: data driven direct mapping with NN |
INPROC. | tuerk93 [TR] |
Author | |
Title | Speech synthesis using artificial neural networks trained on cepstral coefficients |
Booktitle | Proc. EUROSPEECH |
Pages | 1713--1716 |
Remarks | cited in [MCW98]: data driven direct mapping with NN |
BOOK | quackenbush88 [QBC88] |
Author | |
Title | Objective Measures of Speech Quality |
Publisher | Prentice-Hall |
Address | Englewood Cliffs, NJ |
Year | 1988 |
Remarks | cited in [MCW98]: distance measures for coding |
INPROC. | nocerino85 [NSRK85] |
Author | |
Title | Comparative study of several distortion measures for speech recognition |
Booktitle | Speech Communication |
Volume | 4 |
Year | 1985 |
Pages | 317--331 |
Remarks | cited in [MCW98]: distance measures for ASR |
INPROC. | asp:icassp88 [HJ88] |
Author | |
Title | Optimization of perceptually-based ASR front-end |
Booktitle | Proceedings of the International Conference on Acoustics, Speech, and Signal Processing |
Year | 1988 |
Pages | 219 |
Remarks | cited in [MCW98]: distance measures for ASR |
INPROC. | ghitza97 [GS97] |
Author | |
Title | On the perceptual distance between two speech segments |
Booktitle | Journal of the Acoustical Society of America |
Year | 1997 |
Volume | 101 |
Pages | 522--529 |
Number | 1 |
Remarks | cited in [MCW98]: distance measures in general |
INPROC. | hansen98 [HC98] |
Author | |
Title | An auditory-based distortion measure with application to concatenative speech synthesis |
Booktitle | IEEE Trans. on Speech and Audio Processing |
Volume | 6 |
Month | September |
Year | 1998 |
Pages | 489--495 |
Remarks | cited in [MCW98]: distance measures for concatenative speech synthesis |
INPROC. | asp:itsa94 [HM94] |
Author | |
Title | RASTA processing of speech |
Booktitle | IEEE Transactions on Speech and Audio Processing |
Volume | 2 |
Month | October |
Year | 1994 |
Pages | 587--589 |
Remarks | cited in [MCW98] |
BOOK | edwards93 [Edw93] |
Author | |
Title | An Introduction to Linear Regression and Correlation |
Publisher | W. H. Freeman and Co |
Address | San Francisco |
Year | 1993 |
Remarks | cited in [MCW98]: Fisher transform |
INPROC. | Ding_OptiUnit_EURO97 [DC97] |
Author | |
Title | Optimising Unit Selection with Voice Source and Formants in the CHATR Speech Synthesis System |
Booktitle | Proc. Eurospeech '97 |
Address | Rhodes, Greece |
Month | September |
Year | 1997 |
Pages | 537--540 |
Remarks | TO BE FOUND! |
MASTER. | diemo98 [Sch98c] |
Author | |
Title | Spectral Envelopes in Sound Analysis and Synthesis |
Type | Diplomarbeit Nr. 1622 |
School | Universität Stuttgart, Fakultät Informatik |
Address | Stuttgart, Germany |
Month | June |
Year | 1998 |
url | http://www.ircam.fr/anasyn/schwarz/da/ |
official-url | http://www.informatik.uni-stuttgart.de/cgi-bin/ncstrl_rep_view.pl?/inf/ftp/pub/library/medoc.ustuttgart_fi/DIP-1622/DIP-1622.bib |
Abstract | In this project, Spectral Envelopes in Sound Analysis and Synthesis, various methods for estimation, representation, file storage, manipulation, and application of spectral envelopes to sound synthesis were evaluated, improved, and implemented. A prototyping and testing environment was developed, and a function library to handle spectral envelopes was designed and implemented. For the estimation of spectral envelopes, after defining the requirements, the methods LPC, cepstrum, and discrete cepstrum were examined, and also improvements of the discrete cepstrum method (regularization, stochastic (or probabilistic) smoothing, logarithmic frequency scaling, and adding control points). An evaluation with a large corpus of sound data showed the feasibility of discrete cepstrum spectral envelope estimation. After defining the requirements for the representation of spectral envelopes, filter coefficients, spectral representation, break-point functions, splines, formant representation, and high resolution matching pursuit were examined. A combined spectral representation with indication of the regions of formants (called fuzzy formants) was defined to allow for integration of spectral envelopes with precise formant descriptions. For file storage, new data types were defined for the Sound Description Interchange Format (SDIF) standard. Methods for manipulation were examined, especially interpolation between spectral envelopes, and between spectral envelopes and formants, and other manipulations, based on primitive operations on spectral envelopes. For sound synthesis, application of spectral envelopes to additive synthesis, and time-domain or frequency-domain filtering have been examined. For prototyping and testing of the algorithms, a spectral envelope viewing program was developed. Finally, the spectral envelope library, offering complete functionality of spectral envelope handling, was developed according to the principles of software engineering. |
MASTER. | diemo98-short [Sch98a] |
Author | |
Title | Spectral Envelopes in Sound Analysis and Synthesis |
Type | Diplomarbeit Nr. 1622 |
School | Universität Stuttgart, Fakultät Informatik |
Address | Stuttgart, Germany |
Year | 1998 |
MASTER. | diemo98-sshort [Sch98b] |
Author | |
Title | Spectral Envelopes in Sound Analysis and Synthesis |
Type | Diplomarbeit |
School | Universität Stuttgart, Informatik |
Year | 1998 |
BOOK | bookbeauchamp [Bea00] |
Editor | |
Title | The Sound of Music |
Publisher | Springer |
Address | New York |
Year | 2000 |
INBOOK | bookbeauchamp-specenv [RSb] |
Author | |
Title | Spectral Envelopes and Additive+Residual Analysis-Synthesis |
Note | In J. Beauchamp, ed. The Sound of Music. Springer, New York, to be published 2000 |
INBOOK | bookbeauchamp-specenv-short [RSa] |
Author | |
Title | Spectral Envelopes and Additive+Residual Analysis-Synthesis |
Note | In J. Beauchamp, ed. The Sound of Music. Springer, N.Y., to be published |
INPROC. | holmes83 [Hol83a] |
Author | |
Title | Formant synthesizers: Cascade or Parallel |
Booktitle | Speech Communication |
Year | 1983 |
Volume | 2 |
Pages | 251--273 |
INPROC. | holmes83-short [Hol83b] |
Author | |
Title | Formant synthesizers: Cascade or Parallel |
Booktitle | Speech Communication |
Volume | 2 |
Year | 1983 |
BOOK | hamming77 [Ham77b] |
Author | |
Title | Digital Filters |
Publisher | Prentice--Hall |
Series | Signal Processing Series |
Address | Englewood Cliffs |
Year | 1977 |
BOOK | hamming77-short [Ham77a] |
Author | |
Title | Digital Filters |
Publisher | Prentice--Hall |
Series | Signal Processing Series |
Year | 1977 |
INPROC. | fft-2 [FRD93a] |
Author | |
Title | Performance, Synthesis and Control of Additive Synthesis on a Desktop Computer Using FFT-1 |
Booktitle | Proceedings of the 19th International Computer Music Conference |
Address | Waseda University Center for Scholarly Information |
Year | 1993 |
Publisher | International Computer Music Association |
url | http://cnmat.CNMAT.Berkeley.EDU/~adrian/FFT-1/FFT-1_ICMC93.html |
INPROC. | fft-2-short [FRD93b] |
Author | |
Title | Performance, Synthesis and Control of Additive Synthesis on a Desktop Computer Using FFT-1 |
Booktitle | Proc. ICMC |
Year | 1993 |
INPROC. | fft-3 [SBHL97d] |
Author | |
Title | Integrating complementary spectral models in the design of a musical synthesizer |
Booktitle | Proceedings of the International Computer Music Conference |
Year | 1997 |
url | http://www.iua.upf.es/~xserra/articles/spectral-models/ |
INPROC. | fft-3-short [SBHL97a] |
Author | |
Title | Integrating Complementary Spectral Models in the Design of a Musical Synthesizer |
Booktitle | Proc. ICMC |
Year | 1997 |
PHDTHESIS | marine-thesis [Oud98b] |
Author | |
Title | Étude du modèle ``sinusoïdes et bruit'' pour le traitement de la parole. Estimation robuste de l'enveloppe spectrale |
Type | Thèse |
School | Ecole Nationale Supérieure des Télécommunications |
Address | Paris, France |
Month | November |
Year | 1998 |
PHDTHESIS | marine-thesis-short [Oud98a] |
Author | |
Title | Étude du modèle sinusoïdes et bruit pour le traitement de la parole. Estimation robuste de l'enveloppe spectrale |
Type | Thèse |
School | ENST |
Address | Paris |
Year | 1998 |
INPROC. | jmax99 [DCMS99] |
Author | |
Title | jMax Recent Developments |
Booktitle | Proceedings of the International Computer Music Conference |
Year | 1999 |
INPROC. | jmax99-short [DDMS99] |
Author | |
Title | jMax Recent Developments |
Booktitle | Proc. ICMC |
Year | 1999 |
INPROC. | jmax2000 [DSBO00b] |
Author | |
Title | The jMax Environment: An Overview of New Features |
Booktitle | Proceedings of the International Computer Music Conference |
Address | Berlin |
Year | 2000 |
INPROC. | jmax2000-short [DSBO00a] |
Author | |
Title | The jMax Environment: An Overview of New Features |
Booktitle | Proc. ICMC |
Address | Berlin |
Year | 2000 |
INPROC. | lemur95 [FHH95a] |
Author | |
Title | Lemur -- A Tool for Timbre Manipulation |
Booktitle | Proceedings of the International Computer Music Conference |
Pages | 158--161 |
Address | Banff |
Month | September |
Year | 1995 |
INPROC. | lemur95-short [FHH95b] |
Author | |
Title | Lemur -- A Tool for Timbre Manipulation |
Booktitle | Proc. ICMC |
Year | 1995 |
INPROC. | HRMP [GBM+96] |
Author | |
Title | Analysis of Sound Signals with High Resolution Matching Pursuit |
Booktitle | Proceedings of the IEEE Time--Frequency and Time--Scale Workshop (TFTS) |
Year | 1996 |
Note | www [AS00] |
url | http://www.ircam.fr/anasyn/listePublications/articlesRodet/TFTS96/tfts96.ps.gz |
INPROC. | HRMP2 [GDR+96] |
Author | |
Title | Sound Signal Decomposition using a High Resolution Matching Pursuit |
Booktitle | Proceedings of the International Computer Music Conference (ICMC) |
Location | Clear Water Bay, Hong-Kong |
Month | August |
Year | 1996 |
Note | www [AS00] |
abstract-url | http://www.ircam.fr/anasyn/listePublications/articlesRodet/ICMC96HRMP/abstract.txt |
url | http://www.ircam.fr/anasyn/listePublications/articlesRodet/ICMC96HRMP/ICMC96HRMP.ps.gz |
ARTICLE | fof [Rod84b] |
Author | |
Title | Time-Domain Formant-Wave-Function Synthesis |
Journal | Computer Music Journal |
Volume | 8 |
Number | 3 |
Month | Fall |
Year | 1984 |
Pages | 9--14 |
Note | reprinted from [Sim80] |
ARTICLE | fof-short [Rod84a] |
Author | |
Title | Time-Domain Formant-Wave-Function Synthesis |
Journal | Computer Music Journal |
Month | Fall |
Year | 1984 |
BOOK | fof2 [Sim80] |
Editor | |
Title | Spoken Language Generation and Understanding |
Publisher | D. Reidel Publishing Company |
Address | Dordrecht, Holland |
Year | 1980 |
ARTICLE | chant [RPB84b] |
Author | |
Title | The Chant--Project: From the Synthesis of the Singing Voice to Synthesis in General |
Journal | Computer Music Journal |
Volume | 8 |
Number | 3 |
Month | Fall |
Year | 1984 |
Pages | 15--31 |
ARTICLE | chant-short [RPB84a] |
Author | |
Title | The Chant--Project: From the Synthesis of the Singing Voice to Synthesis in General |
Journal | Computer Music Journal |
Month | Fall |
Year | 1984 |
ARTICLE | chant2 [RPB85] |
Author | |
Title | CHANT: de la synthèse de la voix chantée à la synthèse en général |
Journal | Rapports de recherche IRCAM |
Address | Paris |
Year | 1985 |
Note | Available online |
MANUAL | chant-manual [Vir97] |
Author | |
Title | La Librairie CHANT: Manuel d'utilisation des fonctions en C |
Month | April |
Year | 1997 |
Note | Available online |
INPROC. | dcep1 [GR90] |
Author | |
Title | An Improved Cepstral Method for Deconvolution of Source--Filter Systems with Discrete Spectra: Application to Musical Sound Signals |
Booktitle | Proceedings of the International Computer Music Conference (ICMC) |
Address | Glasgow |
Month | September |
Year | 1990 |
Notes | dcep with cloud, some pictures, middle (3 pages) |
INPROC. | dcep2 [GR91b] |
Author | |
Title | Generalized Discrete Cepstral Analysis for Deconvolution of Source--Filter Systems with Discrete Spectra |
Booktitle | IEEE Workshop on Applications of Signal Processing to Audio and Acoustics |
Address | New Paltz, New York |
Month | October |
Year | 1991 |
Notes | dcep with cloud, no pictures, short (2 pages) |
INPROC. | dcep3 [GR91c] |
Author | |
Title | Generalized Functional Approximation for Source--Filter System Modeling |
Booktitle | Proc. Eurospeech |
Address | Geneva |
Year | 1991 |
Pages | 1085--1088 |
Notes | power spectrum modeling, all pole, dcep with cloud, log frequency, many pictures |
INPROC. | dcep3-short [GR91a] |
Author | |
Title | Generalized Functional Approximation for Source--Filter System Modeling |
Booktitle | Proc. Eurospeech |
Year | 1991 |
INPROC. | marine1 [OCM97] |
Author | |
Title | Robust Estimation of the Spectral Envelope for ``Harmonics+Noise'' Models |
Booktitle | IEEE Workshop on Speech coding |
Address | Pocono Manor |
Month | September |
Year | 1997 |
INPROC. | marine97 [COM97] |
Author | |
Title | Spectral Envelope Estimation using a Penalized Likelihood Criterion |
Booktitle | IEEE ASSP Workshop on App. of Sig. Proc. to Audio and Acoust. |
Address | Mohonk |
Month | October |
Year | 1997 |
ARTICLE | dcep-reg [CM96] |
Author | |
Title | Regularization Techniques for Discrete Cepstrum Estimation |
Journal | IEEE Signal Processing Letters |
Volume | 3 |
Number | 4 |
Pages | 100--102 |
Month | April |
Year | 1996 |
INPROC. | xspect [RFL96] |
Author | |
Title | Xspect: a New Motif Signal Visualisation, Analysis and Editing Program |
Booktitle | Proceedings of the International Computer Music Conference (ICMC) |
Location | Hong Kong |
Month | August |
Year | 1996 |
Note | Available online |
MANUAL | xspect-manual [RF96] |
Author | |
Title | XSPECT: Introduction |
Month | January |
Year | 1996 |
Note | Available online |
INPROC. | hmm [DGR93a] |
Author | |
Title | Tracking of Partials for Additive Sound Synthesis Using Hidden Markov Models |
Note | Abstract available online |
Pages | 225--228 |
Booktitle | Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) |
Year | 1993 |
Month | April |
INPROC. | hmm-short [DGR93b] |
Author | |
Title | Tracking of Partials for Additive Sound Synthesis Using Hidden Markov Models |
Pages | 225--228 |
Booktitle | Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) |
Year | 1993 |
INPROC. | additive [Rod97b] |
Author | |
Title | Musical Sound Signals Analysis/Synthesis: Sinusoidal+Residual and Elementary Waveform Models |
Booktitle | Proceedings of the IEEE Time--Frequency and Time--Scale Workshop (TFTS) |
Month | August |
Year | 1997 |
Note | Abstract and PostScript available online: www.ircam.fr/anasyn/listePublications/articlesRodet/TFTS97/TFTS97.ps.gz |
INPROC. | additive-short [Rod97a] |
Author | |
Title | Musical Sound Signals Analysis/Synthesis: Sinusoidal+Residual and Elementary Waveform Models |
Booktitle | Proc. IEEE Time--Frequency/Time--Scale Workshop |
Year | 1997 |
MANUAL | additive-manual [Rod97c] |
Author | |
Title | The Additive Analysis--Synthesis Package |
Year | 1997 |
Note | Available online |
INPROC. | diphones [RL97b] |
Author | |
Title | The Diphone Program: New Features, new Synthesis Methods and Experience of Musical Use |
Booktitle | Proceedings of the International Computer Music Conference (ICMC) |
Month | September |
Year | 1997 |
Address | Thessaloniki, Greece |
Note | Abstract and PostScript available online: www.ircam.fr/anasyn/listePublications/articlesRodet/ICMC97/ICMC97Diphone.ps.gz |
INPROC. | diphones-nourl [RL97c] |
Author | |
Title | The Diphone Program: New Features, new Synthesis Methods and Experience of Musical Use |
Booktitle | Proceedings of the International Computer Music Conference (ICMC) |
Month | September |
Year | 1997 |
Address | Thessaloniki, Greece |
abstract-url | http://www.ircam.fr/anasyn/listePublications/articlesRodet/ICMC97/ICMC97DiphoneAbstract.html |
postscript-url | http://www.ircam.fr/anasyn/listePublications/articlesRodet/ICMC97/ICMC97Diphone.ps.gz |
INPROC. | diphones-short [RL97a] |
Author | |
Title | The Diphone Program: New Features, new Synthesis Methods and Experience of Musical Use |
Booktitle | Proc. ICMC |
Address | Thessaloniki |
Year | 1997 |
INPROC. | fft-1 [RD92] |
Author | |
Title | A new additive synthesis method using inverse Fourier transform and spectral envelopes |
Booktitle | Proceedings of the International Computer Music Conference (ICMC) |
Month | October |
Year | 1992 |
MANUAL | sdif-manual [Vir98] |
Author | |
Title | Sound Description Interchange Format (SDIF) |
Month | January |
Year | 1998 |
Note | Available online |
INPROC. | fts [DDPZ94] |
Author | |
Title | The IRCAM ``Real-Time Platform'': Evolution and Perspectives |
Booktitle | Proceedings of the International Computer Music Conference (ICMC) |
Location | Aarhus, Denmark |
Year | 1994 |
Note | Available online |
ARTICLE | fts-basics [Puc91b] |
Author | |
Title | FTS: A Real-Time Monitor for Multiprocessor Music Synthesis |
Journal | Computer Music Journal |
Volume | 15 |
Number | 3 |
Pages | 58--67 |
Month | Winter |
Year | 1991 |
Note | Available online |
ARTICLE | max [Puc91a] |
Author | |
Title | Combining Event and Signal Processing in the MAX Graphical Programming Environment |
Journal | Computer Music Journal |
Volume | 15 |
Number | 3 |
Pages | 68--77 |
Month | Winter |
Year | 1991 |
Note | Available online |
INPROC. | specenv-rod [RDP87b] |
Author | |
Title | Speech Analysis and Synthesis Methods Based on Spectral Envelopes and Voiced/Unvoiced Functions |
Booktitle | European Conference on Speech Tech. |
Location | Edinburgh |
Month | September |
Year | 1987 |
INPROC. | specenv-rod-short [RDP87a] |
Author | |
Title | Speech Analysis and Synthesis Methods Based on Spectral Envelopes and Voiced/Unvoiced Functions |
Booktitle | European Conf. on Speech Tech. |
Location | Edinburgh |
Year | 1987 |
INPROC. | control [FRD92b] |
Author | |
Title | Synthesis and Control of Hundreds of Sinusoidal Partials on a Desktop Computer without Custom Hardware |
Booktitle | ICSPAT |
Location | San José |
Year | 1992 |
Note | Available online |
Notes | fft-1, fm, se better than BPF |
INPROC. | control-short [FRD92a] |
Author | |
Title | Synthesis and Control of Hundreds of Sinusoidal Partials on a Desktop Computer without Custom Hardware |
Booktitle | ICSPAT |
Year | 1992 |
INPROC. | newposs [RDG95] |
Author | |
Title | New Possibilities in Sound Analysis and Synthesis |
Booktitle | ISMA |
Location | Dourdan |
Year | 1995 |
Note | Available online (PostScript) |
Notes | fft-1 + se, phys. models, ana/syn overview, farinelli |
INPROC. | farinelli [DGR94] |
Author | |
Title | A Virtual Castrato (!?) |
Booktitle | Proceedings of the International Computer Music Conference (ICMC) |
Location | Aarhus, Denmark |
Year | 1994 |
Note | Available online |
MANUAL | udi [WRD92] |
Author | |
Title | UDI 2.1---A Unified DSP Interface |
Year | 1992 |
Note | Available online |
MANUAL | pm [Gar94] |
Author | |
Title | Pm: A library for additive analysis/transformation/synthesis |
Month | July |
Year | 1994 |
Note | Available online |
INPROC. | escher [WSR98] |
Author | |
Title | ESCHER---Modeling and Performing composed Instruments in real-time |
Booktitle | IEEE Systems, Man, and Cybernetics Conference |
Location | San Diego |
Month | October |
Year | 1998 |
Note | To be published |
BOOK | nat [Hen98] |
Author | |
Title | Synthèse de la voix chantée par règles |
Month | July |
Year | 1998 |
Publisher | IRCAM |
Address | Paris, France |
Note | Rapport de stage D.E.A. Acoustique, Traitement de Signal et Informatique Appliqués à la Musique |
MISC | z [Mel97] |
Author | |
Title | The Z--Transform |
Note | Online tutorial |
Year | 1997 |
BOOK | dsp [OS75] |
Author | |
Title | Digital Signal Processing |
Year | 1975 |
Publisher | Prentice--Hall |
INBOOK | dspapp [Opp78] |
Editor | |
Chapter | Digital Processing of Speech |
Title | Applications of Digital Signal Processing |
Pages | 117--168 |
Year | 1978 |
Publisher | Prentice--Hall |
BOOK | dsp-intro [RH91] |
Author | |
Title | Signals and Systems for Speech and Hearing |
Year | 1991 |
Publisher | Academic Press |
Address | London |
BOOK | roads [Roa96] |
Author | |
Title | The Computer Music Tutorial |
Year | 1996 |
Publisher | MIT Press |
BOOK | grey80 [MG80] |
Author | |
Title | Linear Prediction of Speech |
Publisher | Springer |
Year | 1980 |
INPROC. | toeplitz [MP82] |
Author | |
Title | Efficient Solution of a Toeplitz--plus--Hankel Coefficient Matrix System of Equations |
Booktitle | IEEE TASSP |
Volume | 30 |
Pages | 40--44 |
Month | February |
Year | 1982 |
BOOK | psycho [Zwi82] |
Author | |
Title | Psychoakustik |
Year | 1982 |
Publisher | Springer |
INPROC. | splinelpc [TAW97] |
Author | |
Title | Enhanced Modeling of Discrete Spectral Amplitudes |
Booktitle | IEEE Workshop on Speech coding |
Address | Pocono Manor |
Month | September |
Year | 1997 |
INCOLL. | ICS94 [vS94] |
Author | |
Title | Peak-insensitive non-parametric spectrum estimation |
Booktitle | Journal of Time Series Analysis |
Year | 1994 |
Volume | 15 |
Number | 4 |
Pages | 429--452 |
ARTICLE | additive-idea [RM69] |
Author | |
Title | Analysis of musical-instrument tones |
Journal | Physics Today |
Volume | 22 |
Number | 2 |
Pages | 23--30 |
Month | February |
Year | 1969 |
INPROC. | splines [UAE93] |
Author | |
Title | B--Spline Signal Processing: Part I---Theory |
Volume | 41 |
Optnumber | 2 |
Pages | 821--833 |
Booktitle | IEEE Transactions on Signal Processing |
Year | 1993 |
MISC | speechana [Rob98] |
Author | |
Title | Speech Analysis |
Note | Online tutorial |
Year | 1998 |
ARTICLE | MultiscaleEdges [MZ92] |
Author | |
Title | Characterization of Signals from Multiscale Edges |
Journal | IEEE Trans. Pattern Anal. Machine Intell. |
Year | 1992 |
Volume | 40 |
Number | 7 |
Pages | 2464--2482 |
Month | July |
ARTICLE | Ridges [DEG+92] |
Author | |
Title | Asymptotic Wavelet and Gabor Analysis: Extraction of Instantaneous Frequency |
Year | 1992 |
Volume | 38 |
Number | 2 |
Pages | 644--664 |
Month | March |
ARTICLE | Ridges2 [GKM96] |
Author | |
Title | Characterization of Acoustic Signals Through Continuous Linear Time--Frequency Representations |
Year | 1996 |
Volume | 84 |
Number | 4 |
Pages | 561--585 |
Month | April |
BOOK | mallat [Mal97] |
Author | |
Title | A Wavelet Tour of Signal Processing |
Publisher | AP Professional |
Address | London |
Year | 1997 |
BOOK | chan [Cha95] |
Author | |
Title | Wavelet Basics |
Publisher | Kluwer Academic Publ. |
Address | Boston |
Year | 1995 |
BOOK | wavelets [Hub97] |
Author | |
Title | The World According to Wavelets: The Story of a Mathematical Technique in the Making |
Publisher | A K Peters Ltd |
Year | 1997 |
INBOOK | IBspline [AE] |
Author | |
Title | Wavelet analysis and its applications |
Chapter | Polynomial Spline and Wavelets |
Publisher | ??? |
Year | ??? |
Volume | 2 |
BOOK | instrument-character [vH54] |
Author | |
Title | On the Sensations of Tone as a Physiological Basis for the Theory of Music |
Publisher | Dover |
Address | New York |
Year | 1954 |
Note | Original title: [vH13] |
BOOK | helmholtz [vH13] |
Author | |
Title | Die Lehre von den Tonempfindungen: als physiologische Grundlage für die Theorie der Musik |
Publisher | Vieweg |
Address | Braunschweig |
Edition | 6th |
Year | 1913 |
BOOK | helmholtz-reprint [vH83] |
Author | |
Title | Die Lehre von den Tonempfindungen: als physiologische Grundlage für die Theorie der Musik |
Publisher | Georg Olms Verlag |
Address | Hildesheim |
Year | 1983 |
BOOK | clark-yallop [CY96] |
Author | |
Title | An Introduction to Phonetics and Phonology |
Publisher | Blackwell |
Address | Oxford |
Year | 1996 |
ARTICLE | prosody-tilt [Dog95] |
Author | |
Title | Phonetic Correlates of Word Stress |
Journal | AIMS Phonetik (Working Papers of the Department of Natural Language Processing) |
Volume | 2 |
Number | 2 |
Publisher | Institut für Maschinelle Sprachverarbeitung |
Location | Stuttgart, Germany |
Address | Stuttgart, Germany |
Year | 1995 |
Note | Contents available online |
BOOK | jackson1 [Jac95a] |
Author | |
Title | Software requirements & specifications: a lexicon of practice, principles, and prejudices |
Publisher | Addison--Wesley |
Address | Wokingham |
Year | 1995 |
BOOK | jackson2 [Jac83] |
Author | |
Title | System development |
Publisher | Prentice--Hall Intern. |
Address | Englewood Cliffs |
Year | 1983 |
Series | Prentice--Hall International series in computer science |
BOOK | nagl [Nag90] |
Author | |
Title | Softwaretechnik: methodisches Programmieren im Großen |
Publisher | Springer |
Address | Berlin |
Year | 1990 |
Series | Springer compass |
BOOK | sommerville [Som85] |
Author | |
Title | Software engineering |
Edition | 2nd |
Publisher | Addison--Wesley |
Address | Wokingham |
Year | 1985 |
Series | International computer science series |
BOOK | iau [Utt93] |
Author | |
Title | Lecture Notes in Object-Oriented Software Engineering |
Publisher | University of Kent at Canterbury |
Address | Canterbury, UK |
Year | 1993 |
ARTICLE | battiti94 [Bat94] |
Author | |
Title | Using the mutual information for selecting features in supervised neural net learning |
Journal | IEEE Transactions on Neural Networks |
Volume | 5 |
Number | 4 |
Pages | 537--550 |
Year | 1994 |
url | http://rtm.science.unitn.it/~battiti/battiti-publications.html |
BOOK | cart84 [BFOS84a] |
Author | |
Title | Classification and Regression Trees |
Publisher | Wadsworth and Brooks |
Address | Monterey, CA |
Year | 1984 |
Note | new edition [B+84]? |
Remarks | cited in [MCW98, CM98, BT97b] for CART, clustering, and decision trees |
BOOK | cart84-2 [BFOS84b] |
Author | |
Title | Classification and Regression Trees |
Year | 1984 |
Publisher | Wadsworth Publishing Company |
Address | Belmont, California, U.S.A. |
Series | Statistics/Probability Series |
Isbn-hard | 0534980538 (hardcover) |
Isbn-soft | 0534980546 (softcover) |
BOOK | cart93 [B+84] |
Author | |
Title | Classification and Regression Trees |
Publisher | Chapman & Hall |
Address | New York |
Year | 1984 |
Pages | 358 |
Note | new edition of [BFOS84a]? |
Isbn | 0-412-04841-8 |
url | http://www.crcpress.com/catalog/C4841.htm |
amazon-url | http://www.amazon.de/exec/obidos/ASIN/0412048418 |
Price | $44.95, DM 83.26, EUR 42.57 |
Remarks | TO BE FOUND |
ARTICLE | dubnov95 [DTC] |
Author | |
Title | Hearing Beyond the Spectrum |
Journal | Journal of New Music Research |
Volume | 24 |
Number | 4 |
pub-url | http://www.swets.nl/jnmr/vol24_4.html#dubnov24.4 |
Remarks | features: harmonicity, phase coherence, chorus. bispectral information. acoustic distortion (distance) measure (``concept of statistical divergence which is used for measuring the `similarity' between signals'', ``similarity classes with a good correspondence to the human acoustic perception'', ``generalization of acoustic distortion measure''). TO BE FOUND |
Abstract | In this work we focus on the problem of acoustic signals modeling and analysis, with particular interest in models that can capture the timbre of musical sounds. Traditional methods usually relate to several ``dimensions'' which represent the spectral properties of the signal and their change in time. Here we confine ourselves to the stationary portion of the sound signal, the analysis of which is generalized by incorporating polyspectral techniques. We suggest that by looking at the higher order statistics of the signal we obtain additional information not present in the standard autocorrelation or its Fourier related power-spectra. It is shown that over the bispectral plane several acoustically meaningful measures could be devised, which are sensitive to properties such as harmonicity and phase coherence among the harmonics. Effects such as reverberation and chorusing are demonstrated to be clearly detected by the above measures. In the second part of the paper we perform an information theoretic analysis of the spectral and bispectral planes. We introduce the concept of statistical divergence which is used for measuring the ``similarity'' between signals. A comparative matrix is presented which shows the similarity measure between several instruments based on spectral and bispectral information. The instruments group into similarity classes with a good correspondence to the human acoustic perception. The last part of the paper is devoted to acoustical modelling of the above phenomena. We suggest a simple model which accounts for some of the polyspectral aspects of musical sound discussed above. One of the main results of our work is generalization of acoustic distortion measure based on our model and which takes into account higher order statistical properties of the signal. |
INPROC. | dubnov97 [DR97] |
Author | |
Title | Statistical Modeling of Sound Aperiodicities |
Booktitle | Proceedings of the International Computer Music Conference (ICMC) |
Month | September |
Year | 1997 |
Address | Tessaloniki, Greece |
url | http://www.ircam.fr/equipes/analyse-synthese/listePublications/articlesDubnov |
PHDTHESIS | rochebois97 [Roc97] |
Author | |
Title | Méthodes d'analyse synthèse et représentations optimales des sons musicaux basées sur la réduction de données spectrales |
Month | December |
Year | 1997 |
School | Université Paris XI |
url | http://www.ief.u-psud.fr/~thierry/these/ |
Remarks | Principal components analysis of harmonic partials, gives sub-spaces as linear combinations of partials, i.e. timbral components. |
Abstract | The analysis and synthesis of sounds, and of musical sounds in particular, has already been the subject of much research. For the most part, this research has pursued two objectives: studying musical sounds and synthesizing them. These two objectives are entirely compatible and complementary. The subject of this thesis is a method for the analysis and synthesis of musical sounds based on the reduction of spectral data. Such a method yields a representation of musical sounds that is optimal in the sense of variance. This representation is both a powerful tool for the study of musical timbre and the basis of an efficient form of synthesis. |
BOOK | fukunaga90 [Fuk90] |
Author | |
Title | Introduction to Statistical Pattern Recognition |
Publisher | Academic Press |
Edition | 2 |
Year | 1990 |
Remarks | cited in [CM98] for CART tree evaluation criterion. TO BE FOUND |
INPROC. | nock97 [NGY97] |
Author | |
Title | A Comparative Study of Methods for Phonetic Decision-Tree State Clustering |
Booktitle | Proc. Eurospeech '97 |
Volume | 1 |
Address | Rhodes, Greece |
Month | September |
Year | 1997 |
Pages | 111--114 |
Remarks | cited in [MCW98] for decision trees for speech recognition, [CM98] for CART tree evaluation criterion. TO BE FOUND |
MISC | tcts:www [TCTS99] |
Key | TCTS |
Title | TCTS (Circuit Theory and Signal Processing) Lab, Faculté Polytechnique de Mons |
Howpublished | WWW page |
Year | 1999 |
url | http://tcts.fpms.ac.be |
group-url | http://tcts.fpms.ac.be/synthesis/synthesis.html |
pub-url | http://tcts.fpms.ac.be/publications.html |
Note | http://tcts.fpms.ac.be |
INPROC. | tcts:euspico98 [DMD98] |
Author | |
Title | Comparison of two different alignment systems: speech synthesis vs. hybrid HMM/ANN |
Booktitle | Proc. European Conference on Signal Processing (EUSIPCO'98) |
Address | Greece |
Year | 1998 |
Pages | 1161--1164 |
Note | www [TCTS99], same content as [MDD98] (but less references) |
url | http://tcts.fpms.ac.be/publications/papers/1998/eusipco98_odfmtd.zip |
Abstract | In this paper we compared two different methods for phonetically labeling a French database. The first one is based on the temporal alignment of the speech signal on a high quality synthetic speech pattern and the second one uses a hybrid HMM/ANN system. Both systems have been evaluated on French read utterances from a single speaker never seen in the training stage of the HMM/ANN system and manually segmented. This study outlines the advantages and drawbacks of both methods. The high-quality speech synthesis system has the great advantage that no training stage (hence no labeled database) is needed, while the classical HMM/ANN system easily allows multiple phonetic transcriptions (phonetic lattice). We deduce a method for the automatic constitution of large phonetically and prosodically labeled speech databases based on using the synthetic speech segmentation tool in order to bootstrap the training process of our hybrid HMM/ANN system. The importance of such segmentation tools will be a key point for the development of improved speech synthesis and recognition systems. All the experiments reported in this article related to the hybrid HMM/ANN system have been realized with the STRUT [3] software. |
INPROC. | tcts:tsd98 [DMP+98] |
Title | EULER: Multi-Lingual Text-to-Speech Project |
Pages | 27--32 |
Author | |
Booktitle | Proceedings of the First Workshop on Text, Speech, Dialogue --- TSD'98 |
Year | 1998 |
Editor | |
Address | Brno, Czech Republic |
Month | September |
Publisher | Masaryk University Press |
Note | www [TCTS99]. Electronic version: tcts/tsd98tdfmvppmmbarag.ps.* |
Remarks | modularity |
Abstract | Text-to-speech systems require simultaneously an abstract linguistic analysis, an acoustic linguistic analysis and a final digital processing stage. The aim of the project presented in this paper is to obtain a set of text-to-speech synthesizers for as many voices, languages and dialects as possible, free of use for non-commercial and non-military applications. This project is an extension of the MBROLA project. MBROLA is a speech synthesizer that is freely distributed for non-commercial purposes. A multi-lingual speech segmentation and prosody transplantation tool called MBROLIGN has also been developed and freely distributed. Other labs have also recently distributed for free important tools for speech synthesis, like Festival from the University of Edinburgh or the MULTEXT project of the Université de Provence. The purpose of this paper is to present the EULER project, which will try to integrate all these results, to Eastern European potential partners, so as to increase the dissemination of the important results of the MBROLA and MBROLIGN projects and stimulate East/West collaboration on TTS synthesis. |
INPROC. | tcts:icslp98-fmodtd [MDD98] |
Author | |
Title | Phonetic Alignment: Speech Synthesis Based vs. Hybrid HMM/ANN
Booktitle | Proc. International Conference on Speech and Language Processing |
Address | Sydney, Australia
Year | 1998 |
Pages | 1571--1574 |
Note | www [TCTS99], same content as [DMD98] (with more references) |
url | http://tcts.fpms.ac.be/publications/papers/1998/icslp98_fmodtd.zip |
Abstract | In this paper we compare two different methods for phonetically labeling a speech database. The first approach is based on the alignment of the speech signal on a high-quality synthetic speech pattern, and the second uses a hybrid HMM/ANN system. Both systems have been evaluated on French read utterances from a speaker never seen in the training stage of the HMM/ANN system and manually segmented. This study outlines the advantages and drawbacks of both methods. The high-quality speech synthesis system has the great advantage that no training stage is needed, while the classical HMM/ANN system easily allows multiple phonetic transcriptions. We deduce a method for the automatic constitution of phonetically labeled speech databases, based on using the synthetic speech segmentation tool to bootstrap the training process of our hybrid HMM/ANN system. Such segmentation tools will be a key point for the development of improved speech synthesis and recognition systems. |
INPROC. | tcts:iscas97 [MD97a] |
Author | |
Title | Speech Synthesis for Text-To-Speech Alignment and Prosodic Feature Extraction |
Booktitle | Proc. ISCAS 97 |
Address | Hong-Kong |
Year | 1997 |
Pages | 2637--2640 |
Note | www [TCTS99] |
url | http://tcts.fpms.ac.be/publications/papers/1997/iscas97_fmtd.zip |
Remarks | Recent developments in prosody generation have highlighted the potential interest of machine learning techniques such as multilayer perceptrons [Tra92], linear regression techniques [SK92], classification and regression trees [Hir91], or statistical techniques [MPH93], based on the automatic analysis of large prosodically labeled corpora. Only the segmental features of the reference signal are used in alignment. Assumption: the segmental and suprasegmental features are approximately uncorrelated. Keep only the perceptually relevant F0 cues, perceptual stylization, based on a model of tonal perception [alessandro95]. Robust cepstrum by sinusoidal weighting [GL88]. Derivative of cepstrum [SR88]. |
Abstract | The aim of this paper is to present a new and promising approach to the text-to-speech alignment problem. For this purpose, an original idea is developed: a high-quality digital speech synthesizer is used to create a reference speech pattern for the alignment process. The system has been used and tested to extract the prosodic features of read French utterances. The results show a segmentation error rate of about 8%. This system will be a powerful tool for the automatic creation of large prosodically labeled databases and for research on automatic prosody generation. |
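The alignment idea above matches a natural utterance against a synthetic reference pattern. The generic machinery behind such matching is dynamic time warping; below is a plain DTW sketch in Python/NumPy (the function name and the squared-Euclidean frame distance are our assumptions, not the paper's exact procedure):

```python
import numpy as np

def dtw_align(ref, obs):
    """Dynamic time warping between a reference feature sequence (e.g. frames of
    a synthetic utterance) and an observed one; returns the warping path as
    (ref_index, obs_index) pairs."""
    n, m = len(ref), len(obs)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Squared-Euclidean local distance between frames
            d = float(np.sum((ref[i - 1] - obs[j - 1]) ** 2))
            D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # Backtrack from the end, always moving to the cheapest predecessor
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        if i == 1 and j == 1:
            break
        _, i, j = min((D[i - 1, j - 1], i - 1, j - 1),
                      (D[i - 1, j], i - 1, j),
                      (D[i, j - 1], i, j - 1))
    return path[::-1]
```

Phone boundaries known in the synthetic reference can then be projected onto the natural signal through the returned path.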
INPROC. | tcts:eurosp97 [SDS97] |
Author | |
Title | Diphone Concatenation Using a Harmonic Plus Noise Model of Speech |
Booktitle | Proc. Eurospeech '97 |
Address | Rhodes, Greece |
Month | September |
Year | 1997 |
Pages | 613--616 |
Note | www [TCTS99]; electronic version: tcts/hnmconc.ps.*
Remarks | Important! HNM (Marine) basis paper, pitch synchronous. Diphone smoothing in region of quasi-stationarity. Additive better for concatenation than PSOLA. References: [DG96] (non pitch-synchronous hybrid harmonic/stochastic synthesis, real-time generation of signals from spectral representation), [SLM95] (phase treatment, modifications), [Mac96] (non pitch synchronous harmonic modeling). |
Abstract | In this paper we present a high-quality text-to-speech system using diphones. The system is based on a Harmonic plus Noise Model (HNM) representation of the speech signal. HNM is a pitch-synchronous analysis-synthesis system, but it does not require the pitch marks that PSOLA-based methods need. HNM assumes the speech signal to be composed of a periodic part and a stochastic part. As a result, different prosody and spectral envelope modification methods can be applied to each part, yielding more natural-sounding synthetic speech. The fully parametric representation of speech using HNM also provides a straightforward way of smoothing diphone boundaries. Informal listening tests, using natural prosody, have shown that the synthetic speech quality is close to the quality of the original sentences, without smoothing problems and without the buzziness or other oddities observed with other speech representations used for TTS. |
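The HNM abstract above rests on splitting each frame into a periodic part (harmonics of f0) plus a residual noise part. The following is a toy least-squares illustration of that decomposition (the function name, the fixed-length framing and the harmonic-count heuristic are ours; the actual HNM of [SDS97] is pitch-synchronous and considerably more elaborate):

```python
import numpy as np

def harmonic_noise_split(frame, f0, sr, n_harm=None):
    """Least-squares fit of sinusoids at multiples of f0 to one frame;
    the residual is taken as the noise part. Toy illustration only."""
    n = len(frame)
    t = np.arange(n) / sr
    if n_harm is None:
        n_harm = int((sr / 2) // f0)  # harmonics up to Nyquist
    # Design matrix: one cosine/sine pair per harmonic
    cols = []
    for k in range(1, n_harm + 1):
        cols.append(np.cos(2 * np.pi * k * f0 * t))
        cols.append(np.sin(2 * np.pi * k * f0 * t))
    A = np.stack(cols, axis=1)
    coef, *_ = np.linalg.lstsq(A, frame, rcond=None)
    harmonic = A @ coef
    noise = frame - harmonic
    return harmonic, noise
```

For a purely harmonic frame the residual is essentially zero; for voiced speech it captures the stochastic component that HNM models separately.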
INPROC. | tcts:speechcomm96 [DG96] |
Author | |
Title | On the use of a hybrid harmonic/stochastic model for TTS synthesis by concatenation
Booktitle | Speech Communication |
Number | 19 |
Pages | 119--143 |
Year | 1996 |
Remarks | Cited in [SDS97] for non pitch-synchronous hybrid harmonic/stochastic synthesis, real-time generation of signals from spectral representation. TO BE FOUND |
PHDTHESIS | macon-thesis96 [Mac96]
Author | |
Title | Speech Synthesis Based on Sinusoidal Modeling |
Type | Ph.D. Dissertation
School | Georgia Institute of Technology
Month | October |
Year | 1996 |
Remarks | Cited in [SDS97] for non pitch synchronous harmonic modeling. TO BE FOUND |
INPROC. | stylianou:eurospeech95 [SLM95] |
Author | |
Title | High Quality Speech Modification based on a Harmonic+Noise Model |
Booktitle | Proc. EUROSPEECH |
Year | 1995 |
Remarks | Cited in [SDS97] for phase treatment, modifications, maximum voice frequency. TO BE FOUND |
INPROC. | Malfrere_HighQual_EURO97 [MD97b] |
Author | |
Title | High Quality Speech Synthesis for Phonetic Speech Segmentation |
Booktitle | Proc. Eurospeech '97 |
Address | Rhodes, Greece |
Month | September |
Year | 1997 |
Pages | 2631--2634 |
INPROC. | Olivier_SimpAnd_EURO97 [vdVOPD+97] |
Author | |
Title | A Simple and Efficient Algorithm for the Compression of MBROLA Segment Databases |
Booktitle | Proc. Eurospeech '97 |
Address | Rhodes, Greece |
Month | September |
Year | 1997 |
Pages | 421--424 |
INPROC. | Dutoit_TheMbro_ICSLP96 [DPP+96] |
Author | |
Title | The MBROLA project: Towards a Set of High Quality Speech Synthesizers Free of Use for Non Commercial Purposes |
Booktitle | Proc. ICSLP '96 |
Address | Philadelphia, PA |
Month | October |
Year | 1996 |
Volume | 3 |
Pages | 1393--1396 |
INPROC. | Dutoit_HighQual_ICASSP94 [Dut94] |
Author | |
Title | High Quality Text-to-Speech Synthesis: a Comparison of four Candidate Algorithms |
Booktitle | Proc. ICASSP '94 |
Address | Adelaide, Australia
Month | April |
Year | 1994 |
Pages | I--565--I--568 |
MISC | MPEG7:www [MPEG99] |
Key | MPEG |
Title | MPEG-7 ``Multimedia Content Description Interface'' Documentation |
Howpublished | WWW page |
Year | 1999 |
url | http://www.darmstadt.gmd.de/mobile/MPEG7 |
Note | http://www.darmstadt.gmd.de/mobile/MPEG7 |
Abstract | More and more audio-visual information is available in digital form, in various places around the world. Along with the information appear people who want to use it. Before anyone can use information, however, it has to be located first. At the same time, the increasing availability of potentially interesting material makes this search harder. The question of finding content is not restricted to database retrieval applications; similar questions exist in other areas as well. For instance, there is an increasing number of (digital) broadcast channels available, and this makes it harder to select the broadcast channel (radio or TV) that is potentially interesting. In October 1996, MPEG (Moving Picture Experts Group) started a new work item to provide a solution to the pressing problem of generally recognised descriptions for audio-visual content, which extend the limited capabilities of the proprietary solutions for identifying content that exist today. The new member of the MPEG family is called ``Multimedia Content Description Interface'', or in short MPEG-7. The associated pages presented in the navigation tool shall provide you with the necessary information to learn more about MPEG-7. As MPEG in general is a dynamic and fast-moving standardisation body, some documents and related information may be outdated quickly. We will make every effort to keep up with the MPEG pace; however, keep in mind that the Web pages may not always contain the newest information. |
MISC | MPEG7:audio-faq [Lin98] |
Author | |
Title | MPEG-7 Audio FAQ |
Howpublished | WWW page |
Year | 1998 |
url | http://www.meta-labs.com/mpeg-7/MPEG-7-aud-FAQ.shtml |
parent-url | http://www.meta-labs.com/mpeg-7-aud/ |
Note | moved to [TPMAS98] |
Abstract | The following is an unofficial FAQ for MPEG-7 Audio issues. It is not a complete document, and is intended to act as a supplement to the FAQ found in the MPEG-7 Context & Objectives document, N2326. |
Remarks | What are the specific functionalities foreseen for MPEG-7 audio? Although still an expanding list, we can envision indexing music, sound effects, and spoken-word content in the audio-only arena. MPEG-7 will enable query-by-example, such as query-by-humming. In addition, audio tools play a large role in typical audio-visual content, in terms of indexing film soundtracks and the like. If someone wants to manage a large amount of audio content, whether selling it, managing it internally, or making it openly available to the world, MPEG-7 is potentially the solution. What are the foreseen elements of MPEG-7? MPEG-7 work is currently seen as being in three parts: Descriptors (D's), Description Schemes (DS's), and a Description Definition Language (DDL). Each is equally crucial to the entire MPEG-7 effort. Descriptors are the representations of low-level features, the fundamental qualities of audiovisual content, which may range from statistical models of signal amplitude, to fundamental frequency of a signal, to an estimate of the number of sources present in a signal, to spectral tilt, to emotional content, to an explicit sound-effect model, to any number of concrete or abstract features. This is where the most involvement from the signal processing community is foreseen. Note that not all of the descriptors need to be automatically extracted; the essential part of the standard is to establish a normalized representation and interpretation of the Descriptor. We are actively seeking input on what additional potential Descriptors would be useful. Description Schemes are structured combinations of Descriptors. This structure may be used to annotate a document, to directly express the structure of a document, or to create combinations of features which form a richer expression of a higher-level concept. For example, a radio segment DS may note the recording date, the broadcast date, the producer, the talent, and include pointers to a transcript. 
A classical music DS may encode the musical structures (and allow for exceptions) of a sonata form. Various spectral and temporal Descriptors may be combined to form a DS appropriate for describing timbre or short sound effects. Any suggestions on other applications of DS's to audio material are very welcome. The Description Definition Language is to be the mechanism which allows a great degree of flexibility to be included in MPEG-7. Not all documents will fit into a prescribed structure. There are fields (e.g. biomedical imagery) which would find the MPEG-7 framework very useful, but which lie outside of MPEG's scope. A solution provider may have a better method for combining MPEG-7 Descriptors than a normative description scheme. The DDL is to address all of these situations. While MPEG-4 seeks a unique and faithful reproduction of material, MPEG-7 foregoes some precision for the sake of identifying the "essential" features of the material (although many different representations of the same material are possible). What distinguishes it most from other material? What makes it similar? |
MISC | MPEG:audio-faq [TPMAS98] |
Author | |
Title | MPEG Audio FAQ Version 9 |
Howpublished | WWW page |
Year | 1998 |
Month | October |
Address | Atlantic City |
url | http://www.tnt.uni-hannover.de/project/mpeg/audio/faq |
Note | International Organisation for Standardisation, Organisation Internationale de Normalisation, Coding of Moving Pictures and Audio, ISO/IEC JTC1/SC29/WG11, N2431, http://www.tnt.uni-hannover.de/project/mpeg/audio/faq |
PHDTHESIS | levine:thesis [Lev98] |
Author | |
Title | Audio Representations for Data Compression and Compressed Domain Processing |
Type | Ph.D. Dissertation |
School | Department of Electrical Engineering, CCRMA, Stanford University |
Month | December |
Year | 1998 |
url | http://www-ccrma.stanford.edu/~scottl/thesis.html |
Note | http://www-ccrma.stanford.edu/~scottl/thesis.html |
Abstract | In the world of digital audio processing, one usually has the choice of performing modifications on the raw audio signal or data-compressing the audio signal. But performing modifications on a data-compressed audio signal has proved difficult in the past. This thesis provides a new representation of audio signals that allows for both very low bit rate audio data compression and high-quality compressed-domain processing and modifications. In this context, the processing possibilities are time-scale and pitch-scale modifications. This new audio representation segments the audio into separate sinusoidal, transient, and noise signals. During detected attack-transient regions, the audio is modeled by well-established transform coding techniques. During the remaining non-transient regions of the input, the audio is modeled by a mixture of multiresolution sinusoidal modeling and noise modeling. Careful phase-locking techniques at the time boundaries between the sines and transients allow for seamless transitions between representations. By separating the audio into three individual representations, each can be efficiently and perceptually quantized. |
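Levine's representation hinges on first locating attack-transient regions and switching models there. As a crude stand-in for the idea, a frame-energy jump detector can flag candidate transients (threshold, frame sizes and names are our assumptions, not the thesis's detector):

```python
import numpy as np

def detect_transients(x, frame=256, hop=128, ratio=4.0):
    """Flag frames whose energy jumps by more than `ratio` over the previous
    frame. A crude stand-in for a transient detector; real systems use
    multi-band onset measures."""
    energies = []
    for start in range(0, len(x) - frame + 1, hop):
        energies.append(float(np.sum(x[start:start + frame] ** 2)))
    flags = [False]  # the first frame has no predecessor to compare against
    for prev, cur in zip(energies, energies[1:]):
        flags.append(cur > ratio * prev and cur > 1e-8)
    return flags, energies
```

Frames flagged True would be handed to the transform coder; the rest go to the sinusoidal-plus-noise model.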
MISC | plunderphonics [Osw99] |
Author | |
Title | Plunderphonics |
Howpublished | WWW page |
Year | 1999 |
url | http://www.interlog.com/~vacuvox/ |
Note | http://www.6q.com, esp. [Osw93] |
MISC | plexure [Osw93] |
Author | |
Title | Plexure |
Howpublished | CD |
Year | 1993 |
url | http://www.interlog.com/xdiscography.html#plexure |
Note | http://www.interlog.com/~vacuvox/xdiscography.html#plexure |
Abstract | Published by Disk Union Japan (on CD only), it should be in stores but is often hard to find or expensive. It is currently available from WFMU, who also provide a short sample (193K). Plundered are over a thousand pop stars from the past 10 years. Rather than crediting each individual artist or group as he did in the original plunderphonic release, Oswald chose instead to reference morphed artists of his own creation (Bonnie Ratt, etc.). It starts with rap millisyllables and progresses through the material according to tempo (which has an interesting relationship with genre). Oswald used several mechanisms to generate the plunderphonemes that make up this encyclopaedic popologue. This is the most formidable of the plunderphonics projects to date. |
MISC | thelongestandmostharmlessentry [vdVdlLvdV48] |
Author | |
Title | The Longest Bibliographic Reference |
Year | 1848 |
Remarks | This is here so that the longest bibliography reference is this one, [vdVdlLvdV48], and not something with an et. al. symbol, because this confuses tth, the tex to html translator, too much. |
MISC | berio91 [Ber91] |
Author | |
Title | Circles; Sequenza I, III, V |
Howpublished | Mediathèque CD00008601 |
Year | 1991 |
url | http://mediatheque.ircam.fr/cgi-bin/archives?AFFICHAGE=long\&ID=CD00008601 |
Note | Cathy Berberian (Stimme), Francis Pierre (Harfe), Jean-Pierre Drouet, Jean-Claude Casadesus (Schlagzeug), Aurèle Nicolet (Flöte), Vinko Globokar (Posaune) |
INPROC. | baudoin:eurospeech:97 [BCC97] |
Author | |
Title | Quantization of spectral sequences using variable length spectral segments for speech coding at very low bit rate |
Booktitle | Proc. EUROSPEECH 97 |
Address | Rhodes, Greece |
Month | September |
Year | 1997 |
Pages | 1295--1298 |
abstract-url | http://www.wcl2.ee.upatras.gr/eurtad.html#link1295 |
Abstract | This paper deals with the coding of spectral envelope parameters for very low bit rate speech coding (below 500 bps). In order to obtain sufficient intelligibility, segmental techniques are necessary; variable-dimension vector quantization is one of these. We propose a new interpretation of already published research from Chou-Lookabaugh [2] and Cernocky-Baudoin-Chollet [4,6] on the quantization of variable-length sequences of spectral vectors, named respectively Variable to Variable length Vector Quantization (VVVQ) and Multigram Quantization (MGQ). This interpretation gives a meaning to the Lagrange multiplier used in the optimization criterion of the VVVQ, and should allow new developments such as, for example, new models of the probability density of the source. We have also studied the influence of the limitation of the delay introduced by the method; a maximal delay of 400 ms was found to be generally sufficient. Finally, we propose the introduction of long sequences into the segmental codebook by linear interpolation of shorter ones. |
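The VVVQ/multigram approach quantizes variable-length runs of spectral vectors against a segmental codebook, with a Lagrange multiplier trading distortion against rate. A toy dynamic-programming sketch of that segmentation (uniform per-segment rate cost `lam` and squared-error distortion; all names are ours, not the paper's):

```python
import numpy as np

def segment_quantize(seq, codebook, lam=0.1):
    """Match a vector sequence against a codebook of variable-length segments,
    minimizing total distortion + lam per segment used (lam plays the role of
    the Lagrange multiplier on rate). Returns chosen segment indices and cost."""
    T = len(seq)
    cost = [float("inf")] * (T + 1)
    cost[0] = 0.0
    back = [None] * (T + 1)
    for t in range(1, T + 1):
        for idx, seg in enumerate(codebook):
            L = len(seg)
            if L <= t:
                # Squared error of this segment against seq[t-L .. t-1]
                d = sum(float(np.sum((seq[t - L + i] - seg[i]) ** 2))
                        for i in range(L))
                c = cost[t - L] + d + lam
                if c < cost[t]:
                    cost[t] = c
                    back[t] = (idx, L)
    # Backtrack the chosen segmentation
    path, t = [], T
    while t > 0:
        idx, L = back[t]
        path.append(idx)
        t -= L
    return path[::-1], cost[T]
```

Longer codebook segments are preferred whenever their distortion saving outweighs the per-segment rate penalty, which is exactly the trade-off the multiplier controls.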
INPROC. | Stylianou_DecoOf_ICSLP96 [Sty96] |
Author | |
Title | Decomposition of Speech Signals into a Deterministic and a Stochastic Part |
Booktitle | Proc. ICSLP '96 |
Address | Philadelphia, PA |
Month | October |
Year | 1996 |
Volume | 2 |
Pages | 1213--1216 |
|
|
This document was translated from LaTeX by HEVEA.