MISC | cstr:www [CSTR99] |
Key | CSTR |
Title | Centre for Speech Technology Research, University of Edinburgh |
Howpublished | WWW page |
Year | 1999 |
url | http://www.cstr.ed.ac.uk/ |
pub-url | http://www.cstr.ed.ac.uk/projects/festival/papers.html |
INPROC. | cstr:unitsel96 [HB96] |
Author | |
Title | Unit Selection in a Concatenative Speech Synthesis System using a Large Speech Database |
Booktitle | Proc. ICASSP '96 |
Address | Atlanta, GA |
Month | May |
Year | 1996 |
Pages | 373--376 |
Note | www [CSTR99] Electronic version: cstr/Black1996a.s.* |
Remarks | cited in [MCW98] |
Abstract | One approach to the generation of natural-sounding synthesized speech waveforms is to select and concatenate units from a large speech database. Units (in the current work, phonemes) are selected to produce a natural realisation of a target phoneme sequence predicted from text which is annotated with prosodic and phonetic context information. We propose that the units in a synthesis database can be considered as a state transition network in which the state occupancy cost is the distance between a database unit and a target, and the transition cost is an estimate of the quality of concatenation of two consecutive units. This framework has many similarities to HMM-based speech recognition. A pruned Viterbi search is used to select the best units for synthesis from the database. This approach to waveform synthesis permits training from natural speech: two methods for training from speech are presented which provide weights which produce more natural speech than can be obtained by handtuning. |
INPROC. | cstr:unitsel97 [BT97b] |
Author | |
Title | Automatically Clustering Similar Units for Unit Selection in Speech Synthesis |
Booktitle | Proc. Eurospeech '97 |
Address | Rhodes, Greece |
Month | September |
Year | 1997 |
Pages | 601--604 |
Note | www [CSTR99] Electronic version: cstr/Black1997b.* |
Remarks | cited in [MCW98]: clustering and decision trees |
Abstract | This paper describes a new method for synthesizing speech by concatenating sub-word units from a database of labelled speech. A large unit inventory is created by automatically clustering units of the same phone class based on their phonetic and prosodic context. The appropriate cluster is then selected for a target unit offering a small set of candidate units. An optimal path is found through the candidate units based on their distance from the cluster center and an acoustically based join cost. Details of the method and justification are presented. The results of experiments using two different databases are given, optimising various parameters within the system. Also a comparison with other existing selection based synthesis techniques is given showing the advantages this method has over existing ones. The method is implemented within a full text-to-speech system offering efficient natural sounding speech synthesis. |
INPROC. | cstr:eursp95 [BC95] |
Author | |
Title | Optimising selection of units from speech databases for concatenative synthesis |
Booktitle | Proc. Eurospeech '95 |
Volume | 1 |
Address | Madrid, Spain |
Month | September |
Year | 1995 |
Pages | 581--584 |
Remarks | Summary: Detailed description of the unit selection model, the features and context used, and concatenation join point optimisation. Describes the weight optimisation procedure: Euclidean cepstral distance (a very limited first attempt) on real-speech test sentences. Unit selection as used in CHATR. Cited in [MCW98] |
INPROC. | cstr:ssml97 [STTI97] |
Author | |
Title | A Markup Language for Text-To-Speech Synthesis |
Booktitle | Proc. Eurospeech '97 |
Address | Rhodes, Greece |
Month | September |
Year | 1997 |
Pages | 1747--1750 |
Note | www [CSTR99] Electronic version: cstr/Sproat1997a.* |
Abstract | Text-to-speech synthesizers must process text, and therefore require some knowledge of text structure. While many TTS systems allow for user control by means of ad hoc `escape sequences', there remains to date no adequate and generally agreed upon system-independent standard for marking up text for the purposes of synthesis. The present paper is a collaborative effort between two speech groups aimed at producing such a standard, in the form of an SGML-based markup language that we call STML --- Spoken Text Markup Language. The primary purpose of this paper is not to present STML as a fait accompli, but rather to interest other TTS research groups to collaborate and contribute to the development of this standard. |
TECHREP. | cstr:festival97 [BT97a] |
Author | |
Title | The Festival Speech Synthesis System: System Documentation (1.1.1) |
Institution | Human Communication Research Centre |
Type | Technical Report |
Number | HCRC/TR-83 |
Month | January |
Year | 1997 |
Pages | 154 |
Note | www [CSTR99] |
url | http://www.cstr.ed.ac.uk/projects/festival/manual-1.1.1/festival-1.1.1.ps.gz |
Remarks | new version [BTC98] |
TECHREP. | cstr:festival98 [BTC98] |
Author | |
Title | The Festival Speech Synthesis System: System Documentation (1.3.1) |
Institution | Human Communication Research Centre |
Type | Technical Report |
Number | HCRC/TR-83 |
Month | December |
Year | 1998 |
Pages | 202 |
Note | www [CSTR99] |
url | http://www.cstr.ed.ac.uk/projects/festival/manual-1.3.1/festival_toc.html |
Remarks | updated version of [BT97a], new utterance structure as in [Tay99], multiple synthesizers |
TECHREP. | cstr:festivalarch98 [Tay99] |
Author | |
Title | The Festival Speech Architecture |
Type | Web Page |
Year | 1999 |
Note | www [CSTR99] |
url | http://www.cstr.ed.ac.uk/projects/festival/arch.html |
Abstract | This is a short document describing the way we represent speech and linguistic structures in Festival. There are three main types of structure:
|
INPROC. | Campbell_FactAffe_EURO97 [CYDH97] |
Author | |
Title | Factors Affecting Perceived Quality and Intelligibility in the CHATR Concatenative Speech Synthesiser |
Booktitle | Proc. Eurospeech '97 |
Address | Rhodes, Greece |
Month | September |
Year | 1997 |
Pages | 2635--2638 |
Remarks | TO BE FOUND |
ARTICLE | Campbell_CHATR [Cam96] |
Author | |
Title | CHATR: A High-Definition Speech Re-Sequencing System |
Journal | Acoustical Society of America and Acoustical Society of Japan, Third Joint Meeting |
Address | Honolulu, HI |
Month | December |
Year | 1996 |
Remarks | TO BE FOUND |