5 CSTR Centre for Speech Technology Research

MISC	cstr:www [CSTR99]
Key	CSTR
Title	Centre for Speech Technology Research, University of Edinburgh
Howpublished	WWW page
Year	1999
url	`http://www.cstr.ed.ac.uk/`
pub-url	`http://www.cstr.ed.ac.uk/projects/festival/papers.html`
Note	`http://www.cstr.ed.ac.uk/`

INPROC.	cstr:unitsel96 [HB96]
Author	A. J. Hunt, A. W. Black
Title	Unit Selection in a Concatenative Speech Synthesis System using a Large Speech Database
Booktitle	Proc. ICASSP '96
Address	Atlanta, GA
Month	May
Year	1996
Pages	373--376
Note	www [CSTR99] Electronic version: cstr/Black1996a.s.*
Remarks	cited in [MCW98]
Abstract	One approach to the generation of natural-sounding synthesized speech waveforms is to select and concatenate units from a large speech database. Units (in the current work, phonemes) are selected to produce a natural realisation of a target phoneme sequence predicted from text which is annotated with prosodic and phonetic context information. We propose that the units in a synthesis database can be considered as a state transition network in which the state occupancy cost is the distance between a database unit and a target, and the transition cost is an estimate of the quality of concatenation of two consecutive units. This framework has many similarities to HMM-based speech recognition. A pruned Viterbi search is used to select the best units for synthesis from the database. This approach to waveform synthesis permits training from natural speech: two methods for training from speech are presented which provide weights that produce more natural speech than can be obtained by hand-tuning.
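A minimal Python sketch of the search this abstract describes: each target position has a list of candidate database units, `target_cost` plays the role of the state occupancy cost, and `join_cost` the role of the transition (concatenation) cost. The function and parameter names are hypothetical, both cost functions are supplied by the caller, and unlike the pruned search in the paper this version is exhaustive (no beam pruning).

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")  # target specification (hypothetical type)
U = TypeVar("U")  # database unit (hypothetical type)


def select_units(
    targets: Sequence[T],
    candidates: Sequence[Sequence[U]],
    target_cost: Callable[[T, U], float],
    join_cost: Callable[[U, U], float],
) -> tuple[list[U], float]:
    """Exhaustive Viterbi search for the cheapest unit sequence (no pruning)."""
    # best[j]: cheapest accumulated cost of any path ending in candidate j
    best = [target_cost(targets[0], u) for u in candidates[0]]
    back: list[list[int]] = []  # back-pointers for each position after the first
    for i in range(1, len(targets)):
        prev_units = candidates[i - 1]
        new_best, pointers = [], []
        for u in candidates[i]:
            # predecessor minimising accumulated cost plus concatenation cost
            k = min(range(len(prev_units)),
                    key=lambda j: best[j] + join_cost(prev_units[j], u))
            new_best.append(best[k] + join_cost(prev_units[k], u)
                            + target_cost(targets[i], u))
            pointers.append(k)
        best = new_best
        back.append(pointers)
    # trace back from the cheapest final state
    j = min(range(len(best)), key=best.__getitem__)
    total = best[j]
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [candidates[i][j] for i, j in enumerate(path)], total
```

With a heavy join cost, the search can prefer a unit that matches its target less well but joins more smoothly with its neighbour, which is the trade-off the transition cost is meant to capture.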

INPROC.	cstr:unitsel97 [BT97b]
Author	Alan W Black, Paul Taylor
Title	Automatically Clustering Similar Units for Unit Selection in Speech Synthesis
Booktitle	Proc. Eurospeech '97
Address	Rhodes, Greece
Month	September
Year	1997
Pages	601--604
Note	www [CSTR99] Electronic version: cstr/Black1997b.*
Remarks	cited in [MCW98]: clustering and decision trees
Abstract	This paper describes a new method for synthesizing speech by concatenating sub-word units from a database of labelled speech. A large unit inventory is created by automatically clustering units of the same phone class based on their phonetic and prosodic context. The appropriate cluster is then selected for a target unit offering a small set of candidate units. An optimal path is found through the candidate units based on their distance from the cluster center and an acoustically based join cost. Details of the method and justification are presented. The results of experiments using two different databases are given, optimising various parameters within the system. Also a comparison with other existing selection based synthesis techniques is given showing the advantages this method has over existing ones. The method is implemented within a full text-to-speech system offering efficient natural sounding speech synthesis.
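A rough Python sketch of the candidate-preselection step. Assumptions: units are grouped by an exact, hypothetical context key rather than by the automatically built decision trees the paper describes; the acoustic vectors are made-up feature tuples; and a missing context backs off to every unit of the same phone. Candidates are ranked by Euclidean distance from the cluster centre; the subsequent join-cost path search is not shown.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    phone: str
    context: tuple[str, ...]     # hypothetical context key, e.g. (prev_phone, next_phone)
    features: tuple[float, ...]  # hypothetical acoustic vector, e.g. mean cepstra


def build_clusters(units):
    """Group units of the same phone class by (simplified) context."""
    clusters = defaultdict(list)
    for u in units:
        clusters[(u.phone, u.context)].append(u)
    return dict(clusters)


def centre(members):
    """Component-wise mean of the members' acoustic vectors."""
    n, dim = len(members), len(members[0].features)
    return tuple(sum(m.features[d] for m in members) / n for d in range(dim))


def candidates_for(clusters, phone, context, max_candidates=5):
    """Return (distance-from-centre, unit) pairs, closest first."""
    members = clusters.get((phone, context))
    if not members:
        # back off to every unit of the same phone class
        members = [u for (p, _), us in clusters.items() if p == phone for u in us]
    if not members:
        return []
    c = centre(members)
    scored = sorted(((math.dist(u.features, c), u) for u in members),
                    key=lambda pair: pair[0])
    return scored[:max_candidates]
```

Sorting on the distance alone (rather than on the whole pair) avoids comparing `Unit` objects when two distances tie.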

INPROC.	cstr:eursp95 [BC95]
Author	A. W. Black, N. Campbell
Title	Optimising selection of units from speech databases for concatenative synthesis
Booktitle	Proc. Eurospeech '95
Volume	1
Address	Madrid, Spain
Month	September
Year	1995
Pages	581--584
Remarks	Summary: detailed description of the unit selection model, the features and context used, and concatenation join-point optimisation. Description of the weight-optimising procedure: Euclidean cepstral distance (a very limited first attempt) measured on real-speech test sentences. Unit selection as used in CHATR. cited in [MCW98]
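The weight-optimisation idea noted above can be sketched as a grid search that scores each weight setting by the mean frame-wise Euclidean cepstral distance between synthesized and natural versions of test sentences. This is an illustration only: `synthesize` is a hypothetical caller-supplied function returning cepstral frames, the frames are assumed to be already time-aligned (e.g. by dynamic time warping, not shown), and the exhaustive grid stands in for whatever search procedure the paper actually used.

```python
import math
from itertools import product
from typing import Callable, Sequence

Frame = Sequence[float]  # one cepstral vector


def mean_cepstral_distance(synth: Sequence[Frame], natural: Sequence[Frame]) -> float:
    """Mean Euclidean distance between time-aligned cepstral frames."""
    if len(synth) != len(natural) or not synth:
        raise ValueError("expects equal-length, non-empty, time-aligned frame sequences")
    return sum(math.dist(s, n) for s, n in zip(synth, natural)) / len(synth)


def tune_weights(
    grid: Sequence[Sequence[float]],               # candidate values per weight
    sentences: Sequence[str],
    natural_cepstra: Sequence[Sequence[Frame]],    # one frame list per sentence
    synthesize: Callable[[tuple, str], Sequence[Frame]],  # hypothetical synthesizer hook
) -> tuple[tuple, float]:
    """Return the weight tuple with the lowest average cepstral distance."""
    best_w, best_score = None, math.inf
    for w in product(*grid):
        score = sum(
            mean_cepstral_distance(synthesize(w, s), nat)
            for s, nat in zip(sentences, natural_cepstra)
        ) / len(sentences)
        if score < best_score:
            best_w, best_score = w, score
    return best_w, best_score
```

A grid grows exponentially with the number of weights, which is one reason a plain cepstral-distance search of this kind is only a limited first attempt.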

INPROC.	cstr:ssml97 [STTI97]
Author	Richard Sproat, Paul Taylor, Michael Tanenblatt, Amy Isard
Title	A Markup Language for Text-To-Speech Synthesis
Booktitle	Proc. Eurospeech '97
Address	Rhodes, Greece
Month	September
Year	1997
Pages	1747--1750
Note	www [CSTR99] Electronic version: cstr/Sproat1997a.*
Abstract	Text-to-speech synthesizers must process text, and therefore require some knowledge of text structure. While many TTS systems allow for user control by means of ad hoc 'escape sequences', there remains to date no adequate and generally agreed-upon system-independent standard for marking up text for the purposes of synthesis. The present paper is a collaborative effort between two speech groups aimed at producing such a standard, in the form of an SGML-based markup language that we call STML (Spoken Text Markup Language). The primary purpose of this paper is not to present STML as a fait accompli, but rather to interest other TTS research groups in collaborating on and contributing to the development of this standard.

TECHREP.	cstr:festival97 [BT97a]
Author	Alan Black, Paul Taylor
Title	The Festival Speech Synthesis System: System Documentation (1.1.1)
Institution	Human Communication Research Centre
Type	Technical Report
Number	HCRC/TR-83
Month	January
Year	1997
Pages	154
Note	www [CSTR99]
url	`http://www.cstr.ed.ac.uk/projects/festival/manual-1.1.1/festival-1.1.1.ps.gz`
Remarks	newer version: [BTC98]

TECHREP.	cstr:festival98 [BTC98]
Author	Alan Black, Paul Taylor, Richard Caley
Title	The Festival Speech Synthesis System: System Documentation (1.3.1)
Institution	Human Communication Research Centre
Type	Technical Report
Number	HCRC/TR-83
Month	December
Year	1998
Pages	202
Note	www [CSTR99]
url	`http://www.cstr.ed.ac.uk/projects/festival/manual-1.3.1/festival_toc.html`
Remarks	updated version of [BT97a], new utterance structure as in [Tay99], multiple synthesizers

TECHREP.	cstr:festivalarch98 [Tay99]
Author	Paul Taylor
Title	The Festival Speech Architecture
Type	Web Page
Year	1999
Note	www [CSTR99]
url	`http://www.cstr.ed.ac.uk/projects/festival/arch.html`
Abstract	This is a short document describing the way we represent speech and linguistic structures in Festival. There are three main types of structure. Items: an item is a single linguistic unit, such as a phone, word, syllable, syntactic node, intonation phrase, etc. Each item has a set of features which describe its local properties. For instance a word could have features, , , ... Values of features can be real values or functions. Relations: a relation links together items of a common linguistic type. For instance, we might have a word, phone, syntax or syllable relation. Relations are general graph structures, the most common type being a simple doubly linked list; e.g., the word relation is a doubly linked list that links all the words in an utterance in the order in which they occur. Relations can also take the form of trees. For example, we have a syllable structure relation which gives the onset, coda, nucleus and rhyme structure of a syllable. The crucial aspect of the Festival architecture is that items can be in more than one relation. For example, a syntax relation is a tree whose terminal elements are words, which are also in the word relation. Utterances: an utterance contains a list of all the relations.
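The item/relation/utterance model can be illustrated with a small Python sketch. It is not Festival's actual API (Festival implements these structures in C++ and Scheme), and all class and function names here are hypothetical. What it demonstrates is that the same `Item` object can sit in a doubly linked `Word` relation and, at the same time, be a leaf of a `Syntax` tree, so a feature set through one relation is visible through the other.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Item:
    """A single linguistic unit; its features are shared by every relation it is in."""
    features: dict[str, object] = field(default_factory=dict)


@dataclass(eq=False)
class Node:
    """The position of an item within one relation (list links and tree links)."""
    item: Item
    prev: Optional[Node] = field(default=None, repr=False)
    next: Optional[Node] = field(default=None, repr=False)
    parent: Optional[Node] = field(default=None, repr=False)
    children: list[Node] = field(default_factory=list, repr=False)


class Relation:
    """Links items of one linguistic type; here a doubly linked list of nodes."""

    def __init__(self, name: str):
        self.name = name
        self.head: Optional[Node] = None
        self.tail: Optional[Node] = None

    def append(self, item: Item) -> Node:
        node = Node(item, prev=self.tail)
        if self.tail is None:
            self.head = node
        else:
            self.tail.next = node
        self.tail = node
        return node

    def items(self) -> list[Item]:
        out, n = [], self.head
        while n is not None:
            out.append(n.item)
            n = n.next
        return out


def add_child(parent: Node, item: Item) -> Node:
    """Attach an item below a node, for tree-shaped relations such as syntax."""
    child = Node(item, parent=parent)
    parent.children.append(child)
    return child


class Utterance:
    """Holds all relations, keyed by name."""

    def __init__(self):
        self.relations: dict[str, Relation] = {}

    def relation(self, name: str) -> Relation:
        return self.relations.setdefault(name, Relation(name))
```

Keeping list and tree links on a per-relation `Node` rather than on the `Item` itself is what lets one item take part in several relations without conflicting neighbours.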

INPROC.	Campbell_FactAffe_EURO97 [CYDH97]
Author	Nick Campbell, Itoh Yoshiharu, Wen Ding, Norio Higuchi
Title	Factors Affecting Perceived Quality and Intelligibility in the CHATR Concatenative Speech Synthesiser
Booktitle	Proc. Eurospeech '97
Address	Rhodes, Greece
Month	September
Year	1997
Pages	2635--2638
Remarks	TO BE FOUND

ARTICLE	Campbell_CHATR [Cam96]
Author	N. Campbell
Title	CHATR: A High-Definition Speech Re-Sequencing System
Journal	Acoustical Society of America and Acoustical Society of Japan, Third Joint Meeting
Address	Honolulu, HI
Month	December
Year	1996
Remarks	TO BE FOUND
