
5   CSTR Centre for Speech Technology Research

MISC cstr:www [CSTR99]
Key: CSTR
Title: Centre for Speech Technology Research, University of Edinburgh
Howpublished: WWW page
Year: 1999
url: http://www.cstr.ed.ac.uk/
pub-url: http://www.cstr.ed.ac.uk/projects/festival/papers.html
Note: http://www.cstr.ed.ac.uk/


INPROC. cstr:unitsel96 [HB96]
Author: A. J. Hunt, A. W. Black
Title: Unit Selection in a Concatenative Speech Synthesis System using a Large Speech Database
Booktitle: Proc. ICASSP '96
Address: Atlanta, GA
Month: May
Year: 1996
Pages: 373--376
Note: www [CSTR99] Electronic version: cstr/Black1996a.s.*
Remarks: cited in [MCW98]
Abstract: One approach to the generation of natural-sounding synthesized speech waveforms is to select and concatenate units from a large speech database. Units (in the current work, phonemes) are selected to produce a natural realisation of a target phoneme sequence predicted from text which is annotated with prosodic and phonetic context information. We propose that the units in a synthesis database can be considered as a state transition network in which the state occupancy cost is the distance between a database unit and a target, and the transition cost is an estimate of the quality of concatenation of two consecutive units. This framework has many similarities to HMM-based speech recognition. A pruned Viterbi search is used to select the best units for synthesis from the database. This approach to waveform synthesis permits training from natural speech: two methods for training from speech are presented which provide weights which produce more natural speech than can be obtained by hand tuning.
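The search described in the abstract of [HB96] can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `select_units`, the two cost callbacks and the toy numeric units are assumptions made for illustration, pruning is omitted, and every target is assumed to have at least one candidate unit.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")  # target specification (hypothetical type)
U = TypeVar("U")  # database unit (hypothetical type)


def select_units(
    targets: Sequence[T],
    candidates: Sequence[Sequence[U]],
    target_cost: Callable[[T, U], float],
    join_cost: Callable[[U, U], float],
) -> List[U]:
    """Return the unit sequence minimising summed target and join costs.

    Plain (unpruned) Viterbi search; candidates[i] lists the database
    units that may realise targets[i].
    """
    if not targets:
        return []
    # best[i][j]: cost of the cheapest path ending in candidates[i][j]
    best = [[target_cost(targets[0], u) for u in candidates[0]]]
    # back[i][j]: index of the predecessor unit on that cheapest path
    back = [[-1] * len(candidates[0])]
    for i in range(1, len(targets)):
        row_cost, row_back = [], []
        for u in candidates[i]:
            prev_units = candidates[i - 1]
            # Cost of reaching u from each possible predecessor.
            via = [best[i - 1][p] + join_cost(prev_units[p], u)
                   for p in range(len(prev_units))]
            k = min(range(len(via)), key=via.__getitem__)
            row_cost.append(via[k] + target_cost(targets[i], u))
            row_back.append(k)
        best.append(row_cost)
        back.append(row_back)
    # Trace back from the cheapest final unit.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    path: List[U] = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = back[i][j]
    path.reverse()
    return path


# Toy example: units and targets are plain numbers (an assumption).
chosen = select_units(
    targets=[1, 1],
    candidates=[[1, 5], [6, 7]],
    target_cost=lambda t, u: abs(t - u),
    join_cost=lambda a, b: 10 * abs(a - b),
)
```

In the toy example the unit closest to the first target (1) is not chosen, because joining it to either second candidate is expensive; the search trades a higher target cost for a cheaper join.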


INPROC. cstr:unitsel97 [BT97b]
Author: Alan W Black, Paul Taylor
Title: Automatically Clustering Similar Units for Unit Selection in Speech Synthesis
Booktitle: Proc. Eurospeech '97
Address: Rhodes, Greece
Month: September
Year: 1997
Pages: 601--604
Note: www [CSTR99] Electronic version: cstr/Black1997b.*
Remarks: cited in [MCW98]: clustering and decision trees
Abstract: This paper describes a new method for synthesizing speech by concatenating sub-word units from a database of labelled speech. A large unit inventory is created by automatically clustering units of the same phone class based on their phonetic and prosodic context. The appropriate cluster is then selected for a target unit offering a small set of candidate units. An optimal path is found through the candidate units based on their distance from the cluster center and an acoustically based join cost. Details of the method and justification are presented. The results of experiments using two different databases are given, optimising various parameters within the system. Also a comparison with other existing selection based synthesis techniques is given showing the advantages this method has over existing ones. The method is implemented within a full text-to-speech system offering efficient natural sounding speech synthesis.


INPROC. cstr:eursp95 [BC95]
Author: A. W. Black, N. Campbell
Title: Optimising selection of units from speech databases for concatenative synthesis
Booktitle: Proc. Eurospeech '95
Volume: 1
Address: Madrid, Spain
Month: September
Year: 1995
Pages: 581--584
Remarks: Summary: detailed description of the unit selection model, the features and context used, and concatenation join point optimisation. Describes a weight-optimising procedure based on Euclidean cepstral distance (a very limited first attempt) on real-speech test sentences. Unit selection as used in CHATR. Cited in [MCW98].


INPROC. cstr:ssml97 [STTI97]
Author: Richard Sproat, Paul Taylor, Michael Tanenblatt, Amy Isard
Title: A Markup Language for Text-To-Speech Synthesis
Booktitle: Proc. Eurospeech '97
Address: Rhodes, Greece
Month: September
Year: 1997
Pages: 1747--1750
Note: www [CSTR99] Electronic version: cstr/Sproat1997a.*
Abstract: Text-to-speech synthesizers must process text, and therefore require some knowledge of text structure. While many TTS systems allow for user control by means of ad hoc `escape sequences', there remains to date no adequate and generally agreed upon system-independent standard for marking up text for the purposes of synthesis. The present paper is a collaborative effort between two speech groups aimed at producing such a standard, in the form of an SGML-based markup language that we call STML --- Spoken Text Markup Language. The primary purpose of this paper is not to present STML as a fait accompli, but rather to interest other TTS research groups to collaborate and contribute to the development of this standard.


TECHREP. cstr:festival97 [BT97a]
Author: Alan Black, Paul Taylor
Title: The Festival Speech Synthesis System: System Documentation (1.1.1)
Institution: Human Communication Research Centre
Type: Technical Report
Number: HCRC/TR-83
Month: January
Year: 1997
Pages: 154
Note: www [CSTR99]
url: http://www.cstr.ed.ac.uk/projects/festival/manual-1.1.1/festival-1.1.1.ps.gz
Remarks: new version [BTC98]


TECHREP. cstr:festival98 [BTC98]
Author: Alan Black, Paul Taylor, Richard Caley
Title: The Festival Speech Synthesis System: System Documentation (1.3.1)
Institution: Human Communication Research Centre
Type: Technical Report
Number: HCRC/TR-83
Month: December
Year: 1998
Pages: 202
Note: www [CSTR99]
url: http://www.cstr.ed.ac.uk/projects/festival/manual-1.3.1/festival_toc.html
Remarks: updated version of [BT97a]; new utterance structure as in [Tay99]; multiple synthesizers


TECHREP. cstr:festivalarch98 [Tay99]
Author: Paul Taylor
Title: The Festival Speech Architecture
Type: Web Page
Year: 1999
Note: www [CSTR99]
url: http://www.cstr.ed.ac.uk/projects/festival/arch.html
Abstract: This is a short document describing the way we represent speech and linguistic structures in Festival. There are three main types of structure:
Items
An item is a single linguistic unit, such as a phone, word, syllable, syntactic node, intonation phrase etc. Each item has a set of features which describe its local properties. For instance a word could have features, , , ... Values of features can be real values or functions.
Relations
A relation links together items of a common linguistic type. For instance, we might have a word, phone, syntax or syllable relation. Relations are general graph structures, the most common type being a simple doubly linked list. E.g. the word relation is a doubly linked list that links all the words in an utterance in the order in which they occur. Relations can also take the form of trees. For example, we have a syllable structure relation which gives onset, coda, nucleus and rhyme structure for a syllable. The crucial aspect of the Festival architecture is that items can be in more than one relation. For example, a syntax relation is a tree whose terminal elements are words, which are also in the word relation.
Utterances
Utterances contain a list of all the relations.
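The three structures described in the abstract of [Tay99] can be pictured with a small sketch. This is not Festival's API: the class and method names (`Item`, `Relation`, `Utterance`, `feat`, `relation`) and the "ContentWord" relation are hypothetical, a Python list stands in for the doubly linked list, and tree-shaped relations are left out.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass(eq=False)  # eq=False: items are compared by identity, not content
class Item:
    """A single linguistic unit carrying a set of features."""
    features: Dict[str, Any] = field(default_factory=dict)

    def feat(self, name: str) -> Any:
        """Return a feature value; function-valued features are called."""
        value = self.features[name]
        return value(self) if callable(value) else value


class Relation:
    """An ordered relation over items (list-shaped relations only)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.items: List[Item] = []

    def append(self, item: Item) -> Item:
        self.items.append(item)
        return item

    def next(self, item: Item) -> Optional[Item]:
        i = self.items.index(item) + 1
        return self.items[i] if i < len(self.items) else None

    def prev(self, item: Item) -> Optional[Item]:
        i = self.items.index(item) - 1
        return self.items[i] if i >= 0 else None


class Utterance:
    """Holds all relations of one utterance, keyed by name."""

    def __init__(self) -> None:
        self.relations: Dict[str, Relation] = {}

    def relation(self, name: str) -> Relation:
        # Create the relation on first use.
        return self.relations.setdefault(name, Relation(name))


utt = Utterance()
words = utt.relation("Word")
content = utt.relation("ContentWord")  # hypothetical second relation

hello = words.append(
    Item({"name": "hello", "length": lambda it: len(it.features["name"])})
)
the = words.append(Item({"name": "the"}))
world = words.append(Item({"name": "world"}))

# The same item objects take part in a second relation.
content.append(hello)
content.append(world)
```

Because `hello` is one object shared by both relations, its successor depends on the relation consulted: `words.next(hello)` is `the`, while `content.next(hello)` is `world`.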


INPROC. Campbell_FactAffe_EURO97 [CYDH97]
Author: Nick Campbell, Itoh Yoshiharu, Wen Ding, Norio Higuchi
Title: Factors Affecting Perceived Quality and Intelligibility in the CHATR Concatenative Speech Synthesiser
Booktitle: Proc. Eurospeech '97
Address: Rhodes, Greece
Month: September
Year: 1997
Pages: 2635--2638
Remarks: TO BE FOUND


ARTICLE Campbell_CHATR [Cam96]
Author: N. Campbell
Title: CHATR: A High-Definition Speech Re-Sequencing System
Journal: Acoustical Society of America and Acoustical Society of Japan, Third Joint Meeting
Address: Honolulu, HI
Month: December
Year: 1996
Remarks: TO BE FOUND


