TCTS Circuit Theory and Signal Processing Lab

13 TCTS Circuit Theory and Signal Processing Lab

MISC	tcts:www [TCTS99]
Key	TCTS
Title	TCTS (Circuit Theory and Signal Processing) Lab, Faculté Polytechnique de Mons
Howpublished	WWW page
Year	1999
url	`http://tcts.fpms.ac.be`
group-url	`http://tcts.fpms.ac.be/synthesis/synthesis.html`
pub-url	`http://tcts.fpms.ac.be/publications.html`
Note	`http://tcts.fpms.ac.be`

INPROC.	tcts:euspico98 [DMD98]
Author	O. Deroo, F. Malfrere, T. Dutoit
Title	Comparison of two different alignment systems: speech synthesis vs. Hybrid HMM/ANN
Booktitle	Proc. European Conference on Signal Processing (EUSIPCO'98)
Address	Greece
Year	1998
Pages	1161--1164
Note	www [TCTS99], same content as [MDD98] (but with fewer references)
url	`http://tcts.fpms.ac.be/publications/papers/1998/eusipco98_odfmtd.zip`
Abstract	In this paper we compare two different methods for phonetically labeling a French database. The first one is based on the temporal alignment of the speech signal on a high quality synthetic speech pattern and the second one uses a hybrid HMM/ANN system. Both systems have been evaluated on French read utterances from a single speaker never seen in the training stage of the HMM/ANN system and manually segmented. This study outlines the advantages and drawbacks of both methods. The high quality speech synthesis system has the great advantage that no training stage (hence no labeled database) is needed, while the classical HMM/ANN system easily allows multiple phonetic transcriptions (phonetic lattice). We deduce a method for the automatic constitution of large phonetically and prosodically labeled speech databases based on using the synthetic speech segmentation tool in order to bootstrap the training process of our hybrid HMM/ANN system. The importance of such segmentation tools will be a key point for the development of improved speech synthesis and recognition systems. All the experiments reported in this article related to the hybrid HMM/ANN system have been realized with the STRUT [3] software.

INPROC.	tcts:tsd98 [DMP+98]
Title	EULER: Multi-Lingual Text-to-Speech Project
Pages	27--32
Author	T. Dutoit, F. Malfrère, V. Pagel, M. Bagein, P. Mertens, A. Ruelle, A. Gilman
Booktitle	Proceedings of the First Workshop on Text, Speech, Dialogue --- TSD'98
Year	1998
Editor	Petr Sojka, Václav Matousek, Karel Pala, Ivan Kopecek
Address	Brno, Czech Republic
Month	September
Publisher	Masaryk University Press
Note	www [TCTS99]. Electronic version: tcts/tsd98tdfmvppmmbarag.ps.*
Remarks	modularity
Abstract	Text-to-speech systems simultaneously require an abstract linguistic analysis, an acoustic linguistic analysis and a final digital processing stage. The aim of the project presented in this paper is to obtain a set of text-to-speech synthesizers for as many voices, languages and dialects as possible, free of use for non-commercial and non-military applications. This project is an extension of the MBROLA project. MBROLA is a speech synthesizer that is freely distributed for non-commercial purposes. A multi-lingual speech segmentation and prosody transplantation tool called MBROLIGN has also been developed and freely distributed. Other labs have also recently distributed for free important tools for speech synthesis, like Festival from the University of Edinburgh or the MULTEXT project of the Université de Provence. The purpose of this paper is to present the EULER project, which will try to integrate all these results, to potential Eastern European partners, so as to increase the dissemination of the important results of the MBROLA and MBROLIGN projects and stimulate East/West collaboration on TTS synthesis.

INPROC.	tcts:icslp98-fmodtd [MDD98]
Author	F. Malfrere, O. Deroo, T. Dutoit
Title	Phonetic Alignment: Speech Synthesis Based Vs. Hybrid HMM/ANN
Booktitle	Proc. International Conference on Spoken Language Processing
Address	Sydney, Australia
Year	1998
Pages	1571--1574
Note	www [TCTS99], same content as [DMD98] (with more references)
url	`http://tcts.fpms.ac.be/publications/papers/1998/icslp98_fmodtd.zip`
Abstract	In this paper we compare two different methods for phonetically labeling a speech database. The first approach is based on the alignment of the speech signal on a high quality synthetic speech pattern, and the second one uses a hybrid HMM/ANN system. Both systems have been evaluated on French read utterances from a speaker never seen in the training stage of the HMM/ANN system and manually segmented. This study outlines the advantages and drawbacks of both methods. The high quality speech synthetic system has the great advantage that no training stage is needed, while the classical HMM/ANN system easily allows multiple phonetic transcriptions. We deduce a method for the automatic constitution of phonetically labeled speech databases based on using the synthetic speech segmentation tool to bootstrap the training process of our hybrid HMM/ANN system. The importance of such segmentation tools will be a key point for the development of improved speech synthesis and recognition systems.

INPROC.	tcts:iscas97 [MD97a]
Author	F. Malfrere, T. Dutoit
Title	Speech Synthesis for Text-To-Speech Alignment and Prosodic Feature Extraction
Booktitle	Proc. ISCAS 97
Address	Hong Kong
Year	1997
Pages	2637--2640
Note	www [TCTS99]
url	`http://tcts.fpms.ac.be/publications/papers/1997/iscas97_fmtd.zip`
Remarks	Recent developments in prosody generation have highlighted the potential interest of machine learning techniques such as multilayer perceptrons [Tra92], linear regression techniques [SK92], classification and regression trees [Hir91], or statistical techniques [MPH93], based on the automatic analysis of large prosodically labeled corpora. Only the segmental features of the reference signal are used in alignment. Assumption: the segmental and suprasegmental features are approximately uncorrelated. Keep only the perceptually relevant F0 cues, perceptual stylization, based on a model of tonal perception [alessandro95]. Robust cepstrum by sinusoidal weighting [GL88]. Derivative of cepstrum [SR88].
Abstract	The aim of this paper is to present a new and promising approach to the text-to-speech alignment problem. For this purpose, an original idea is developed: a high quality digital speech synthesizer is used to create a reference speech pattern used during the alignment process. The system has been used and tested to extract the prosodic features of read French utterances. The results show a segmentation error rate of about 8%. This system will be a powerful tool for the automatic creation of large prosodically labeled databases and for research on automatic prosody generation.

INPROC.	tcts:eurosp97 [SDS97]
Author	Yannis Stylianou, Thierry Dutoit, Juergen Schroeter
Title	Diphone Concatenation Using a Harmonic Plus Noise Model of Speech
Booktitle	Proc. Eurospeech '97
Address	Rhodes, Greece
Month	September
Year	1997
Pages	613--616
Note	www [TCTS99]. Electronic version: tcts/hnmconc.ps.*
Remarks	Important! HNM (Marine) basis paper, pitch synchronous. Diphone smoothing in region of quasi-stationarity. Additive better for concatenation than PSOLA. References: [DG96] (non pitch-synchronous hybrid harmonic/stochastic synthesis, real-time generation of signals from spectral representation), [SLM95] (phase treatment, modifications), [Mac96] (non pitch synchronous harmonic modeling).
Abstract	In this paper we present a high-quality text-to-speech system using diphones. The system is based on a Harmonic plus Noise (HNM) representation of the speech signal. HNM is a pitch-synchronous analysis-synthesis system but does not require pitch marks to be determined as necessary in PSOLA-based methods. HNM assumes the speech signal to be composed of a periodic part and a stochastic part. As a result, different prosody and spectral envelope modification methods can be applied to each part, yielding more natural-sounding synthetic speech. The fully parametric representation of speech using HNM also provides a straightforward way of smoothing diphone boundaries. Informal listening tests, using natural prosody, have shown that the synthetic speech quality is close to the quality of the original sentences, without smoothing problems and without buzziness or other oddities observed with other speech representations used for TTS.

INPROC.	tcts:speechcomm96 [DG96]
Author	T. Dutoit, B. Gosselin
Title	On the use of a hybrid harmonic/stochastic model for TTS synthesis by concatenation
Booktitle	Speech Communication
Number	19
Pages	119--143
Year	1996
Remarks	Cited in [SDS97] for non pitch-synchronous hybrid harmonic/stochastic synthesis, real-time generation of signals from spectral representation. TO BE FOUND

INPROC.	macon-thesis96 [Mac96]
Author	Michael W. Macon
Title	Speech Synthesis Based on Sinusoidal Modeling
Booktitle	PhD thesis
Publisher	Georgia Institute of Technology
Month	October
Year	1996
Remarks	Cited in [SDS97] for non pitch synchronous harmonic modeling. TO BE FOUND

INPROC.	stylianou:eurospeech95 [SLM95]
Author	Y. Stylianou, J. Laroche, E. Moulines
Title	High Quality Speech Modification based on a Harmonic+Noise Model
Booktitle	Proc. EUROSPEECH
Year	1995
Remarks	Cited in [SDS97] for phase treatment, modifications, maximum voice frequency. TO BE FOUND

INPROC.	Malfrere_HighQual_EURO97 [MD97b]
Author	Fabrice Malfrere, Thierry Dutoit
Title	High Quality Speech Synthesis for Phonetic Speech Segmentation
Booktitle	Proc. Eurospeech '97
Address	Rhodes, Greece
Month	September
Year	1997
Pages	2631--2634

INPROC.	Olivier_SimpAnd_EURO97 [vdVOPD+97]
Author	Olivier van der Vrecken, Nicolas Pierret, Thierry Dutoit, Vincent Pagel, Fabrice Malfrere
Title	A Simple and Efficient Algorithm for the Compression of MBROLA Segment Databases
Booktitle	Proc. Eurospeech '97
Address	Rhodes, Greece
Month	September
Year	1997
Pages	421--424

INPROC.	Dutoit_TheMbro_ICSLP96 [DPP+96]
Author	T. Dutoit, V. Pagel, N. Pierret, F. Bataille, O. van der Vrecken
Title	The MBROLA project: Towards a Set of High Quality Speech Synthesizers Free of Use for Non Commercial Purposes
Booktitle	Proc. ICSLP '96
Address	Philadelphia, PA
Month	October
Year	1996
Volume	3
Pages	1393--1396

INPROC.	Dutoit_HighQual_ICASSP94 [Dut94]
Author	T. Dutoit
Title	High Quality Text-to-Speech Synthesis: a Comparison of four Candidate Algorithms
Booktitle	Proc. ICASSP '94
Address	Adelaide, Australia
Month	April
Year	1994
Pages	I--565--I--568
