Main.Presentation History
(:cell bgcolor=#cccc89 align=center:) OB
(:cell bgcolor=#cccc89 align=center:) BO
- Rich linguistic features are then introduced into a HMM-based speech synthesis system to model prosodic variations (f0, duration, and spectral variations).
- Complementary to model-oriented approaches that aim to increase the prosodic variability by reducing the "oversmoothing" effect, this paper presents a linguistic-oriented approach in which high-level linguistic features are extracted from text in order to improve prosody modeling.
- This work presents a linguistic-oriented approach in which high-level linguistic features are extracted from text in order to improve prosody modeling.
- Rich linguistic features are then introduced into a HMM-based speech synthesis system to model prosodic variations (f0, duration, and spectral variations).
- Subjective evaluation reveals that the proposed approach significantly improves speech synthesis compared to a baseline model, even if such improvement depends on the observed linguistic phenomenon.
- This paper introduces a HMM-based speech synthesis system which uses a new method for the separation of vocal-tract and Liljencrants-Fant model plus Noise (SVLN).
- This work introduces a HMM-based speech synthesis system which uses a new method for the separation of vocal-tract and Liljencrants-Fant model plus Noise (SVLN) proposed by G. Degottex.
- HyperMusic: Prologue, Hector Parra, Thomas Goepfer
- Title "Triplet Markov chains and Unsupervised signal segmentation"
- Director: Wojciech Pieczynski.
- With honors (mention très honorable)
- Keywords: Hidden Markov Models, pairwise and triplet Markov chains and trees, Bayesian estimation, Expectation-Maximisation, non-stationary process segmentation, centered Gaussian process with long-memory noise, Dempster-Shafer theory, SAR image segmentation.
- Major: signal processing and decision theory.
DEA ATIAM: Master Degree in Acoustics, Signal Processing, Computer science applied to Music\\
DEA ATIAM: Master Degree in Acoustics, Signal Processing and Computer science applied to Music\\
- Keywords: Signal processing, Computer Science, Probability and statistics, graph optimization, numerical analysis, Information theory, numerical communication, optical communications, Network-TCP/IP, Specialization in statistical image processing during the last year.
- Keywords: Signal processing, Computer Science, Probability and statistics, graph optimization, numerical analysis, Information theory, numerical communication, optical communications, Network-TCP/IP, major in statistical image processing during the last year.
- Research and development in a HMM-based speech synthesis system for French based on HTS including a new excitation model (SVLN) proposed by G. Degottex
- Research and development of a HMM-based speech synthesis system for French based on HTS including a new excitation model (SVLN) proposed by G. Degottex
- Research and development of a segmentation system based on HTK and on the French phonetizer LIAPHON to automatically extract the language structure at different levels (phone, word, phrase, paragraph) and to align it with the speech audio signal.
- Multiple pronunciations are taken into account during the alignment using a constrained phonetic graph built from the text.
- Teaching: C language, UNIX, numerical analysis, multimedia (coding), systems and networks, final project supervision.
- Teaching: introduction to statistics, algorithms and C language, statistical methods in image processing, final project supervision.
- Analytical and numerical study of the temporal response of a circular plate involving a set of internal resonances in the context of nonlinear vibration.
- The aim of VC in this project is to convert the voice of a commercial TTS to the voice of the user using only a few sentences.
- Research on Voice Conversion (Dynamic Model Selection) and development of a system based on GMM modeling of the joint law of source and target acoustic features.
- Research on Voice Conversion (Dynamic Model Selection) and development of a system based on GMM modeling of the joint density of source and target acoustic features.
- Research on Voice conversion (reduction of the conditional variance) and development of a Voice conversion system based on GMM modeling of the joint law of source and target acoustic features.
- Research on Voice conversion (reduction of the conditional variance) and development of a Voice conversion system based on GMM modeling of the joint density of source and target acoustic features.
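The GMM joint-density idea mentioned above can be illustrated with a minimal sketch. It assumes a GMM has already been fitted on stacked source/target feature vectors and applies the standard conditional-expectation mapping E[y | x]; the function names, parameter layout, and toy values are illustrative assumptions, not the system described on this page, and the dynamic model selection and variance-reduction work is not covered.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Density of a multivariate Gaussian evaluated at x."""
    d = x.shape[0]
    diff = x - mean
    norm = np.sqrt(((2.0 * np.pi) ** d) * np.linalg.det(cov))
    return float(np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)) / norm)

def convert_frame(x, weights, means, covs):
    """Map a source frame x to the target space as E[y | x] under a joint GMM.

    means[k] stacks [mu_x, mu_y]; covs[k] is the full joint covariance.
    (Hypothetical parameter layout chosen for this sketch.)
    """
    d = x.shape[0]
    # Posterior probability of each component given the source frame only.
    lik = np.array([
        w * gaussian_pdf(x, m[:d], c[:d, :d])
        for w, m, c in zip(weights, means, covs)
    ])
    post = lik / lik.sum()
    y = np.zeros(means[0].shape[0] - d)
    for p, m, c in zip(post, means, covs):
        # Per-component linear regression: mu_y + S_yx S_xx^{-1} (x - mu_x)
        y += p * (m[d:] + c[d:, :d] @ np.linalg.solve(c[:d, :d], x - m[:d]))
    return y

# Toy one-dimensional example with a single component (invented values).
toy_means = [np.array([0.0, 1.0])]
toy_covs = [np.array([[1.0, 0.5], [0.5, 1.0]])]
converted = convert_frame(np.array([2.0]), [1.0], toy_means, toy_covs)
```

With a single component the mapping reduces to one linear regression from source to target features; with several components the regressions are blended by the component posteriors.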
A major drawback of current Hidden Markov Model-based speech synthesis is the monotony of the generated speech, which is closely related to the monotony of the generated prosody. Complementary to model-oriented approaches that aim to increase the prosodic variability by reducing the "oversmoothing" effect, this paper presents a linguistic-oriented approach in which high-level linguistic features are extracted from text in order to improve prosody modeling. A linguistic processing chain based on linguistic preprocessing, morpho-syntactical labeling, and syntactical parsing is used to extract high-level syntactical features from an input text. Rich linguistic features are then introduced into a HMM-based speech synthesis system to model prosodic variations (f0, duration, and spectral variations). Subjective evaluation reveals that the proposed approach significantly improves speech synthesis compared to a baseline model, even if such improvement depends on the observed linguistic phenomenon.
- A major drawback of current Hidden Markov Model-based speech synthesis is the monotony of the generated speech which is closely related to the monotony of the generated prosody.
- Complementary to model-oriented approaches that aim to increase the prosodic variability by reducing the "oversmoothing" effect, this paper presents a linguistic-oriented approach in which high-level linguistic features are extracted from text in order to improve prosody modeling.
- A linguistic processing chain based on linguistic preprocessing, morpho-syntactical labeling, and syntactical parsing is used to extract high-level syntactical features from an input text.
- Rich linguistic features are then introduced into a HMM-based speech synthesis system to model prosodic variations (f0, duration, and spectral variations). Subjective evaluation reveals that the proposed approach significantly improves speech synthesis compared to a baseline model, even if such improvement depends on the observed linguistic phenomenon.
This paper presents an approach for modeling speaking style of various discourse genres in speech synthesis. The proposed approach is based on phonological and acoustic average discourse genre-dependent speaking style parametric models. The phonological module models the average abstract prosodic structure of a specific discourse genre. The acoustic module jointly models average speaking style voice and prosodic cues of a given discourse genre. Discourse genre-dependent speaking style models have been estimated for four discourse genres and evaluated on a speaking style prosodic identification perceptual experiment. A comparison with speaking style identification on real speech is discussed and reveals consistent performance of the proposed approach.
- This work presents an approach for modeling speaking style of various discourse genres in speech synthesis.
- The proposed approach is based on phonological and acoustic average discourse genre-dependent speaking style parametric models.
- The phonological module models the average abstract prosodic structure of a specific discourse genre.
- The acoustic module jointly models average speaking style voice and prosodic cues of a given discourse genre.
- Discourse genre-dependent speaking style models have been estimated for four discourse genres and evaluated on a speaking style prosodic identification perceptual experiment.
- A comparison with speaking style identification on real speech is discussed and reveals consistent performance of the proposed approach.
A HMM-Based Speech Synthesis System using a New Glottal Source and Vocal-Tract Separation Methods
A HMM-Based Speech Synthesis System using a New Glottal Source and Vocal-Tract Separation Methods (with G. Degottex)
Toward Improved HMM-based Speech Synthesis using High-Level Syntactical Features
Toward Improved HMM-based Speech Synthesis using High-Level Syntactical Features (with N. Obin)
Speaking Style Modeling of Various Discourse Genres in HMM-Based Speech Synthesis
Speaking Style Modeling of Various Discourse Genres in HMM-Based Speech Synthesis (with N. Obin)
|| border=0
|| border=0
example 1 | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/cursus/lepetitpoucet.2.hts.morpho.mp3 width=60 height=18:) | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/cursus/lepetitpoucet.2.1order.morpho.mp3 width=60 height=18:) | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/cursus/lepetitpoucet.2.pg.morpho.mp3 width=60 height=18:) |
---|---|---|---|
example 2 | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/cursus/PROUST_DUSSOLIER_110014.mp3 width=60 height=18:) | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/cursus/PROUST_DUSSOLIER_110036.mp3 width=60 height=18:) | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/cursus/PROUST_DUSSOLIER_110058.mp3 width=60 height=18:) |
HTS | Reference |
---|
Prosodical stereotype | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/cursus/DISCOURS.302.norm.mp3 width=200 height=18:) | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/cursus/DISCOURS.738.norm.mp3 width=200 height=18:) | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/cursus/DISCOURS.1853.norm.mp3 width=200 height=18:) | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/cursus/DISCOURS.2054.norm.mp3 width=200 height=18:) |
---|---|---|---|---|
HTS | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/cursus/DISCOURS_JOURNAL.norm.mp3 width=200 height=18:) | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/cursus/DISCOURS_MESSE.norm.mp3 width=200 height=18:) | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/cursus/DISCOURS_POLITIQUE.norm.mp3 width=200 height=18:) | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/cursus/DISCOURS_SPORT.norm.mp3 width=200 height=18:) |
Prosodical stereotype | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/cursus/DISCOURS.302.norm.mp3 width=60 height=18:) | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/cursus/DISCOURS.738.norm.mp3 width=60 height=18:) | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/cursus/DISCOURS.1853.norm.mp3 width=60 height=18:) | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/cursus/DISCOURS.2054.norm.mp3 width=60 height=18:) |
---|---|---|---|---|
HTS | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/cursus/DISCOURS_JOURNAL.norm.mp3 width=60 height=18:) | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/cursus/DISCOURS_MESSE.norm.mp3 width=60 height=18:) | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/cursus/DISCOURS_POLITIQUE.norm.mp3 width=60 height=18:) | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/cursus/DISCOURS_SPORT.norm.mp3 width=60 height=18:) |
Samples | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/cursus/speaking_styles_samples.mp3 width=50 height=18:) |
---|
Samples | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/cursus/speaking_styles_samples.mp3 width=60 height=18:) |
---|
Samples | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/cursus/speaking_styles_samples.mp3 width=80 height=18:) |
---|
Samples | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/cursus/speaking_styles_samples.mp3 width=50 height=18:) |
---|
Samples | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/cursus/speaking_styles_samples.mp3 width=200 height=18:) |
---|
Samples | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/cursus/speaking_styles_samples.mp3 width=80 height=18:) |
---|
HTS | Reference | |||
---|---|---|---|---|
Samples | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/cursus/speaking_styles_samples.mp3 width=200 height=18:) | |||
Prosodical stereotype | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/cursus/DISCOURS.302.norm.mp3 width=200 height=18:) | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/cursus/DISCOURS.738.norm.mp3 width=200 height=18:) | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/cursus/DISCOURS.1853.norm.mp3 width=200 height=18:) | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/cursus/DISCOURS.2054.norm.mp3 width=200 height=18:) |
HTS | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/cursus/DISCOURS_JOURNAL.norm.mp3 width=200 height=18:) | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/cursus/DISCOURS_MESSE.norm.mp3 width=200 height=18:) | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/cursus/DISCOURS_POLITIQUE.norm.mp3 width=200 height=18:) | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/cursus/DISCOURS_SPORT.norm.mp3 width=200 height=18:) |
- ... and with more data
HTS | Reference | |
---|---|---|
André | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/cursus/lepetitpoucet.3.1order.morpho.mp3 width=200 height=18:) | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/cursus/Proust_Dussolier.1.mp3 width=200 height=18:) |
(:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/cursus/extract_all_rev.mp3 width=200 height=18:)
Hypermusic | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/cursus/extract_all_rev.mp3 width=200 height=18:) |
---|
(:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/cursus/extract_all_rev.mp3 width=200 height=18:)
some examples musicalProduction
- some examples here
Musical productions using ircamAlign
Musical productions using ircamAlign
some examples musicalProduction
- Automatic Phoneme Segmentation With Relaxed Textual Constraints,
P. Lanchantin, A. C. Morris, X. Rodet and C. Veaux,
LREC'08 Proceedings, Marrakech, Morocco, 2008.
http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/cursus/ircamAlign6b.jpg
Musical productions using ircamAlign
- ircamAlign is used by composers and has been used in several musical creations at IRCAM, such as:
- Com que voz, Stefano Gervasoni, Thomas Goepfer
- HyperMusic: Prologue, Hector Parra, Thomas Goepfer
- Häxan, la sorcellerie à travers les âges, Mauro Lanza, Olivier Pasquet
- Cantate égale pays, Gérard Pesson, Sébastien Roux
- Le père, Michael Jarrell, Serge Lemouton
- Automatic Phoneme Segmentation With Relaxed Textual Constraints,\\
- A HMM-Based Synthesis System Using a New Glottal Source and Vocal-Tract Separation Method,\\
- Toward Improved HMM-Based Speech Synthesis Using High-Level Syntactical Features,\\
- Dynamic Model Selection for Spectral Voice Conversion,\\
- it is based on the HTK toolbox and the LIAPHON French phonetizer
- available for French and English
- linguistic structure is extracted from the text and aligned on the audio file by considering a multi-pronunciation graph to model the dependencies between phonemes.
- if the text transcription is not available, a bigram language model is used
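As an illustration of the bigram fallback mentioned in the list above, here is a minimal sketch that estimates phoneme-to-phoneme transition probabilities by relative frequency. The function name, the start/end markers, and the toy phoneme strings are assumptions made for this example; it does not reproduce how ircamAlign or HTK build their language model, and it applies no smoothing.

```python
from collections import Counter, defaultdict

def train_bigram(sequences, start="<s>", end="</s>"):
    """Estimate P(next | prev) by relative frequency over phoneme sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        padded = [start, *seq, end]
        for prev, nxt in zip(padded, padded[1:]):
            counts[prev][nxt] += 1
    return {
        prev: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for prev, c in counts.items()
    }

# Toy phoneme strings, invented for illustration only.
model = train_bigram([["b", "o~", "Z", "u", "R"], ["b", "o", "k", "u"]])
```

A recognizer would use such probabilities to weight phoneme sequences when no transcription constrains the search; unseen pairs get no entry here, which a real system would have to handle.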
- ircamAlign is a tool for speech segmentation, useful for creating databases for speech synthesis.
- it is based on the HTK toolbox
- linguistic structure is extracted from the text and aligned on the audio file by considering a multi-pronunciation graph to model the dependencies between phonemes.
- phonemes are modeled by left-to-right HMMs with 7 states.
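The left-to-right topology mentioned above can be sketched as a transition matrix in which each state may only repeat or move to the next one. The function name and the self-loop probability of 0.6 are arbitrary assumptions for illustration. In HTK the first and last states of a model are non-emitting, and the page does not say whether the 7 states include them, so this toy matrix simply uses 7 ordinary states.

```python
import numpy as np

def left_to_right_transitions(n_states=7, self_loop=0.6):
    """Transition matrix where each state may only stay or advance by one."""
    a = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        a[i, i] = self_loop
        a[i, i + 1] = 1.0 - self_loop
    a[-1, -1] = 1.0  # final state absorbs in this toy version
    return a

A = left_to_right_transitions()
```

The zero entries below the diagonal are what forbid going backwards in time, which is the property that makes this topology suitable for aligning a phoneme against a stretch of audio.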
- Speech synthesis by unit selection requires the segmentation of a large single speaker high quality recording.
- Automatic speech recognition techniques based on Hidden Markov Models can be optimized for maximum segmentation accuracy.
- This paper presents the results of tuning such a phone segmentation system.
- First, using no text transcription, the design of an HMM phoneme recognizer is optimized subject to a phoneme bigram language model.
- Optimal performance is obtained with triphone models, 7 states per phoneme and 5 Gaussians per state, reaching 94.4% phoneme recognition accuracy with 95.2% of phoneme boundaries within 70 ms of hand-labelled boundaries.
- Second, using the textual information modeled by a multi-pronunciation phonetic graph built according to errors found in the first step, the reported phoneme recognition accuracy increases to 96.8%, with 96.1% of phoneme boundaries within 70 ms of hand-labelled boundaries.
- ircamAlign is a tool for speech segmentation, useful for creating databases for speech synthesis.
- based on the HTK toolbox
- an audio speech file and its textual transcription are taken as input
- linguistic structure is extracted from the text and aligned on the audio file by considering a multi-pronunciation graph. Phonemes are modeled by left-to-right HMMs with 7 states.
- Confidence measures are computed at different linguistic levels for easier manual correction
- Files in the HTS lab feature format are created directly, allowing quick creation of new voices.
- This paper presents the results of tuning such a phone segmentation system.
Speech synthesis by unit selection requires the segmentation of a large single-speaker high-quality recording. Automatic speech recognition techniques based on Hidden Markov Models can be optimized for maximum segmentation accuracy. This paper presents the results of tuning such a phone segmentation system. First, using no text transcription, the design of an HMM phoneme recognizer is optimized subject to a phoneme bigram language model. Optimal performance is obtained with triphone models, 7 states per phoneme and 5 Gaussians per state, reaching 94.4% phoneme recognition accuracy with 95.2% of phoneme boundaries within 70 ms of hand-labelled boundaries. Second, using the textual information modeled by a multi-pronunciation phonetic graph built according to errors found in the first step, the reported phoneme recognition accuracy increases to 96.8%, with 96.1% of phoneme boundaries within 70 ms of hand-labelled boundaries. Finally, the results from these two segmentation methods based on different phonetic graphs, the evaluation set, the hand labelling and the test procedures are discussed and possible improvements are proposed.
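The boundary figures quoted above are percentages of automatic boundaries falling within a tolerance of the hand-labelled ones. A minimal sketch of that kind of measure follows, assuming the two boundary lists are already aligned one-to-one and expressed in seconds; the function name and the toy timestamps are invented, and the paper's exact scoring procedure is not reproduced here.

```python
def boundary_accuracy(reference, predicted, tolerance=0.070):
    """Fraction of predicted boundaries within `tolerance` seconds of the
    corresponding hand-labelled boundary (lists are assumed index-aligned)."""
    if len(reference) != len(predicted):
        raise ValueError("boundary lists must have the same length")
    if not reference:
        raise ValueError("no boundaries to score")
    hits = sum(abs(r - p) <= tolerance for r, p in zip(reference, predicted))
    return hits / len(reference)

# Toy boundary times in seconds, invented for illustration only.
ref = [0.10, 0.25, 0.48, 0.90]
hyp = [0.12, 0.40, 0.50, 0.95]
score = boundary_accuracy(ref, hyp)
```

In this toy case three of the four boundaries fall within the 70 ms tolerance. When recognition inserts or deletes phonemes the two lists no longer line up, so a real evaluation needs an alignment step before this comparison.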
- I am working on a voice conversion system based on GMM modeling of the joint law of source and target acoustic features.
- I am also implementing a one-to-many voice conversion system based on a canonical eigenvoice model estimated by SAT for fast Adaptation.
- Research on Voice Conversion (Dynamic Model Selection) and development of a system based on GMM modeling of the joint law of source and target acoustic features.
- Implementation of a one-to-many voice conversion system based on a canonical eigenvoice model estimated by SAT for fast Adaptation.
- Research and development of a HMM-based speech synthesis system for French based on HTS with high level syntactical features with N. Obin
- I developed a Voice conversion system based on GMM modeling of the joint law of source and target acoustic features.
- I also developed a HMM-based speech synthesis system for French based on HTS including a new excitation model proposed by G. Degottex
- Research on Voice conversion (reduction of the conditional variance) and development of a Voice conversion system based on GMM modeling of the joint law of source and target acoustic features.
- Research and development in a HMM-based speech synthesis system for French based on HTS including a new excitation model (SVLN) proposed by G. Degottex
- I developed a segmentation system based on HTK and on the French phonetizer LIAPHON to automatically extract the language structure at different levels (phone, word, phrase, paragraph) and to align it with the speech audio signal. Multiple pronunciations are possible using a constrained phonetic graph built from the text.
- Research and development of a segmentation system based on HTK and on the French phonetizer LIAPHON to automatically extract the language structure at different levels (phone, word, phrase, paragraph) and to align it with the speech audio signal. Multiple pronunciations are possible using a constrained phonetic graph built from the text.
- Teaching: C language, UNIX, numerical analysis, multimedia (coding), systems and networks, final project supervisor.
- Teaching: introduction to statistics, algorithms and C language, statistical methods in image processing, final project supervisor.
Speaking Style Modeling of Various Discourse Genres in HMM-Based Speech Synthesis
This paper presents an approach for modeling speaking style of various discourse genres in speech synthesis. The proposed approach is based on phonological and acoustic average discourse genre-dependent speaking style parametric models. The phonological module models the average abstract prosodic structure of a specific discourse genre. The acoustic module jointly models average speaking style voice and prosodic cues of a given discourse genre. Discourse genre-dependent speaking style models have been estimated for four discourse genres and evaluated on a speaking style prosodic identification perceptual experiment. A comparison with speaking style identification on real speech is discussed and reveals consistent performance of the proposed approach.
- Speaking Style Modeling of Various Discourse Genres in HMM-Based Speech Synthesis,
N. Obin, P. Lanchantin, A. Lacheret-Dujour and X. Rodet,
ICASSP 2011, Prague, Czech Republic, May 2011, Submitted
Toward Improved HMM-based Speech Synthesis using High-Level Syntactical Features
A major drawback of current Hidden Markov Model-based speech synthesis is the monotony of the generated speech, which is closely related to the monotony of the generated prosody. Complementary to model-oriented approaches that aim to increase prosodic variability by reducing the "oversmoothing" effect, this paper presents a linguistically oriented approach in which high-level linguistic features are extracted from text in order to improve prosody modeling. A linguistic processing chain based on linguistic preprocessing, morpho-syntactical labeling, and syntactical parsing is used to extract high-level syntactical features from an input text. Such linguistic features are then introduced into an HMM-based speech synthesis system to model prosodic variations (f0, duration, and spectral variations). Subjective evaluation reveals that the proposed approach significantly improves speech synthesis compared to a baseline model, even if the improvement depends on the observed linguistic phenomenon.
- Toward Improved HMM-Based Speech Synthesis Using High-Level Syntactical Features,
N. Obin, P. Lanchantin, M. Avanzi, A. Lacheret-Dujour and X. Rodet,
Speech Prosody 2010 Proceedings, Chicago, USA, 2010.
Transformation examples
Statistical methods for voice conversion are usually based on a single model selected to represent a tradeoff between goodness of fit and complexity. In this paper we assume that the best model may change over time, depending on the source acoustic features. We present a new method for spectral voice conversion called Dynamic Model Selection (DMS), in which a set of potential best models of increasing complexity, including mixtures of Gaussians and probabilistic principal component analyzers, is considered during the conversion of a source speech signal into a target speech signal. This set is built during the learning phase, according to the Bayesian information criterion. During the conversion, the best model is dynamically selected among the models in the set, according to the acoustical features of each frame. Subjective tests show that the method improves the conversion in terms of proximity to the target and quality.
- Statistical methods for voice conversion are usually based on a single model selected in order to represent a tradeoff between goodness of fit and complexity.
- In this work we assume that the best model may change over time, depending on the source acoustic features.
- We present a new method for spectral voice conversion called Dynamic Model Selection (DMS), in which a set of potential best models of increasing complexity, including mixtures of Gaussians and probabilistic principal component analyzers, is considered during the conversion of a source speech signal into a target speech signal.
- This set is built during the learning phase, according to the Bayesian information criterion. During the conversion, the best model is dynamically selected among the models in the set, according to the acoustical features of each frame.
- Subjective tests show that the method improves the conversion in terms of proximity to the target and quality.
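The two stages described above (building the model set with the Bayesian information criterion, then choosing a model frame by frame) can be sketched as follows. This is only an illustration under stated assumptions, not the method of the paper: the candidate models are toy one-dimensional Gaussians standing in for the mixtures of Gaussians and probabilistic principal component analyzers, the per-frame criterion is assumed to be the log-likelihood of the source frame, and every name (`build_model_set`, `select_model_per_frame`, `keep`, the dictionary keys) is hypothetical.

```python
import math


def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion; a lower value is better."""
    return -2.0 * log_likelihood + n_params * math.log(n_samples)


def build_model_set(candidates, n_samples, keep=2):
    """Learning phase: keep the `keep` candidates with the lowest BIC,
    returned in order of increasing complexity."""
    ranked = sorted(
        candidates,
        key=lambda m: bic(m["train_loglik"], m["n_params"], n_samples),
    )
    return sorted(ranked[:keep], key=lambda m: m["n_params"])


def gaussian_logpdf(x, mean, var):
    """Log-density of a 1-D Gaussian (toy stand-in for a real model)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def select_model_per_frame(frames, model_set):
    """Conversion phase: for each frame, pick the model that scores it best.
    Using the frame log-likelihood as the score is an assumption."""
    return [
        max(model_set, key=lambda m: m["frame_loglik"](x))["name"]
        for x in frames
    ]


# Hypothetical candidates: training log-likelihood and parameter count
# would come from fitting each model on the learning data.
candidates = [
    {"name": "narrow", "n_params": 2, "train_loglik": -120.0,
     "frame_loglik": lambda x: gaussian_logpdf(x, 0.0, 1.0)},
    {"name": "wide", "n_params": 4, "train_loglik": -110.0,
     "frame_loglik": lambda x: gaussian_logpdf(x, 0.0, 9.0)},
    {"name": "overfit", "n_params": 40, "train_loglik": -105.0,
     "frame_loglik": lambda x: gaussian_logpdf(x, 5.0, 1.0)},
]

model_set = build_model_set(candidates, n_samples=100, keep=2)
choices = select_model_per_frame([0.1, 4.0, -0.3], model_set)
print([m["name"] for m in model_set], choices)
```

In this toy run the heavily parameterized candidate is rejected by the BIC penalty even though it has the best training likelihood, and the selected model then changes from frame to frame.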
Transformation examples
- SVLN is promising for voice transformation in synthesis of expressive speech since it allows an independent control of vocal-tract and glottal-source properties.
F0 scale | VTF scale | Rd scale | Audio (HTS) | |
---|---|---|---|---|
1 | 1 | 1 | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/icassp01.mp3 width=200 height=18:) | Original voice |
0.6 | 1 | 1 | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/icassp02.mp3 width=200 height=18:) | |
0.6 | 0.85 | 1 | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/icassp03.mp3 width=200 height=18:) | |
0.6 | 0.85 | 0.5 | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/icassp04.mp3 width=200 height=18:) | Baritone voice |
2.5 | 1 | 1 | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/icassp05.mp3 width=200 height=18:) | |
2.5 | 1.7 | 1 | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/icassp06.mp3 width=200 height=18:) | |
2.5 | 1.7 | 3 | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/icassp07.mp3 width=200 height=18:) | Little girl voice |
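As a rough illustration of how three independent scale factors like those in the table could be applied to analysis parameters, here is a minimal sketch. It is not the SVLN implementation: it assumes that the F0 and Rd scales are plain multiplications of the parameter tracks and that the VTF scale is a linear stretch of the frequency axis of a sampled vocal-tract envelope. All function names and the `PRESETS` dictionary are hypothetical; only the numeric factors come from the table.

```python
# Scale factors copied from the table: (F0 scale, VTF scale, Rd scale).
PRESETS = {
    "original": (1.0, 1.0, 1.0),
    "baritone": (0.6, 0.85, 0.5),
    "little_girl": (2.5, 1.7, 3.0),
}


def scale_track(values, factor):
    """Multiply every value of a parameter track by `factor`.
    Zeros (e.g. unvoiced F0 frames) stay zero."""
    return [v * factor for v in values]


def warp_envelope(envelope, factor):
    """Stretch the frequency axis of a sampled envelope by `factor`
    using linear interpolation (assumed meaning of the VTF scale).
    A factor above 1 moves spectral features toward higher bins."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    n = len(envelope)
    warped = []
    for k in range(n):
        src = k / factor          # position to read in the original envelope
        i = int(src)
        if i >= n - 1:
            warped.append(envelope[-1])   # hold the last value past the end
        else:
            frac = src - i
            warped.append((1.0 - frac) * envelope[i] + frac * envelope[i + 1])
    return warped


def transform(f0, envelope, rd, preset):
    """Apply the three scale factors independently of each other."""
    f0_scale, vtf_scale, rd_scale = PRESETS[preset]
    return (
        scale_track(f0, f0_scale),
        warp_envelope(envelope, vtf_scale),
        scale_track(rd, rd_scale),
    )


f0, env, rd = transform(
    [100.0, 0.0, 120.0],        # toy F0 track in Hz, 0 = unvoiced
    [0.0, 2.0, 4.0, 6.0],       # toy envelope samples
    [1.0, 2.0],                 # toy Rd track
    "little_girl",
)
print(f0, env, rd)
```

Because each factor touches only its own parameter stream, the intermediate rows of the table (F0 changed alone, then F0 and VTF, then all three) correspond to changing one entry of the tuple at a time.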
Voice Conversion
Dynamic Model Selection for spectral Voice Conversion
Statistical methods for voice conversion are usually based on a single model selected to represent a tradeoff between goodness of fit and complexity. In this paper we assume that the best model may change over time, depending on the source acoustic features. We present a new method for spectral voice conversion called Dynamic Model Selection (DMS), in which a set of potential best models of increasing complexity, including mixtures of Gaussians and probabilistic principal component analyzers, is considered during the conversion of a source speech signal into a target speech signal. This set is built during the learning phase, according to the Bayesian information criterion. During the conversion, the best model is dynamically selected among the models in the set, according to the acoustical features of each frame. Subjective tests show that the method improves the conversion in terms of proximity to the target and quality.
- Dynamic Model Selection for Spectral Voice Conversion,
P. Lanchantin and X. Rodet,
Interspeech 2010 Proceedings, Makuhari, Japan, Sept 2010.
VC from real source voice
- Target Reference Samples
(:cellnr bgcolor=#cccc99 align=center:) # (:cell bgcolor=#cccc89 align=center:) Fernando (:cell bgcolor=#cccc89 align=center:) Tremblay (:cell bgcolor=#cccc89 align=center:) Gilles (:cellnr bgcolor=#cccc99 align=center:) 004 (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/Corpus1_Fernando.4.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/Corpus1_Tremblay.4.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/Corpus1_Gilles.4.mp3 width=62 height=18:)
(:cellnr bgcolor=#cccc99 align=center:) Pair (:cell bgcolor=#cccc89 align=center:) Pulse (:cell bgcolor=#cccc89 align=center:) STRAIGHT (:cell bgcolor=#cccc89 align=center:) SVLN
(:cellnr bgcolor=#cccc99 align=center:) 2 (:cell align=center:)(:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/test_mp3/Xavier2007.2.bu.mp3 width=62 height=18:) (:cell align=center:)(:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/test_mp3/Xavier2007.2.st.mp3 width=62 height=18:) (:cell align=center:)(:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/test_mp3/Xavier2007.2.lf.mp3 width=62 height=18:)
(:cellnr bgcolor=#cccc99 align=center:) 4 (:cell align=center:)(:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/test_mp3/Xavier2007.4.bu.mp3 width=62 height=18:) (:cell align=center:)(:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/test_mp3/Xavier2007.4.st.mp3 width=62 height=18:) (:cell align=center:)(:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/test_mp3/Xavier2007.4.lf.mp3 width=62 height=18:)
(:cellnr bgcolor=#cccc99 align=center:) 6 (:cell align=center:)(:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/test_mp3/Xavier2007.6.bu.mp3 width=62 height=18:) (:cell align=center:)(:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/test_mp3/Xavier2007.6.st.mp3 width=62 height=18:) (:cell align=center:)(:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/test_mp3/Xavier2007.6.lf.mp3 width=62 height=18:)
(:cellnr bgcolor=#cccc99 align=center:) 8 (:cell align=center:)(:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/test_mp3/Xavier2007.8.bu.mp3 width=62 height=18:) (:cell align=center:)(:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/test_mp3/Xavier2007.8.st.mp3 width=62 height=18:) (:cell align=center:)(:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/test_mp3/Xavier2007.8.lf.mp3 width=62 height=18:)
(:cellnr bgcolor=#cccc99 align=center:) 11 (:cell align=center:)(:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/test_mp3/Xavier2007.11.bu.mp3 width=62 height=18:) (:cell align=center:)(:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/test_mp3/Xavier2007.11.st.mp3 width=62 height=18:) (:cell align=center:)(:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/test_mp3/Xavier2007.11.lf.mp3 width=62 height=18:)
- Converted source envelope
(:cell bgcolor=#cccd01 align=center:) Xavier
(:cellnr bgcolor=#cccc99 align=center:) 004 (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/Corpus1_Fernando.4.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/Corpus1_Tremblay.4.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/Corpus1_Gilles.4.mp3 width=62 height=18:) (:tableend:)
- Converted source envelope
(:table border=1 cellpadding=2 cellspacing=0 align=center:) (:cellnr bgcolor=#cccc99 align=center:) # (:cell bgcolor=#cccd01 align=center:) Xavier (:cell bgcolor=#cccc89 align=center:) Fernando (:cell bgcolor=#cccc89 align=center:) Tremblay (:cell bgcolor=#cccc89 align=center:) Gilles
- This paper introduces an HMM-based speech synthesis system which uses a new method for the separation of vocal-tract and Liljencrants-Fant model plus Noise (SVLN). The glottal source is separated into two components: a deterministic glottal waveform Liljencrants-Fant model and a modulated Gaussian noise.
- This paper introduces an HMM-based speech synthesis system which uses a new method for the separation of vocal-tract and Liljencrants-Fant model plus Noise (SVLN).
- The glottal source is separated into two components: a deterministic glottal waveform Liljencrants-Fant model and a modulated Gaussian noise.
This paper introduces an HMM-based speech synthesis system which uses a new method for the separation of vocal-tract and Liljencrants-Fant model plus Noise (SVLN). The glottal source is separated into two components: a deterministic glottal waveform Liljencrants-Fant model and a modulated Gaussian noise. This glottal source is first estimated and then used in the vocal-tract estimation procedure. Then, the parameters of the source and the vocal tract are included in HMM contextual models of phonemes. The synthesis results are finally discussed and subjectively evaluated.
The proposed subjective test can be found here
- This paper introduces an HMM-based speech synthesis system which uses a new method for the separation of vocal-tract and Liljencrants-Fant model plus Noise (SVLN). The glottal source is separated into two components: a deterministic glottal waveform Liljencrants-Fant model and a modulated Gaussian noise.
- This glottal source is first estimated and then used in the vocal-tract estimation procedure.
- Then, the parameters of the source and the vocal tract are included in HMM contextual models of phonemes.
- The synthesis results were subjectively evaluated here
The proposed subjective test can be found here
(:table border=0 cellpadding=2 cellspacing=0 align=center:)
(:table border=1 cellpadding=2 cellspacing=0 align=center:)
(:table border=1 cellpadding=2 cellspacing=0 align=center:)
(:table border=0 cellpadding=2 cellspacing=0 align=center:)
This paper introduces an HMM-based speech synthesis system which uses a new method for the separation of vocal-tract and Liljencrants-Fant model plus Noise (SVLN). The glottal source is separated into two components: a deterministic glottal waveform Liljencrants-Fant model and a modulated Gaussian noise. This glottal source is first estimated and then used in the vocal-tract estimation procedure. Then, the parameters of the source and the vocal tract are included in HMM contextual models of phonemes. SVLN is promising for voice transformation in synthesis of expressive speech since it allows an independent control of vocal-tract and glottal-source properties. The synthesis results are finally discussed and subjectively evaluated.
- SVLN is promising for voice transformation in synthesis of expressive speech since it allows an independent control of vocal-tract and glottal-source properties.
Transformation examples
F0 scale | VTF scale | Rd scale | Audio | |
---|---|---|---|---|
1 | 1 | 1 | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/icassp01.mp3 width=200 height=18:) | Original voice |
0.6 | 1 | 1 | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/icassp02.mp3 width=200 height=18:) | |
0.6 | 0.85 | 1 | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/icassp03.mp3 width=200 height=18:) | |
0.6 | 0.85 | 0.5 | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/icassp04.mp3 width=200 height=18:) | Baritone voice |
2.5 | 1 | 1 | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/icassp05.mp3 width=200 height=18:) | |
2.5 | 1.7 | 1 | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/icassp06.mp3 width=200 height=18:) | |
2.5 | 1.7 | 3 | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/icassp07.mp3 width=200 height=18:) | Little girl voice |
(:cellnr bgcolor=#cccc99 align=center:) 002 (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan.002.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Barack.002.new.mp3 width=62 height=18:)
(:cellnr bgcolor=#cccc99 align=center:) 004 (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan.004.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Barack.004.new.mp3 width=62 height=18:)
(:cellnr bgcolor=#cccc99 align=center:) 005 (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan.005.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Barack.005.new.mp3 width=62 height=18:)
(:cellnr bgcolor=#cccc99 align=center:) 006 (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan.006.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Barack.006.new.mp3 width=62 height=18:)
(:cellnr bgcolor=#cccc99 align=center:) 007 (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan.007.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Barack.007.new.mp3 width=62 height=18:)
(:cellnr bgcolor=#cccc99 align=center:) 008 (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan.008.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Barack.008.new.mp3 width=62 height=18:)
(:cellnr bgcolor=#cccc99 align=center:) 009 (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan.009.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Barack.009.new.mp3 width=62 height=18:)
(:cellnr bgcolor=#cccc99 align=center:) 010 (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan.010.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Barack.010.new.mp3 width=62 height=18:)
(:cellnr bgcolor=#cccc99 align=center:) 038 (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan.038.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Barack.038.new.mp3 width=62 height=18:)
(:cellnr bgcolor=#cccc99 align=center:) 088 (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan.088.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Barack.088.new.mp3 width=62 height=18:) (:cellnr bgcolor=#cccc99 align=center:) 142 (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan.142.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Barack.142.new.mp3 width=62 height=18:) (:cellnr bgcolor=#cccc99 align=center:) 162 (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan.162.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Barack.162.new.mp3 width=62 height=18:)
(:cellnr bgcolor=#cccc99 align=center:) 184 (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan.184.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Barack.184.new.mp3 width=62 height=18:)
(:cellnr bgcolor=#cccc99 align=center:) 014 (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Barack.014.mp3 width=62 height=18:) (:cellnr bgcolor=#cccc99 align=center:) 015 (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Barack.015.mp3 width=62 height=18:) (:cellnr bgcolor=#cccc99 align=center:) 016 (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Barack.016.mp3 width=62 height=18:)
- Converted source envelope (with old and new method)
- Converted source envelope
(:cell bgcolor=#cccc89 align=center:) OLD (:cell bgcolor=#cccc89 align=center:) NEW
(:cell bgcolor=#cccc89 align=center:) OB
(:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan-to-Barack.001.mp3 width=62 height=18:)
(:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan-to-Barack.002.mp3 width=62 height=18:)
(:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan-to-Barack.003.mp3 width=62 height=18:)
(:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan-to-Barack.004.mp3 width=62 height=18:)
(:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan-to-Barack.005.mp3 width=62 height=18:)
(:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan-to-Barack.006.mp3 width=62 height=18:)
(:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan-to-Barack.007.mp3 width=62 height=18:)
(:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan-to-Barack.008.mp3 width=62 height=18:)
(:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan-to-Barack.009.mp3 width=62 height=18:)
(:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan-to-Barack.010.mp3 width=62 height=18:)
(:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan-to-Barack.011.mp3 width=62 height=18:)
(:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan-to-Barack.038.mp3 width=62 height=18:)
(:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan-to-Barack.049.mp3 width=62 height=18:)
(:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan-to-Barack.071.mp3 width=62 height=18:)
(:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan-to-Barack.078.mp3 width=62 height=18:)
(:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan-to-Barack.082.mp3 width=62 height=18:)
(:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan-to-Barack.088.mp3 width=62 height=18:)
(:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan-to-Barack.142.mp3 width=62 height=18:)
(:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan-to-Barack.162.mp3 width=62 height=18:)
(:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan-to-Barack.171.mp3 width=62 height=18:)
(:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan-to-Barack.184.mp3 width=62 height=18:)
(:cell bgcolor=#cccc89 align=center:) Xavier
(:cell bgcolor=#cccd01 align=center:) Xavier
(:cell bgcolor=#cccc89 align=center:) Xavier
(:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/Corpus1_Xavier.4.mp3 width=62 height=18:)
- with Converted source envelope
- Converted source envelope
- Reference: with Aligned target envelope
VC from a commercial TTS source voice
- Target Reference Samples
(:cell bgcolor=#cccc89 align=center:) Xavier (:cell bgcolor=#cccc89 align=center:) Fernando (:cell bgcolor=#cccc89 align=center:) Tremblay (:cell bgcolor=#cccc89 align=center:) Gilles
(:cell bgcolor=#cccc89 align=center:) BO (:cellnr bgcolor=#cccc99 align=center:) 012 (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Barack.012.mp3 width=62 height=18:) (:cellnr bgcolor=#cccc99 align=center:) 013 (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Barack.013.mp3 width=62 height=18:) (:cellnr bgcolor=#cccc99 align=center:) 014 (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Barack.014.mp3 width=62 height=18:) (:cellnr bgcolor=#cccc99 align=center:) 015 (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Barack.015.mp3 width=62 height=18:) (:cellnr bgcolor=#cccc99 align=center:) 016 (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Barack.016.mp3 width=62 height=18:) (:tableend:)
- Converted source envelope (with old and new method)
(:table border=1 cellpadding=2 cellspacing=0 align=center:) (:cellnr bgcolor=#cccc99 align=center:) # (:cell bgcolor=#cccd01 align=center:) Ryan (:cell bgcolor=#cccc89 align=center:) OLD (:cell bgcolor=#cccc89 align=center:) NEW
(:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/Corpus1_Xavier.1.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/resTransGMM_Yaligned.Fernando.1.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/resTransGMM_Yaligned.Tremblay.1.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/resTransGMM_Yaligned.Gilles.1.mp3 width=62 height=18:)
(:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan.001.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan-to-Barack.001.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Barack.001.new.mp3 width=62 height=18:)
(:cellnr bgcolor=#cccc99 align=center:) 002 (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan.002.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan-to-Barack.002.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Barack.002.new.mp3 width=62 height=18:)
(:cellnr bgcolor=#cccc99 align=center:) 003 (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan.003.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan-to-Barack.003.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Barack.003.new.mp3 width=62 height=18:)
(:cellnr bgcolor=#cccc99 align=center:) 004 (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan.004.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan-to-Barack.004.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Barack.004.new.mp3 width=62 height=18:)
(:cellnr bgcolor=#cccc99 align=center:) 005 (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan.005.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan-to-Barack.005.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Barack.005.new.mp3 width=62 height=18:)
(:cellnr bgcolor=#cccc99 align=center:) 006 (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan.006.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan-to-Barack.006.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Barack.006.new.mp3 width=62 height=18:)
(:cellnr bgcolor=#cccc99 align=center:) 007 (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan.007.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan-to-Barack.007.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Barack.007.new.mp3 width=62 height=18:)
(:cellnr bgcolor=#cccc99 align=center:) 008 (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan.008.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan-to-Barack.008.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Barack.008.new.mp3 width=62 height=18:)
(:cellnr bgcolor=#cccc99 align=center:) 009 (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan.009.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan-to-Barack.009.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Barack.009.new.mp3 width=62 height=18:)
(:cellnr bgcolor=#cccc99 align=center:) 010 (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan.010.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan-to-Barack.010.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Barack.010.new.mp3 width=62 height=18:)
(:cellnr bgcolor=#cccc99 align=center:) 011 (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan.011.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan-to-Barack.011.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Barack.011.new.mp3 width=62 height=18:)
(:cellnr bgcolor=#cccc99 align=center:) 038 (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan.038.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan-to-Barack.038.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Barack.038.new.mp3 width=62 height=18:)
(:cellnr bgcolor=#cccc99 align=center:) 049 (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan.049.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan-to-Barack.049.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Barack.049.new.mp3 width=62 height=18:)
(:cellnr bgcolor=#cccc99 align=center:) 071 (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan.071.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan-to-Barack.071.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Barack.071.new.mp3 width=62 height=18:)
(:cellnr bgcolor=#cccc99 align=center:) 078 (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan.078.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan-to-Barack.078.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Barack.078.new.mp3 width=62 height=18:)
(:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER//Corpus1_Xavier.82.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/resTransGMM_Yaligned.Fernando.82.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/resTransGMM_Yaligned.Tremblay.82.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/resTransGMM_Yaligned.Gilles.82.mp3 width=62 height=18:)
(:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan.082.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan-to-Barack.082.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Barack.082.new.mp3 width=62 height=18:)
(:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER//Corpus1_Xavier.88.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/resTransGMM_Yaligned.Fernando.88.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/resTransGMM_Yaligned.Tremblay.88.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/resTransGMM_Yaligned.Gilles.88.mp3 width=62 height=18:)
(:cellnr bgcolor=#cccc99 align=center:) 099 (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER//Corpus1_Xavier.99.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/resTransGMM_Yaligned.Fernando.99.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/resTransGMM_Yaligned.Tremblay.99.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/resTransGMM_Yaligned.Gilles.99.mp3 width=62 height=18:) (:tableend:)
(:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan.088.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan-to-Barack.088.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Barack.088.new.mp3 width=62 height=18:) (:cellnr bgcolor=#cccc99 align=center:) 142 (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan.142.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan-to-Barack.142.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Barack.142.new.mp3 width=62 height=18:) (:cellnr bgcolor=#cccc99 align=center:) 162 (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan.162.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan-to-Barack.162.mp3 width=62 height=18:) 
(:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Barack.162.new.mp3 width=62 height=18:) (:cellnr bgcolor=#cccc99 align=center:) 171 (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan.171.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan-to-Barack.171.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Barack.171.new.mp3 width=62 height=18:) (:cellnr bgcolor=#cccc99 align=center:) 184 (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan.184.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Ryan-to-Barack.184.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XTRA/Barack.184.new.mp3 width=62 height=18:) (:tableend:)
Target Reference Samples
VC from a real source voice
- Target Reference Samples
Voice Conversion:
- Overfitting
(:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/resTransGMM256dem.mp3 width=62 height=18:)(:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/resTransGMM256-fulldem.mp3 width=62 height=18:)
- with Aligned target envelope
- Reference: with Aligned target envelope
(:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/cursus/speaking_styles_samples.mp3 width=200 height=18:)
Target Reference Samples
(:table border=1 cellpadding=2 cellspacing=0 align=center:) (:cellnr bgcolor=#cccc99 align=center:) # (:cell bgcolor=#cccc89 align=center:) Xavier (:cell bgcolor=#cccc89 align=center:) Fernando (:cell bgcolor=#cccc89 align=center:) Tremblay (:cell bgcolor=#cccc89 align=center:) Gilles (:cellnr bgcolor=#cccc99 align=center:) 004 (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/Corpus1_Xavier.4.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/Corpus1_Fernando.4.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/Corpus1_Tremblay.4.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/Corpus1_Gilles.4.mp3 width=62 height=18:) (:tableend:)
Voice Conversion:
- with Converted source envelope
(:table border=1 cellpadding=2 cellspacing=0 align=center:) (:cellnr bgcolor=#cccc99 align=center:) # (:cell bgcolor=#cccc89 align=center:) Xavier (:cell bgcolor=#cccc89 align=center:) Fernando (:cell bgcolor=#cccc89 align=center:) Tremblay (:cell bgcolor=#cccc89 align=center:) Gilles
(:cellnr bgcolor=#cccc99 align=center:) 001 (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/Corpus1_Xavier.1.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/resTransGMM_Yhat.Fernando.1.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/resTransGMM_Yhat.Tremblay.1.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/resTransGMM_Yhat.Gilles.1.mp3 width=62 height=18:)
(:cellnr bgcolor=#cccc99 align=center:) 082 (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER//Corpus1_Xavier.82.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/resTransGMM_Yhat.Fernando.82.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/resTransGMM_Yhat.Tremblay.82.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/resTransGMM_Yhat.Gilles.82.mp3 width=62 height=18:)
(:cellnr bgcolor=#cccc99 align=center:) 088 (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER//Corpus1_Xavier.88.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/resTransGMM_Yhat.Fernando.88.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/resTransGMM_Yhat.Tremblay.88.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/resTransGMM_Yhat.Gilles.88.mp3 width=62 height=18:)
(:cellnr bgcolor=#cccc99 align=center:) 099 (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER//Corpus1_Xavier.99.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/resTransGMM_Yhat.Fernando.99.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/resTransGMM_Yhat.Tremblay.99.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/resTransGMM_Yhat.Gilles.99.mp3 width=62 height=18:) (:tableend:)
- Overfitting
(:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/resTransGMM256dem.mp3 width=62 height=18:)(:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/resTransGMM256-fulldem.mp3 width=62 height=18:)
- with Aligned target envelope
(:table border=1 cellpadding=2 cellspacing=0 align=center:) (:cellnr bgcolor=#cccc99 align=center:) # (:cell bgcolor=#cccc89 align=center:) Xavier (:cell bgcolor=#cccc89 align=center:) Fernando (:cell bgcolor=#cccc89 align=center:) Tremblay (:cell bgcolor=#cccc89 align=center:) Gilles (:cellnr bgcolor=#cccc99 align=center:) 001 (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/Corpus1_Xavier.1.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/resTransGMM_Yaligned.Fernando.1.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/resTransGMM_Yaligned.Tremblay.1.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/resTransGMM_Yaligned.Gilles.1.mp3 width=62 height=18:) (:cellnr bgcolor=#cccc99 align=center:) 082 (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER//Corpus1_Xavier.82.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/resTransGMM_Yaligned.Fernando.82.mp3 width=62 height=18:) (:cell align=center:) (:flash 
http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/resTransGMM_Yaligned.Tremblay.82.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/resTransGMM_Yaligned.Gilles.82.mp3 width=62 height=18:) (:cellnr bgcolor=#cccc99 align=center:) 088 (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER//Corpus1_Xavier.88.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/resTransGMM_Yaligned.Fernando.88.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/resTransGMM_Yaligned.Tremblay.88.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/resTransGMM_Yaligned.Gilles.88.mp3 width=62 height=18:) (:cellnr bgcolor=#cccc99 align=center:) 099 (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER//Corpus1_Xavier.99.mp3 width=62 height=18:) (:cell align=center:) (:flash 
http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/resTransGMM_Yaligned.Fernando.99.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/resTransGMM_Yaligned.Tremblay.99.mp3 width=62 height=18:) (:cell align=center:) (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XAVIER/resTransGMM_Yaligned.Gilles.99.mp3 width=62 height=18:) (:tableend:)
- Models were learned on only 200 short phrases.
- Models were learned on only 200 short phrases (9 to 10 min of speech).
http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/ircamAlign6b.jpg
http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/cursus/ircamAlign6b.jpg
http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/cursus/ircamAlign6b.jpg
http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/ircamAlign6b.jpg
http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/CURSUS2010/ircamAlign6b.jpg
http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/cursus/ircamAlign6b.jpg
Researcher and developer in ANR VIVOS project:\\
Researcher and developer in ANR VIVOS project:\\
Researcher and developer in AngelStudio project:\\
Researcher and developer in FEDER AngelStudio project:\\
Researcher and developer in French ANR Affective Avatars project:\\
Researcher and developer in ANR Affective Avatars project:\\
Researcher and developer in French ANR VIVOS project:\\
Researcher and developer in ANR VIVOS project:\\
(:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/speaking_styles_samples.mp3 width=200 height=18:)
(:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/cursus/speaking_styles_samples.mp3 width=200 height=18:)
(:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/speaking_styles_samples.mp3 width=200 height=18:)
(:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/speaking_styles_samples.mp3 width=200 height=18:)
(:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/cursus/speaking_styles_samples.mp3 width=200 height=18:)
(:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/speaking_styles_samples.mp3 width=200 height=18:)
http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/cursus/speaking_styles_samples.mp3 width=200 height=18:)||Original voice||
(:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/cursus/speaking_styles_samples.mp3 width=200 height=18:)
http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/cursus/speaking_styles_samples.mp3 width=200 height=18:)||Original voice||
Voice Conversion
- Automatic Phoneme Segmentation With Relaxed Textual Constraints,
P. Lanchantin, A. C. Morris, X. Rodet and C. Veaux,
LREC'08 Proceedings, Marrakech, Morocco, 2008.
Dynamic Model Selection for Spectral Voice Conversion
Statistical methods for voice conversion are usually based on a single model selected to represent a tradeoff between goodness of fit and complexity. In this paper we assume that the best model may change over time, depending on the source acoustic features. We present a new method for spectral voice conversion called Dynamic Model Selection (DMS), in which a set of potential best models with increasing complexity - including mixtures of Gaussians and probabilistic principal component analyzers - is considered during the conversion of a source speech signal into a target speech signal. This set is built during the learning phase, according to the Bayesian information criterion. During the conversion, the best model is dynamically selected among the models in the set, according to the acoustical features of each frame. Subjective tests show that the method improves the conversion in terms of proximity to the target and quality.
- Speaking Style Modeling of Various Discourse Genres in HMM-Based Speech Synthesis,
N. Obin, P. Lanchantin, A. Lacheret-Dujour and X. Rodet,
ICASSP 2011, Prague, Czech Republic, May 2011, Submitted
- Toward Improved HMM-Based Speech Synthesis Using High-Level Syntactical Features,
N. Obin, P. Lanchantin, M. Avanzi, A. Lacheret-Dujour and X. Rodet,
Speech Prosody 2010 Proceedings, Chicago, USA, 2010.
- A HMM-Based Synthesis System Using a New Glottal Source and Vocal-Tract Separation Method,
P. Lanchantin, G. Degottex and X. Rodet,
ICASSP2010 Proceedings, Dallas, USA, 2010.
Dynamic Model Selection for Spectral Voice Conversion
Statistical methods for voice conversion are usually based on a single model selected to represent a tradeoff between goodness of fit and complexity. In this paper we assume that the best model may change over time, depending on the source acoustic features. We present a new method for spectral voice conversion called Dynamic Model Selection (DMS), in which a set of potential best models with increasing complexity - including mixtures of Gaussians and probabilistic principal component analyzers - is considered during the conversion of a source speech signal into a target speech signal. This set is built during the learning phase, according to the Bayesian information criterion. During the conversion, the best model is dynamically selected among the models in the set, according to the acoustical features of each frame. Subjective tests show that the method improves the conversion in terms of proximity to the target and quality.
- Dynamic Model Selection for Spectral Voice Conversion,
P. Lanchantin and X. Rodet,
Interspeech 2010 Proceedings, Makuhari, Japan, Sept 2010.
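The two phases described in the abstract above can be illustrated with a minimal sketch. It is not the published implementation: the candidate dictionaries, the `max_models` cut-off, and the per-frame rule (pick the model with the highest score on the current source frame) are assumptions made for illustration only. Only the use of the Bayesian information criterion during learning and a frame-by-frame selection during conversion come from the abstract.

```python
import math


def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion of a fitted model (lower is better)."""
    return -2.0 * log_likelihood + n_params * math.log(n_samples)


def build_model_set(candidates, n_samples, max_models=3):
    """Learning phase (assumed rule): keep the max_models candidates with the
    lowest BIC and return them ordered by increasing complexity.

    Each candidate is a dict with hypothetical keys
    'name', 'log_likelihood' and 'n_params'.
    """
    ranked = sorted(
        candidates,
        key=lambda c: bic(c["log_likelihood"], c["n_params"], n_samples),
    )
    return sorted(ranked[:max_models], key=lambda c: c["n_params"])


def select_per_frame(frame_scores):
    """Conversion phase (assumed rule): frame_scores[t][m] is the score of
    model m on source frame t; return the index of the best model per frame."""
    return [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
```

In this sketch the penalty term `n_params * log(n_samples)` is what keeps very large models out of the set when training data are scarce; the selection step then only chooses among the models that survived the learning phase.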
Baseline system using STRAIGHT for French speech synthesis
Models were learned on only 200 short phrases.
Baseline system using STRAIGHT for French speech synthesis
- Models were learned on only 200 short phrases.
Baseline system using STRAIGHT for French speech synthesis
Models were learned on only 200 short phrases.
Speaker | HTS | Reference |
---|---|---|
Xavier | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XavierHTS.mp3 width=200 height=18:) | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/XavierRef.mp3 width=200 height=18:) |
Chungsin | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/ChungsinHTS.mp3 width=200 height=18:) | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/ChungsinRef.mp3 width=200 height=18:) |
Carmine | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/CarmineHTS.mp3 width=200 height=18:) | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/CarmineRef.mp3 width=200 height=18:) |
Cocteau | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/CocteauHTS.mp3 width=200 height=18:) | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/CocteauRef.mp3 width=200 height=18:) |
Fernando | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/FernandoHTS.mp3 width=200 height=18:) | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/FernandoRef.mp3 width=200 height=18:) |
Hugues | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/HuguesHTS.mp3 width=200 height=18:) | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/HuguesRef.mp3 width=200 height=18:) |
Simon | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/SimonHTS.mp3 width=200 height=18:) | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/SimonRef.mp3 width=200 height=18:) |
Thomas | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/ThomasHTS.mp3 width=200 height=18:) | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/ThomasRef.mp3 width=200 height=18:) |
Tremblay | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/TremblayHTS.mp3 width=200 height=18:) | (:flash http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/dewplayer.swf?son=http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/TremblayRef.mp3 width=200 height=18:) |
Professional Experience
Professional Experiences
- I am developing a Voice conversion system based on GMM modeling of the joint law of source and target acoustic features.
- I am working on a voice conversion system based on GMM modeling of the joint law of source and target acoustic features.
- I also developed an HMM-based speech synthesis system for French based on HTS including a new excitation model developed by G. Degottex
- I also developed an HMM-based speech synthesis system for French based on HTS including a new excitation model proposed by G. Degottex
Jan 2010-Jun2011:\\
Jan 2010 - Jun2011:\\
Jul 2008-Jan 2010:\\
Jul 2008 - Jan 2010:\\
- I developed a Voice conversion system based on GMM modeling of the joint law of source and target acoustic features.
- I developed a Voice conversion system based on GMM modeling of the joint law of source and target acoustic features.
Jan 2007-Jun 2008:\\
Jan 2007 - Jun 2008:\\
- The aligned linguistic structure was used in the project by ircamCorpusTools, a corpus manager tool similar to Festival, which was developed for Unit selection TTS.
Sep 2005-Sep 2006:\\
- The aligned linguistic structure was used in the project by ircamCorpusTools, a corpus manager tool similar to festival, which was developed for Unit selection TTS.
Sep 2005 - Sep 2006:\\
Sep 2002-Sep 2005:\\
Sep 2002 - Sep 2005:\\
Sep 2002-Dec 2002:\\
Sep 2002 - Dec 2002:\\
Mar 2002-Sep 2002:\\
Mar 2002 - Sep 2002:\\
Mar 2001-Sep 2001:\\
Mar 2001 - Sep 2001:\\
Jan 2000-Mar 2000:\\
Jan 2000 - Mar 2000:\\
Jan 2010-Jun2011
Jan 2010-Jun2011:\\
Jul 2008-Jan 2010
Jul 2008-Jan 2010:\\
Professional Experience
Jan 2010-Jun2011
Researcher and developer in AngelStudio project:
IRCAM, Paris, France
- I am developing a Voice conversion system based on GMM modeling of the joint law of source and target acoustic features.
- I am also implementing a one-to-many voice conversion system based on a canonical eigenvoice model estimated by SAT for fast Adaptation.
- The aim of VC in this project is to convert the voice of a commercial TTS to the voice of the user using only a few sentences.
Jul 2008-Jan 2010
Researcher and developer in French ANR Affective Avatars project:
IRCAM, Paris, France
- I developed a Voice conversion system based on GMM modeling of the joint law of source and target acoustic features.
- I also developed an HMM-based speech synthesis system for French based on HTS including a new excitation model developed by G. Degottex
Jan 2007-Jun 2008:
Researcher and developer in French ANR VIVOS project:
IRCAM, Paris, France
- I developed a segmentation system based on HTK and on the French phonetizer LIAPHON to automatically extract the language structure at different levels (phone, word, phrase, paragraph) and to align it with the speech audio signal. Multiple pronunciations are possible using a constrained phonetic graph built from the text.
- A confidence measure is computed for manual correction.
- The aligned linguistic structure was used in the project by ircamCorpusTools, a corpus manager tool similar to Festival, which was developed for Unit selection TTS.
Sep 2005-Sep 2006:
Teaching assistant for Master students:
Paris XI University, Orsay, France
- teaching: C language, UNIX, numerical analysis, multimedia (coding), System and Network, final projects supervisor.
Sep 2002-Sep 2005:
Teaching assistant for Master students:
Institut National des telecommunications, Evry, France
- teaching: Introduction to Statistics, algorithmic and C language, statistical methods in Image processing, final projects supervisor.
Sep 2002-Dec 2002:
Invited researcher:
Ocean Systems Laboratory, Heriot-Watt University, Edinburgh
- Study and evaluation of SONAR image segmentation algorithms.
Mar 2002-Sep 2002:
Master Training course in statistical RADAR image segmentation:
TBU Radar Development, THALES Air Defence, Bagneux, France
- Study of statistical radar image segmentation algorithms applied to Doppler cartography in order to reduce false alarms in RADAR detection.
Mar 2001-Sep 2001:
Master Training course in non-linear Mechanics:
UER de Mécanique, ENSTA Palaiseau, France
- Analytical and numerical study of the temporal response of circular plates involving a set of internal resonances in the context of non-linear vibration.
Jan 2000-Mar 2000:
Training course in non-linear Optics:
Photonics and nanostructure laboratory, CNET Bagneux, France
- Simulation of the propagation of a Gaussian beam in a non-linear medium (C++)
Employment History
jan2000-mar2000:
Training course in non-linear Optics:
Photonics and nanostructure laboratory, CNET Bagneux, France
- Simulation of the propagation of a Gaussian beam in a non-linear medium (C++)
mar2001-sep2001:
Master Training course in non-linear Mechanics:
UER de Mécanique, ENSTA Palaiseau, France
- Analytical and numerical study of the temporal response of circular plates involving a set of internal resonances in the context of non-linear vibration.
mar2002-sep2002:
Master Training course in statistical RADAR image segmentation:
TBU Radar Development, THALES Air Defence, Bagneux, France
- Study of statistical radar image segmentation algorithms applied to Doppler cartography in order to reduce false alarms in RADAR detection.
sep2002-dec2002:
Invited researcher:
Ocean Systems Laboratory, Heriot-Watt University, Edinburgh
- Study and evaluation of SONAR image segmentation algorithms.
sep2002-sep2005:
Teaching assistant for Master students:
Institut National des telecommunications, Evry, France
- teaching: Introduction to Statistics, algorithmic and C language, statistical methods in Image processing, final projects supervisor.
sep2005-sep2006:
Teaching assistant for Master students:
Paris XI University, Orsay, France
- teaching: C language, UNIX, numerical analysis, multimedia (coding), System and Network, final projects supervisor.
Jan2007-Jun2008:
Researcher and developer in French ANR VIVOS project:
Ircam, Paris, France
- I developed a segmentation system based on HTK and on the French phonetizer LIAPHON to automatically extract the language structure at different levels (phone, word, phrase, paragraph) and to align it with the speech audio signal. Multiple pronunciations are possible using a constrained phonetic graph built from the text.
- A confidence measure is computed for manual correction.
- The aligned linguistic structure was used in the project by ircamCorpusTools, a corpus manager tool similar to Festival, which was developed for Unit selection TTS.
Jul2008-Jan2010
Researcher and developer in French ANR Affective Avatars project:
Ircam, Paris, France
- I developed a Voice conversion system based on GMM modeling of the joint law of source and target acoustic features.
- I also developed an HMM-based speech synthesis system for French based on HTS including a new excitation model developed by G. Degottex
Jan 2010-Jun2011
Researcher and developer in AngelStudio project:
Ircam, Paris, France
- I am developing a Voice conversion system based on GMM modeling of the joint law of source and target acoustic features.
- I am also implementing a one-to-many voice conversion system based on a canonical eigenvoice model estimated by SAT for fast Adaptation.
- The aim of VC in this project is to convert the voice of a commercial TTS to the voice of the user using only a few sentences.
Secondary education\\
Secondary education
jan 2000-mar 2000:
Photonics and nanostructure laboratory, CNET Bagneux, France
Training course in non-linear Optics
simulation of the propagation of a gaussian beam in a non-linear medium (C++)
mar 2001-sept 2001:
UER de Mécanique, ENSTA Palaiseau, France
Master Training course in non-linear Mechanics
Analytical and numerical study of the temporal response of circular plates involving a set of internal resonances in the context of non-linear vibration.
mar 2002-sep 2002 TBU Radar Development, THALES Air Defence, Bagneux, France Master Training course in statistical RADAR image segmentation Study of statistical radar image segmentation algorithms applied to Doppler cartography in order to reduce false alarms in RADAR detection. Reason for leaving: none
sep2002-dec2002 Ocean Systems Laboratory, Heriot-Watt University, Edinburgh Invited researcher Study and evaluation of SONAR image segmentation algorithms. Reason for leaving: none
sep2002-sep2005 Institut National des telecommunications, Evry, France Teaching assistant for Master students Introduction to Statistics, algorithmic and C language, statistical methods in Image processing, final projects supervisor.
sep2002-sep2005 Paris XI University, Orsay, France Teaching assistant for Master students C language, UNIX, numerical analysis, multimedia (coding), System and Network, final projects supervisor.
Jan2007-Jun2008 Ircam, Paris, France Researcher and developer in French ANR VIVOS project. I developed a segmentation system based on HTK and on the French phonetizer LIAPHON to automatically extract the language structure at different levels (phone, word, phrase, paragraph) and to align it with the speech audio signal. Multiple pronunciations are possible using a constrained phonetic graph built from the text. A confidence measure is computed for manual correction. The aligned linguistic structure was used in the project by ircamCorpusTools, a corpus manager tool similar to Festival, which was developed for Unit selection TTS.
Jul2008-Jan2010
Ircam, Paris, France Researcher and developer in French ANR Affective Avatars project. I developed a Voice conversion system based on GMM modeling of the joint law of source and target acoustic features. I also developed an HMM-based speech synthesis system for French based on HTS including a new excitation model developed by one of my colleagues. Reason for leaving: end of the contract
Jan 2010-Jun 2011 Ircam, Paris, France Researcher and developer in AngelStudio project. I am developing a Voice conversion system based on GMM modeling of the joint law of source and target acoustic features. I am also implementing a one-to-many voice conversion system based on a canonical eigenvoice model estimated by SAT for fast Adaptation. The aim of VC in this project is to convert the voice of a commercial TTS to the voice of the user using only a few sentences.
2000-2002:
Ingénieur Telecom INT: Master Degree in Telecommunications (French Grande Ecole)\\
2001-2002:
DEA OSS: Master Degree in System optimization and safety\\
- With honors
- Keywords: Signal processing, Computer Science, Probability and statistics, graph optimization, numerical analysis, Information theory, numerical communication, optical communications, Network-TCP/IP, Specialization in statistical image processing during the last year.
2001-2002:
DEA OSS: Master Degree in System optimization and safety
Institut National des télécommunications, Evry, France.
2000-2002:
Ingénieur Telecom INT: Master Degree in Telecommunications (French Grande Ecole)
Institut National des télécommunications, Evry, France.
- With honors
- Keywords: Signal processing, Computer Science, Probability and statistics, graph optimization, numerical analysis, Information theory, numerical communication, optical communications, Network-TCP/IP, Specialization in statistical image processing during the last year.
Brunoy
Employer 1 Name of Employer: Photonics and nanostructure laboratory, CNET Bagneux, France Job title, description of duties and responsibilities: Training course in non-linear optics: simulation of the propagation of a Gaussian beam in a non-linear medium (C++) Reason for leaving: none
Photonics and nanostructure laboratory, CNET Bagneux, France Training course in non-linear optics: simulation of the propagation of a Gaussian beam in a non-linear medium (C++)
Employer 2 Name of Employer: UER de Mécanique, ENSTA Palaiseau, France Job Title, description of duties and responsibilities: Master training course in non-linear mechanics: analytical and numerical study of the temporal response of a circular plate involving a set of internal resonances in the context of non-linear vibration. Reason for leaving: none
UER de Mécanique, ENSTA Palaiseau, France Master training course in non-linear mechanics: analytical and numerical study of the temporal response of a circular plate involving a set of internal resonances in the context of non-linear vibration.
Employer 3 Name of Employer: TBU Radar Development, THALES Air Defence, Bagneux, France Job Title, description of duties and responsibilities: Master training course in statistical RADAR image segmentation: study of a statistical radar image segmentation algorithm applied to Doppler cartography in order to reduce false alarms in RADAR detection.
TBU Radar Development, THALES Air Defence, Bagneux, France Master training course in statistical RADAR image segmentation: study of a statistical radar image segmentation algorithm applied to Doppler cartography in order to reduce false alarms in RADAR detection.
Employer 4 Name of Employer: Ocean Systems Laboratory, Heriot-Watt University, Edinburgh Job Title, description of duties and responsibilities: Invited researcher. Study and evaluation of SONAR image segmentation algorithms.
Ocean Systems Laboratory, Heriot-Watt University, Edinburgh Invited researcher: study and evaluation of SONAR image segmentation algorithms.
sep2002-september2005 Employer 5 Name of Employer: Institut National des telecommunications, Evry, France Job Title, description of duties and responsibilities: Teaching assistant for Master students in the following courses: Introduction to Statistics, algorithmics and C language, statistical methods in image processing; final project supervision. Reason for leaving: none
Employer 6 Name of Employer: Paris XI University, Orsay, France Job Title, description of duties and responsibilities: Teaching assistant for Master students in the following courses: C language, UNIX, numerical analysis, multimedia (coding), systems and networks; final project supervision. Reason for leaving: none
Institut National des telecommunications, Evry, France Teaching assistant for Master students: Introduction to Statistics, algorithmics and C language, statistical methods in image processing; final project supervision.
sep2002-sep2005 Paris XI University, Orsay, France Teaching assistant for Master students C language, UNIX, numerical analysis, multimedia (coding), System and Network, final projects supervisor.
Employer 7 Name of Employer: Ircam, Paris, France Job Title, description of duties and responsibilities: Researcher and developer in French ANR VIVOS project.
Ircam, Paris, France Researcher and developer in French ANR VIVOS project.
Reason for leaving: end of the contract
Employer 8 Name of Employer: Ircam, Paris, France Job Title, description of duties and responsibilities: Researcher and developer in French ANR Affective Avatars project. I developed a voice conversion system based on GMM modeling of the joint distribution of source and target acoustic features. I also developed an HMM-based speech synthesis system for French, based on HTS, including a new excitation model developed by one of my colleagues.
Ircam, Paris, France Researcher and developer in French ANR Affective Avatars project. I developed a voice conversion system based on GMM modeling of the joint distribution of source and target acoustic features. I also developed an HMM-based speech synthesis system for French, based on HTS, including a new excitation model developed by one of my colleagues.
Employer 9 Name of Employer: Ircam, Paris, France Job Title, description of duties and responsibilities: Researcher and developer in AngelStudio project. I am developing a voice conversion system based on GMM modeling of the joint distribution of source and target acoustic features. I am also implementing a one-to-many voice conversion system based on a canonical eigenvoice model estimated by SAT for fast adaptation. The aim of VC in this project is to convert the voice of a commercial TTS system into the voice of the user using only a few sentences. Reason for leaving: end of the contract
2002-2006 :
PhD in Statistical Signal Processing - Institut National des Télécommunications
- Title : Unsupervised Signal Segmentation Using Triplet Markov Chains.
2000-2002 :
M.S. in Telecommunications - Institut National des Télécommunications
- Major: Image processing.
2001-2002 :
M.S. in system optimisation and safety (mention Bien) - Institut National des Télécommunications
- Major: Decision in Signal and Image processing.
2000-2001 :
M.S. ATIAM (mention AB) - Paris VI University
- Acoustic, Signal Processing and computer science applied to Music.
1997-2000 :
B.S. in Physics (mention AB) - Evry-val-d'Essonne University
1995-1997 :
Degree in Physics - ''Evry-val-d'Essonne University''
« Audio Engineer Diploma» - ''School of Audio Engineering, Paris Sound technics.''
Ircam, Paris, France Researcher and developer in AngelStudio project. I am developing a voice conversion system based on GMM modeling of the joint distribution of source and target acoustic features. I am also implementing a one-to-many voice conversion system based on a canonical eigenvoice model estimated by SAT for fast adaptation. The aim of VC in this project is to convert the voice of a commercial TTS system into the voice of the user using only a few sentences.
Secondary education in Sciences
Secondary education
Brunoy
- General education in Sciences
2002-2006 :
PhD in Statistical Signal Processing - Institut National des Télécommunications
- Title : Unsupervised Signal Segmentation Using Triplet Markov Chains.
2000-2002 :
M.S. in Telecommunications - Institut National des Télécommunications
- Major: Image processing.
2001-2002 :
M.S. in system optimisation and safety (mention Bien) - Institut National des Télécommunications
- Major: Decision in Signal and Image processing.
2000-2001 :
M.S. ATIAM (mention AB) - Paris VI University
- Acoustic, Signal Processing and computer science applied to Music.
1997-2000 :
B.S. in Physics (mention AB) - Evry-val-d'Essonne University
1995-1997 :
Degree in Physics - ''Evry-val-d'Essonne University''
« Audio Engineer Diploma» - ''School of Audio Engineering, Paris Sound technics.''
1989-1995:
Secondary education in Sciences
1995-1997:
Audio engineer diploma - School of Audio Engineering Institute, SAE Paris, France
- a two-year degree in sound and audio techniques
- Keywords: studio and live recording techniques, sound reinforcement, acoustics, mastering.
1995-2000:
Evry-val-d'Essonne University, Evry, France
Bachelor and first year of Master of Science in Physics (Deug, licence and maitrise de Physique in French)
with honors
keywords: Statistical Physics, Optics, Electromagnetism, Relativity, Quantum Mechanics, Electronics, Numerical Analysis
2000-2001:
DEA ATIAM: Master Degree in Acoustics, Signal Processing, Computer science applied to Music
Paris VI University/IRCAM, Paris, France
with honors
keywords: Acoustics(general and musical), Audio-numerical signal processing, Computer Science for Music.
2001-2002:
DEA OSS: Master Degree in System optimization and safety\\
2002-2006:
PhD in Statistical Signal Processing\\
- Major: signal processing and decision theory.
- with honors
- keywords: Logistic, Risk management, System diagnostic, Decision theory in Signal and Image.
- Title "Triplet Markov chains and Unsupervised signal segmentation"
- Director: Wojciech Pieczynski.
- With honors (mention très honorable)
- Keywords: Hidden Markov Models, Pairwise and triplet Markov chains and trees, Bayesian estimation, Expectation-maximisation, non-stationary process segmentation, centered Gaussian process with long memory noise, Dempster-Shafer theory, SAR image segmentation.
- with honors
- keywords: Signal processing, Computer Science, Probability and statistics, graph optimization, numerical analysis, Information theory, numerical communication, optical communications, Network-TCP/IP, Specialization in statistical image processing during the last year.
2002-2006:
PhD in Statistical Signal Processing\\
- With honors
- Keywords: Signal processing, Computer Science, Probability and statistics, graph optimization, numerical analysis, Information theory, numerical communication, optical communications, Network-TCP/IP, Specialization in statistical image processing during the last year.
2001-2002:
DEA OSS: Master Degree in System optimization and safety\\
- Title "Triplet Markov chains and Unsupervised signal segmentation"
- Director: Wojciech Pieczynski.
- with honors (mention très honorable)
- Keywords: Hidden Markov Models, Pairwise and triplet Markov chains and trees, Bayesian estimation, Expectation-maximisation, non-stationary process segmentation, centered Gaussian process with long memory noise, Dempster-Shafer theory, SAR image segmentation.
- Major: signal processing and decision theory.
- With honors (mention B)
- Keywords: Logistic, Risk management, System diagnostic, Decision theory in Signal and Image.
2000-2001:
DEA ATIAM: Master Degree in Acoustics, Signal Processing, Computer science applied to Music
Paris VI University/IRCAM, Paris, France
- With honors (mention AB)
- Keywords: Acoustics(general and musical), Audio-numerical signal processing, Computer Science for Music.
1995-2000:
Deug, licence and maitrise de Physique: Bachelor and first year of Master of Science in Physics
Evry-val-d'Essonne University, Evry, France
- With honors (mention AB)
- Keywords: Statistical Physics, Optics, Electromagnetism, Relativity, Quantum Mechanics, Electronics, Numerical Analysis
1995-1997:
Audio engineer diploma
School of Audio Engineering Institute, SAE Paris, France
- a two-year degree in sound and audio techniques
- Keywords: studio and live recording techniques, sound reinforcement, acoustics, mastering.
1989-1995:
Secondary education in Sciences
DEA ATIAM:
Master Degree in Acoustics, Signal Processing, Computer science applied to Music\\
DEA ATIAM: Master Degree in Acoustics, Signal Processing, Computer science applied to Music\\
DEA OSS:
Master Degree in System optimization and safety\\
DEA OSS: Master Degree in System optimization and safety\\
Ingénieur Telecom INT: Master Degree in Telecommunications\\
Ingénieur Telecom INT: Master Degree in Telecommunications (french Grande Ecole)\\
''Paris VI University/IRCAM, Paris, France '''DEA ATIAM: Master Degree in Acoustics, Signal Processing, Computer science applied to Music
DEA ATIAM:
Master Degree in Acoustics, Signal Processing, Computer science applied to Music
Paris VI University/IRCAM, Paris, France
DEA OSS:
Master Degree in System optimization and safety\\
DEA OSS: Master Degree in System optimization and safety
Ingénieur Telecom INT: Master Degree in Telecommunications\\
Master Degree (french Grande Ecole engineering degree) in Telecommunications
2002-2006
2002-2006:
PhD in Statistical Signal Processing\\
PhD in Statistical Signal Processing
Institution 2 : Evry-val-d'Essonne University, Evry, France Qualifications gained and subjects studied: Bachelor and first year of Master of Science in Physics (Deug, licence and maitrise de Physique in French)
Evry-val-d'Essonne University, Evry, France Bachelor and first year of Master of Science in Physics (Deug, licence and maitrise de Physique in French)
Institution 3 Paris VI University/IRCAM, Paris, France Qualifications gained and subjects studied: Master Degree (french DEA) in Acoustics, Signal Processing, Computer science applied to Music
''Paris VI University/IRCAM, Paris, France '''DEA ATIAM: Master Degree in Acoustics, Signal Processing, Computer science applied to Music
Institution 4 Institut National des télécommunications, Evry, France. Qualifications gained and subjects studied: Master Degree (french DEA) in System optimization and safety with a specialization in signal processing and decision theory. with honors keywords: Logistic, Risk management, System diagnostic, Decision theory in Signal and Image.
Institut National des télécommunications, Evry, France. DEA OSS: Master Degree in System optimization and safety
- Major: signal processing and decision theory.
- with honors
- keywords: Logistic, Risk management, System diagnostic, Decision theory in Signal and Image.
Institution 5 Institut National des télécommunications, Evry, France. Qualifications gained and subjects studied: Master Degree (french Grande Ecole engineering degree) in Telecommunications with honors keywords: Signal processing, Computer Science, Probability and statistics, graph optimization, numerical analysis, Information theory, numerical communication, optical communications, Network-TCP/IP, Specialization in statistical image processing during the last year.
Institut National des télécommunications, Evry, France. Master Degree (french Grande Ecole engineering degree) in Telecommunications
- with honors
- keywords: Signal processing, Computer Science, Probability and statistics, graph optimization, numerical analysis, Information theory, numerical communication, optical communications, Network-TCP/IP, Specialization in statistical image processing during the last year.
Institution 6 Institut National des télécommunications, Evry, France Qualifications gained and subjects studied: PhD in Statistical Signal Processing, Title "Triplet Markov chains and Unsupervised signal segmentation". Director: Wojciech Pieczynski. with honors (mention très honorable) keywords: Hidden Markov Models, Pairwise and triplet Markov chains and trees, Bayesian estimation, Expectation-maximisation, non-stationary process segmentation, centered Gaussian process with long memory noise, Dempster-Shafer theory, SAR image segmentation.
Institut National des télécommunications, Evry, France. PhD in Statistical Signal Processing
- Title "Triplet Markov chains and Unsupervised signal segmentation"
- Director: Wojciech Pieczynski.
- with honors (mention très honorable)
- Keywords: Hidden Markov Models, Pairwise and triplet Markov chains and trees, Bayesian estimation, Expectation-maximisation, non-stationary process segmentation, centered Gaussian process with long memory noise, Dempster-Shafer theory, SAR image segmentation.
School of Audio Engineering Institute, Paris, France - Audio engineering diploma
- a two-year degree in sound and audio techniques
- Keywords: studio and live recording techniques, sound reinforcement, acoustics, mastering.
Audio engineer diploma - School of Audio Engineering Institute, SAE Paris, France
- a two-year degree in sound and audio techniques
- Keywords: studio and live recording techniques, sound reinforcement, acoustics, mastering.
Secondary education : Qualification gained and subjects studied Scientific education
Secondary education in Sciences
Institution 1 : School of Audio Engineering Institute, Paris, France Qualifications gained and subjects studied: Audio engineering diploma : a two-year degree in sound and audio techniques Keywords: studio and live recording techniques, sound reinforcement, acoustics, mastering.
School of Audio Engineering Institute, Paris, France - Audio engineering diploma
- a two-year degree in sound and audio techniques
- Keywords: studio and live recording techniques, sound reinforcement, acoustics, mastering.
1989-1995:
Secondary education :
Qualification gained and subjects studied
Scientific education
1995-1997:
Institution 1 : School of Audio Engineering Institute, Paris, France
Qualifications gained and subjects studied: Audio engineering diploma : a two-year degree in sound and audio techniques
Keywords: studio and live recording techniques, sound reinforcement, acoustics, mastering.
1995-2000:
Institution 2 : Evry-val-d'Essonne University, Evry, France
Qualifications gained and subjects studied: Bachelor and first year of Master of Science in Physics (Deug, licence and maitrise de Physique in French)
with honors
keywords: Statistical Physics, Optics, Electromagnetism, Relativity, Quantum Mechanics, Electronics, Numerical Analysis
2000-2001:
Institution 3 Paris VI University/IRCAM, Paris, France
Qualifications gained and subjects studied: Master Degree (french DEA) in Acoustics, Signal Processing, Computer science applied to Music
with honors
keywords: Acoustics(general and musical), Audio-numerical signal processing, Computer Science for Music.
2001-2002:
Institution 4 Institut National des télécommunications, Evry, France.
Qualifications gained and subjects studied: Master Degree (french DEA) in System optimization and safety with a specialization in signal processing and decision theory.
with honors
keywords: Logistic, Risk management, System diagnostic, Decision theory in Signal and Image.
2000-2002:
Institution 5 Institut National des télécommunications, Evry, France.
Qualifications gained and subjects studied: Master Degree (french Grande Ecole engineering degree) in Telecommunications
with honors
keywords: Signal processing, Computer Science, Probability and statistics, graph optimization, numerical analysis, Information theory, numerical communication, optical communications, Network-TCP/IP, Specialization in statistical image processing during the last year.
2002-2006 Institution 6 Institut National des télécommunications, Evry, France Qualifications gained and subjects studied: PhD in Statistical Signal Processing, Title "Triplet Markov chains and Unsupervised signal segmentation". Director: Wojciech Pieczynski. with honors (mention très honorable) keywords: Hidden Markov Models, Pairwise and triplet Markov chains and trees, Bayesian estimation, Expectation-maximisation, non-stationary process segmentation, centered Gaussian process with long memory noise, Dempster-Shafer theory, SAR image segmentation.
Experience
Employment History
jan 2000-mar 2000:
Employer 1
Name of Employer: Photonics and nanostructure laboratory, CNET Bagneux, France
Job title, description of duties and responsibilities: Training course in non-linear optics: simulation of the propagation of a Gaussian beam in a non-linear medium (C++)
Reason for leaving: none
mar 2001-sept 2001:
Employer 2
Name of Employer: UER de Mécanique, ENSTA Palaiseau, France
Job Title, description of duties and responsibilities: Master training course in non-linear mechanics: analytical and numerical study of the temporal response of a circular plate involving a set of internal resonances in the context of non-linear vibration.
Reason for leaving: none
mar 2002-sep 2002 Employer 3 Name of Employer: TBU Radar Development, THALES Air Defence, Bagneux, France Job Title, description of duties and responsibilities: Master training course in statistical RADAR image segmentation: study of a statistical radar image segmentation algorithm applied to Doppler cartography in order to reduce false alarms in RADAR detection. Reason for leaving: none
sep2002-dec2002 Employer 4 Name of Employer: Ocean Systems Laboratory, Heriot-Watt University, Edinburgh Job Title, description of duties and responsibilities: Invited researcher. Study and evaluation of SONAR image segmentation algorithms. Reason for leaving: none
sep2002-september2005 Employer 5 Name of Employer: Institut National des telecommunications, Evry, France Job Title, description of duties and responsibilities: Teaching assistant for Master students in the following courses: Introduction to Statistics, algorithmics and C language, statistical methods in image processing; final project supervision. Reason for leaving: none
sep2002-sep2005 Employer 6 Name of Employer: Paris XI University, Orsay, France Job Title, description of duties and responsibilities: Teaching assistant for Master students in the following courses: C language, UNIX, numerical analysis, multimedia (coding), systems and networks; final project supervision. Reason for leaving: none
Jan2007-Jun2008 Employer 7 Name of Employer: Ircam, Paris, France Job Title, description of duties and responsibilities: Researcher and developer in French ANR VIVOS project. I developed a segmentation system based on HTK and on the French phonetizer LIAPHON to automatically extract the language structure at different levels (phone, word, phrase, paragraph) and to align it with the speech audio signal. Multiple pronunciations are possible using a constrained phonetic graph built from the text. A confidence measure is computed for manual correction. The aligned linguistic structure was used in the project by ircamCorpusTools, a corpus management tool similar to Festival, which was developed for unit-selection TTS. Reason for leaving: end of the contract
Jul2008-Jan2010 Employer 8 Name of Employer: Ircam, Paris, France Job Title, description of duties and responsibilities: Researcher and developer in French ANR Affective Avatars project. I developed a voice conversion system based on GMM modeling of the joint distribution of source and target acoustic features. I also developed an HMM-based speech synthesis system for French, based on HTS, including a new excitation model developed by one of my colleagues. Reason for leaving: end of the contract
Jan 2010-Jun 2011 Employer 9 Name of Employer: Ircam, Paris, France Job Title, description of duties and responsibilities: Researcher and developer in AngelStudio project. I am developing a voice conversion system based on GMM modeling of the joint distribution of source and target acoustic features. I am also implementing a one-to-many voice conversion system based on a canonical eigenvoice model estimated by SAT for fast adaptation. The aim of VC in this project is to convert the voice of a commercial TTS system into the voice of the user using only a few sentences. Reason for leaving: end of the contract
Experience
2002-2006 :
PhD in Statistical Signal Processing - Institut National des Télécommunications
- Title : Unsupervised Signal Segmentation Using Triplet Markov Chains.
2000-2002 :
M.S. in Telecommunications - Institut National des Télécommunications
- Major: Image processing.
2001-2002 :
M.S. in system optimisation and safety (mention Bien) - Institut National des Télécommunications
- Major: Decision in Signal and Image processing.
2000-2001 :
M.S. ATIAM (mention AB) - Paris VI University
- Acoustic, Signal Processing and computer science applied to Music.
1997-2000 :
B.S. in Physics (mention AB) - Evry-val-d'Essonne University
1995-1997 :
Degree in Physics - ''Evry-val-d'Essonne University''
« Audio Engineer Diploma» - ''School of Audio Engineering, Paris Sound technics.''
Research Interests
http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/voder.gif
- Hidden Markov Models (HMM) for Image and Speech processing
- Speech recognition
- HMM Speech synthesis (HTS)
- Voice Conversion
- Signal and Image Segmentation
- Data fusion
Education
2002-2006 :
PhD in Statistical Signal Processing - Institut National des Télécommunications
- Title : Unsupervised Signal Segmentation Using Triplet Markov Chains.
2000-2002 :
M.S. in Telecommunications - Institut National des Télécommunications
- Major: Image processing.
2001-2002 :
M.S. in system optimisation and safety (mention Bien) - Institut National des Télécommunications
- Major: Decision in Signal and Image processing.
2000-2001 :
M.S. ATIAM (mention AB) - Paris VI University
- Acoustic, Signal Processing and computer science applied to Music.
1997-2000 :
B.S. in Physics (mention AB) - Evry-val-d'Essonne University
1995-1997 :
Degree in Physics - ''Evry-val-d'Essonne University''
« Audio Engineer Diploma» - ''School of Audio Engineering, Paris Sound technics.''
HMM-based Speech Segmentation
ircamAlign
Speaking Style Modeling of Various Discourse Genres in HMM-Based Speech Synthesis
Dynamic Model Selection for spectral Voice Conversion
Toward Improved HMM-based Speech Synthesis using High-Level Syntactical Features
A HMM-Based Speech Synthesis System using a New Glottal Source and Vocal-Tract Separation Methods
Speaking Style Modeling of Various Discourse Genres in HMM-Based Speech Synthesis
HMM-based Speech Segmentation
Speech synthesis by unit selection requires the segmentation of a large, single-speaker, high-quality recording. Automatic speech recognition techniques based on Hidden Markov Models can be optimized for maximum segmentation accuracy. This paper presents the results of tuning such a phoneme segmentation system. First, using no text transcription, the design of an HMM phoneme recognizer is optimized subject to a phoneme bigram language model. Optimal performance is obtained with triphone models, 7 states per phoneme and 5 Gaussians per state, reaching 94.4% phoneme recognition accuracy with 95.2% of phoneme boundaries within 70 ms of hand-labelled boundaries. Second, using the textual information modeled by a multi-pronunciation phonetic graph built according to errors found in the first step, the reported phoneme recognition accuracy increases to 96.8% with 96.1% of phoneme boundaries within 70 ms of hand-labelled boundaries. Finally, the results from these two segmentation methods based on different phonetic graphs, the evaluation set, the hand labelling and the test procedures are discussed and possible improvements are proposed.
http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/CURSUS2010/ircamAlign6b.jpg
HMM-Based Speech Synthesis
Speaking Style Modeling of Various Discourse Genres in HMM-Based Speech Synthesis
Dynamic Model Selection for spectral Voice Conversion
Toward Improved HMM-based Speech Synthesis using High-Level Syntactical Features
A HMM-Based Speech Synthesis System using a New Glottal Source and Vocal-Tract Separation Methods
Automatic Phoneme Segmentation with Relaxed Textual Constraints
Speech synthesis by unit selection requires the segmentation of a large, single-speaker, high-quality recording. Automatic speech recognition techniques based on Hidden Markov Models can be optimized for maximum segmentation accuracy. This paper presents the results of tuning such a phoneme segmentation system. First, using no text transcription, the design of an HMM phoneme recognizer is optimized subject to a phoneme bigram language model. Optimal performance is obtained with triphone models, 7 states per phoneme and 5 Gaussians per state, reaching 94.4% phoneme recognition accuracy with 95.2% of phoneme boundaries within 70 ms of hand-labelled boundaries. Second, using the textual information modeled by a multi-pronunciation phonetic graph built according to errors found in the first step, the reported phoneme recognition accuracy increases to 96.8% with 96.1% of phoneme boundaries within 70 ms of hand-labelled boundaries. Finally, the results from these two segmentation methods based on different phonetic graphs, the evaluation set, the hand labelling and the test procedures are discussed and possible improvements are proposed.
http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/CURSUS2010/ircamAlign6b.jpg
http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/Pierre.jpg
http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/CURSUS2010/ircamAlign6b.jpg
Speech synthesis by unit selection requires the segmentation of a large, single-speaker, high-quality recording. Automatic speech recognition techniques based on Hidden Markov Models can be optimized for maximum segmentation accuracy. This paper presents the results of tuning such a phoneme segmentation system. First, using no text transcription, the design of an HMM phoneme recognizer is optimized subject to a phoneme bigram language model. Optimal performance is obtained with triphone models, 7 states per phoneme and 5 Gaussians per state, reaching 94.4% phoneme recognition accuracy with 95.2% of phoneme boundaries within 70 ms of hand-labelled boundaries. Second, using the textual information modeled by a multi-pronunciation phonetic graph built according to errors found in the first step, the reported phoneme recognition accuracy increases to 96.8% with 96.1% of phoneme boundaries within 70 ms of hand-labelled boundaries. Finally, the results from these two segmentation methods based on different phonetic graphs, the evaluation set, the hand labelling and the test procedures are discussed and possible improvements are proposed.
http://recherche.ircam.fr/equipes/analyse-synthese/lanchant/uploads/Main/Pierre.jpg
Presentation ircamAlign: Automatic Phoneme Segmentation with relaxed textual constraints
Speaking Style Modeling of Various Discourse Genres in HMM-Based Speech Synthesis
This paper presents an approach for modeling the speaking style of various discourse genres in speech synthesis. The proposed approach is based on phonological and acoustic parametric models of the average speaking style of each discourse genre. The phonological module models the average abstract prosodic structure of a specific discourse genre. The acoustic module jointly models the average speaking-style voice and prosodic cues of a given discourse genre. Discourse genre-dependent speaking style models have been estimated for four discourse genres and evaluated in a perceptual experiment on the prosodic identification of speaking style. A comparison with speaking style identification on real speech is discussed and reveals consistent performance of the proposed approach.
Dynamic Model Selection for spectral Voice Conversion
Statistical methods for voice conversion are usually based on a single model selected in order to represent a tradeoff between goodness of fit and complexity. In this paper we assume that the best model may change over time, depending on the source acoustic features. We present a new method for spectral voice conversion called Dynamic Model Selection (DMS), in which a set of potential best models with increasing complexity, including mixtures of Gaussians and probabilistic principal component analyzers, are considered during the conversion of a source speech signal into a target speech signal. This set is built during the learning phase, according to the Bayesian information criterion. During the conversion, the best model is dynamically selected among the models in the set, according to the acoustic features of each frame. Subjective tests show that the method improves the conversion in terms of proximity to the target and quality.
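As a rough illustration of the learning-phase step mentioned above, the sketch below ranks candidate models by the Bayesian information criterion, BIC = k ln(n) - 2 ln(L), where lower is better. It is not the paper's implementation: the function names, the candidate names and the log-likelihood values are hypothetical, and the per-frame dynamic selection used at conversion time is not shown because the abstract does not specify its criterion.

```python
import math


def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion: k * ln(n) - 2 * ln(L). Lower is better."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood


def rank_by_bic(candidates, n_samples):
    """Return candidate names sorted from lowest (best) to highest BIC.

    `candidates` maps a model name to a (log_likelihood, n_params) pair.
    """
    return sorted(
        candidates,
        key=lambda name: bic(candidates[name][0], candidates[name][1], n_samples),
    )


# Hypothetical candidates of increasing complexity (illustrative numbers only).
candidates = {
    "gmm_2": (-1200.0, 10),
    "gmm_8": (-1050.0, 40),
    "gmm_32": (-1090.0, 160),
}
ranking = rank_by_bic(candidates, n_samples=1000)
```

With these made-up values the mid-sized model ranks first: the gain in likelihood over the smallest model outweighs its extra parameters, while the largest model is penalized for complexity without fitting better.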
Toward Improved HMM-based Speech Synthesis using High-Level Syntactical Features
A major drawback of current Hidden Markov Model-based speech synthesis is the monotony of the generated speech, which is closely related to the monotony of the generated prosody. Complementary to model-oriented approaches that aim to increase prosodic variability by reducing the "oversmoothing" effect, this paper presents a linguistic-oriented approach in which high-level linguistic features are extracted from text in order to improve prosody modeling. A linguistic processing chain based on linguistic preprocessing, morpho-syntactical labeling, and syntactical parsing is used to extract high-level syntactical features from an input text. Such linguistic features are then introduced into an HMM-based speech synthesis system to model prosodic variations (f0, duration, and spectral variations). Subjective evaluation reveals that the proposed approach significantly improves speech synthesis compared to a baseline model, even if this improvement depends on the observed linguistic phenomenon.
A HMM-Based Speech Synthesis System using a New Glottal Source and Vocal-Tract Separation Methods
This paper introduces an HMM-based speech synthesis system which uses a new method for the separation of vocal-tract and Liljencrants-Fant model plus Noise (SVLN). The glottal source is separated into two components: a deterministic glottal waveform (Liljencrants-Fant model) and a modulated Gaussian noise. This glottal source is first estimated and then used in the vocal-tract estimation procedure. Then, the parameters of the source and the vocal tract are included in HMM contextual models of phonemes. SVLN is promising for voice transformation in the synthesis of expressive speech, since it allows independent control of vocal-tract and glottal-source properties. The synthesis results are finally discussed and subjectively evaluated.
Automatic Phoneme Segmentation with Relaxed Textual Constraints
Speech synthesis by unit selection requires the segmentation of a large, single-speaker, high-quality recording. Automatic speech recognition techniques based on Hidden Markov Models can be optimized for maximum segmentation accuracy. This paper presents the results of tuning such a phoneme segmentation system. First, using no text transcription, the design of an HMM phoneme recognizer is optimized subject to a phoneme bigram language model. Optimal performance is obtained with triphone models, 7 states per phoneme and 5 Gaussians per state, reaching 94.4% phoneme recognition accuracy with 95.2% of phoneme boundaries within 70 ms of hand-labelled boundaries. Second, using the textual information modeled by a multi-pronunciation phonetic graph built according to errors found in the first step, the reported phoneme recognition accuracy increases to 96.8%, with 96.1% of phoneme boundaries within 70 ms of hand-labelled boundaries. Finally, the results from these two segmentation methods based on different phonetic graphs, the evaluation set, the hand labelling and the test procedures are discussed, and possible improvements are proposed.
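The boundary figures quoted above (share of phoneme boundaries within 70 ms of the hand-labelled ones) can be illustrated with a minimal sketch. It assumes that automatic and hand-labelled boundaries are already paired one-to-one and expressed in seconds, and that the tolerance is inclusive; the abstract does not state how boundaries were matched, so the function name and the sample values below are hypothetical.

```python
def boundary_accuracy(reference, predicted, tolerance=0.070):
    """Fraction of predicted boundaries within `tolerance` seconds of the reference.

    Assumes both lists are non-empty, the same length, and paired by index
    (boundary i of the automatic segmentation matches hand label i).
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    if not reference:
        raise ValueError("at least one boundary is required")
    hits = sum(
        1 for ref, pred in zip(reference, predicted)
        if abs(ref - pred) <= tolerance
    )
    return hits / len(reference)


# Hypothetical boundary times in seconds (illustrative only).
hand_labelled = [0.10, 0.50, 1.00, 1.50]
automatic = [0.12, 0.60, 1.05, 1.50]
accuracy = boundary_accuracy(hand_labelled, automatic)
```

In this toy example three of the four boundaries fall within the 70 ms tolerance; the second one is off by 100 ms and is counted as a miss.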
Presentation ircamAlign: Automatic Phoneme Segmentation with relaxed textual constraints