Main.Research History


January 11, 2012, at 08:28 PM by 129.169.82.208 -
Changed lines 49-50 from:

HMM-based Speech Segmentation into phones

to:

HMM-based Speech Segmentation

July 21, 2011, at 03:02 PM by 85.171.159.4 -
Changed lines 3-4 from:

My research area is Statistical Signal Processing. My main topics of research include statistical modeling of signals, speech processing and their applications to Music. My research has focused initially on the generalization of statistical models for signals, especially Hidden Markov Models (HMM). I studied for my PhD [ T ] models called Triplet Markov models that generalize the classical HMM (Hidden Markov Models (HMM)), with applications in image segmentation. I then directed my research toward speech processing, working on the segmentation of speech signals into phones, voice conversion, language models and HMM-based speech synthesis. My research studies on speech are interdisciplinary as they combine statistical modeling of signals, natural language processing and their application to music.

to:

My research area is Statistical Signal Processing. My main topics of research include statistical modeling of signals, speech processing and their applications to Music. My research initially focused on the generalization of statistical models for signals, especially Hidden Markov Models (HMM). During my PhD [ T ], I studied models called Triplet Markov models, which generalize the classical HMM, with applications in image segmentation. I then directed my research toward speech processing, working on the segmentation of speech signals into phones, voice conversion, language models and HMM-based speech synthesis. My research studies on speech are interdisciplinary as they combine statistical modeling of signals, natural language processing and their application to music.

April 28, 2011, at 01:59 AM by 85.171.159.4 -
Changed lines 35-36 from:

We have also studied with W. Pieczynski, as part of the study of triplet Markov chains, the possibilities for extending the classical probabilistic model to an "evidential" model, with the posterior probability of the hidden process given by the Dempster-Shafer fusion [ A1, AF1, CF2] . We then applied this evidential model to the segmentation of nonstationary processes. The main interest of our approach was to show that although the Dempster-Shafer fusion destroys Markovianity in the context of the hidden evidential chain, Bayesian segmentation is still possible via the triplet Markov chain approach.

to:

We have also studied with W. Pieczynski, as part of the study of triplet Markov chains, the possibilities for extending the classical probabilistic model to an "evidential" model, with the posterior probability of the hidden process given by the Dempster-Shafer fusion [ A1, AF1, CF2]. We then applied this evidential model to the segmentation of nonstationary processes. The main interest of our approach was to show that although the Dempster-Shafer fusion destroys Markovianity in the context of the hidden evidential chain, Bayesian segmentation is still possible via the triplet Markov chain approach.
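As a side note for readers unfamiliar with the Dempster-Shafer fusion used above, here is a minimal, self-contained Python sketch of the combination rule; the two-class frame and the mass values are purely illustrative and are not taken from [ A1, AF1, CF2 ].

def dempster_fusion(m1, m2):
    # Combine two mass functions (dicts mapping frozenset -> mass) with
    # Dempster's rule: multiply masses of intersecting focal sets and
    # renormalize by the non-conflicting mass.
    fused, conflict = {}, 0.0
    for b, mb in m1.items():
        for c, mc in m2.items():
            inter = b & c
            if inter:
                fused[inter] = fused.get(inter, 0.0) + mb * mc
            else:
                conflict += mb * mc
    # Assumes the two sources are not totally conflicting (conflict < 1).
    return {a: v / (1.0 - conflict) for a, v in fused.items()}

# Illustrative two-class frame {0, 1}: an "evidential" prior putting mass on
# the whole frame, fused with a probabilistic mass coming from an observation.
m_prior = {frozenset({0}): 0.3, frozenset({1}): 0.3, frozenset({0, 1}): 0.4}
m_obs = {frozenset({0}): 0.7, frozenset({1}): 0.2, frozenset({0, 1}): 0.1}
print(dempster_fusion(m_prior, m_obs))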

April 28, 2011, at 01:59 AM by 85.171.159.4 -
Changed lines 3-4 from:

My research area is Statistical Signal Processing. My main topics of research include statistical modeling of signals, speech processing and their applications to Music. My research has focused initially on the generalization of statistical models for signals, especially Hidden Markov Models (HMM). I studied for my PhD ] models called Triplet Markov models that generalize the classical HMM (Hidden Markov Models (HMM)), with applications in image segmentation. I then directed my research toward speech processing, working on the segmentation of speech signals into phones, voice conversion, language models and HMM-based speech synthesis. My research studies on speech are interdisciplinary as they combine statistical modeling of signals, natural language processing and their application to music.

to:

My research area is Statistical Signal Processing. My main topics of research include statistical modeling of signals, speech processing and their applications to Music. My research has focused initially on the generalization of statistical models for signals, especially Hidden Markov Models (HMM). I studied for my PhD [ T ] models called Triplet Markov models that generalize the classical HMM (Hidden Markov Models (HMM)), with applications in image segmentation. I then directed my research toward speech processing, working on the segmentation of speech signals into phones, voice conversion, language models and HMM-based speech synthesis. My research studies on speech are interdisciplinary as they combine statistical modeling of signals, natural language processing and their application to music.

April 28, 2011, at 01:58 AM by 85.171.159.4 -
Changed lines 3-4 from:

My research area is Statistical Signal Processing. My main topics of research include statistical modeling of signals, speech processing and their applications to Music. My research has focused initially on the generalization of statistical models for signals, especially Hidden Markov Models (HMM). I studied for my PhD [ T ] models called Triplet Markov models that generalize the classical HMM (Hidden Markov Models (HMM)), with applications in image segmentation. I then directed my research toward speech processing, working on the segmentation of speech signals into phones, voice conversion, language models and HMM-based speech synthesis. My research studies on speech are interdisciplinary as they combine statistical modeling of signals, natural language processing and their application to music.

to:

My research area is Statistical Signal Processing. My main topics of research include statistical modeling of signals, speech processing and their applications to Music. My research has focused initially on the generalization of statistical models for signals, especially Hidden Markov Models (HMM). I studied for my PhD ] models called Triplet Markov models that generalize the classical HMM (Hidden Markov Models (HMM)), with applications in image segmentation. I then directed my research toward speech processing, working on the segmentation of speech signals into phones, voice conversion, language models and HMM-based speech synthesis. My research studies on speech are interdisciplinary as they combine statistical modeling of signals, natural language processing and their application to music.

April 28, 2011, at 01:58 AM by 85.171.159.4 -
Changed lines 35-36 from:

We have also studied with W. Pieczynski, as part of the study of triplet Markov chains, the possibilities for extending the classical probabilistic model to an "evidential" model, with the posterior probability of the hidden process given by the Dempster-Shafer fusion [ A1, AF1, CF2 ] . We then applied this evidential model to the segmentation of nonstationary processes. The main interest of our approach was to show that although the Dempster-Shafer fusion destroys Markovianity in the context of the hidden evidential chain, Bayesian segmentation is still possible via the triplet Markov chain approach.

to:

We have also studied with W. Pieczynski, as part of the study of triplet Markov chains, the possibilities for extending the classical probabilistic model to an "evidential" model, with the posterior probability of the hidden process given by the Dempster-Shafer fusion [ A1, AF1, CF2] . We then applied this evidential model to the segmentation of nonstationary processes. The main interest of our approach was to show that although the Dempster-Shafer fusion destroys Markovianity in the context of the hidden evidential chain, Bayesian segmentation is still possible via the triplet Markov chain approach.

April 28, 2011, at 12:06 AM by 2001:660:3004:21:2e0:81ff:fe57:ac36 -
Changed lines 11-12 from:

Pairwise Markov Chains and Pairwise Partially Markov Chains

to:

Pairwise Partially Markov Chains

April 27, 2011, at 10:14 PM by 2001:660:3004:64:66b9:e8ff:feb8:c51e -
Changed lines 55-56 from:

A set of multispeaker French modelshave been learned from the corpus BREF80. A confidence index based on posterior probabilities is calculated for each phone to facilitate a possible manual correction of segmentation results.

to:

A set of multispeaker French models have been learned from the corpus BREF80. A confidence index based on posterior probabilities is calculated for each phone to facilitate a possible manual correction of segmentation results.
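To illustrate how such a confidence index can be obtained, here is a minimal Python sketch that scores each phone by the mean posterior probability of its model over the aligned segment; this is one plausible definition, stated under my own assumptions, and not necessarily the exact index implemented in ircamAlign.

import numpy as np

def phone_confidence(log_posteriors, segments):
    # log_posteriors: array of shape (T, n_models) with frame-level log
    # posterior probabilities of each phone model (e.g. from forward-backward).
    # segments: list of (model_index, start_frame, end_frame) from the alignment.
    # Returns one confidence value in [0, 1] per phone segment.
    scores = []
    for model, start, end in segments:
        posteriors = np.exp(log_posteriors[start:end, model])
        scores.append(float(posteriors.mean()))
    return scores

Segments with a low score can then be flagged for manual correction.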

April 27, 2011, at 10:14 PM by 2001:660:3004:64:66b9:e8ff:feb8:c51e -
Changed lines 51-58 from:

During the ANR project Vivos, I proposed and developed, in collaboration with A. C. Morris and X. Rodet, the software ircamAlign [ C4 ]. This system of segmentation of speech signals into phones is based on the HTK library. It is based on a particular modeling of speech signals by hidden Markov chains used in speech recognition. This modeling can be viewed as a special case of Markov chain triplet T = (U, X, Y) where U is the language model, X is the process of evolution of spectral features in time (sub-states of the HMM of each phoneme) and Y is the process of the observations (cepstral coefficients).

If the textual transcription is available, the distribution of the process U can be defined as that of a Markov chain whose topology is a graph constructed from the phonetic text giving the different pronunciations and possible connections. Many options are available for creating this graph. It is thus possible to allow the omission or repetition of words, the insertion of short pauses or sound paraverbal like breathing or lip noises for which specific models have been learned. When the text is not available, such as in the case of a spontaneous speech signal, the distribution of U is defined as being that of a bigram or tri-gram learned on a selected French text set.

A set of multispeaker French modelshave been learned from the corpus BREF80. To reduce the computing time needed for learning, we have taken advantage of the fact that the calculations needed to estimate parameters can be decomposed and can be performed in parallel on the 48 computers core of the team. On the other hand, a confidence index based on posterior probabilities is calculated for each phone to facilitate a possible manual correction of segmentation results.

From this segmentation into phones, the structure of speech (syllables, words, breath groups) is extracted from transcription and aligned to the speech signal in order to build databases of units to the development of a synthetic text-to -Speech (TTS) [ C7 ] by concatenation. ircamAlign is used by ircamTTS and ircamCorpusTools [ AF2 ] which is a management system database of speech units. On the other hand, ircamAlign is used in the ANR Rhapsody project for developing reference corpus of spontaneous speech in French. Finally, ircamAlign has been used by composers at IRCAM. Note that a real-time version has subsequently been developed by J. Bloit and implemented in MaxMSP.

to:

During the ANR (French National Research Agency) project Vivos, I proposed and developed, in collaboration with A. C. Morris and X. Rodet, the software ircamAlign [ C4 ]. This system of segmentation of speech signals into phones is based on the HTK library. It is based on a particular modeling of speech signals by hidden Markov chains used in speech recognition. This modeling can be viewed as a special case of Markov chain triplet T = (U, X, Y) where U is the language model, X is the process of evolution of spectral features in time (sub-states of the HMM of each phoneme) and Y is the process of the observations (cepstral coefficients).

If the textual transcription is available, the distribution of the process U can be defined as that of a Markov chain whose topology is a graph constructed from the phonetic text giving the different pronunciations and possible connections. Many options are available for creating this graph. It is thus possible to allow the omission or repetition of words, the insertion of short pauses or paraverbal sounds like breathing or lip noises for which specific models have been learned. When the text is not available, such as in the case of a spontaneous speech signal, the distribution of U is defined as being that of a bigram or tri-gram learned on a selected French text set.

A set of multispeaker French modelshave been learned from the corpus BREF80. A confidence index based on posterior probabilities is calculated for each phone to facilitate a possible manual correction of segmentation results.

During the segmentation, the structure of speech (syllables, words, breath groups) is extracted from the transcription and aligned to the speech signal in order to build the unit databases needed for text-to-speech (TTS) synthesis [ C7 ] by unit selection. ircamAlign is used by ircamTTS and by ircamCorpusTools [ AF2 ], which is a speech units database management system. ircamAlign is also used in the ANR project Rhapsody for developing a reference corpus of spontaneous speech in French. Finally, ircamAlign has been used by composers at IRCAM. Note that a real-time version has subsequently been developed by J. Bloit and implemented in MaxMSP.

April 27, 2011, at 10:10 PM by 2001:660:3004:64:66b9:e8ff:feb8:c51e -
Changed lines 53-54 from:

Based on this observation, if the textual transcription exists, the distribution of the process U can be defined as that of a Markov chain whose topology is a graph constructed from the phonetic text giving the different pronunciations and possible connections. Many options are available for creating this graph. It is thus possible to allow the omission or repetition of words, the insertion of short pauses or sound paraverbal like breathing or lip noises for which specific models have been learned. When the text is not available, such as in the case of a spontaneous speech signal, the distribution of U is defined as being that of a bigram or tri-gram learned on a selected French text set.

to:

If the textual transcription is available, the distribution of the process U can be defined as that of a Markov chain whose topology is a graph constructed from the phonetic text giving the different pronunciations and possible connections. Many options are available for creating this graph. It is thus possible to allow the omission or repetition of words, the insertion of short pauses or sound paraverbal like breathing or lip noises for which specific models have been learned. When the text is not available, such as in the case of a spontaneous speech signal, the distribution of U is defined as being that of a bigram or tri-gram learned on a selected French text set.

April 27, 2011, at 10:10 PM by 2001:660:3004:64:66b9:e8ff:feb8:c51e -
Changed lines 51-54 from:

During the ANR project Vivos, I proposed and developed, in collaboration with A. C. Morris and X. Rodet, the software ircamAlign [ C4 ]. This system of segmentation of speech signals into phones is based on the HTK library. It is based on a particular modeling of speech signals by hidden Markov chains used in speech recognition. This modeling can be viewed as a special case of Markov chain triplet T = (U, X, Y) where U is the language model, X is the process of evolution of spectral features in time (sub-states of the HMM of each phoneme) and Y is the process of the observations (cepstral coefficients).

Based on this observation, if the textual transcription exists, the distribution of the process U can be defined as that of a Markov chain whose topology is a graph constructed from the phonetic text giving the different pronunciations and possible connections. Many options are available for creating this graph. It is thus possible to allow the omission or repetition of words, the insertion of short pauses or sound paraverbal like breathing or lip noises for which specific models have been learned. When the text is not available, such as in the case of a spontaneous speech signal, the distribution of U is defined as being that of a bigram or tri-gram learned on a selected French text set.

to:

During the ANR project Vivos, I proposed and developed, in collaboration with A. C. Morris and X. Rodet, the software ircamAlign [ C4 ]. This system of segmentation of speech signals into phones is based on the HTK library. It is based on a particular modeling of speech signals by hidden Markov chains used in speech recognition. This modeling can be viewed as a special case of Markov chain triplet T = (U, X, Y) where U is the language model, X is the process of evolution of spectral features in time (sub-states of the HMM of each phoneme) and Y is the process of the observations (cepstral coefficients).

Based on this observation, if the textual transcription exists, the distribution of the process U can be defined as that of a Markov chain whose topology is a graph constructed from the phonetic text giving the different pronunciations and possible connections. Many options are available for creating this graph. It is thus possible to allow the omission or repetition of words, the insertion of short pauses or sound paraverbal like breathing or lip noises for which specific models have been learned. When the text is not available, such as in the case of a spontaneous speech signal, the distribution of U is defined as being that of a bigram or tri-gram learned on a selected French text set.

April 27, 2011, at 10:09 PM by 2001:660:3004:64:66b9:e8ff:feb8:c51e -
Changed lines 51-52 from:

During the ANR project Vivos, I proposed and developed, in collaboration with A. C. Morris and X. Rodet, the software ircamAlign [ C4 ]. This system of segmentation of speech signals into phones is based on the HTK library. It is based on particular modeling of speech signals by hidden Markov chains used in speech recognition. This modeling can be viewed as a special case of Markov chain triplet T = (U, X, Y) where U is the language model, X is the process of evolution of spectral features in time (sub-states of the HMM of each phoneme) and Y is the process of the observations (cepstral coefficients).

to:

During the ANR project Vivos, I proposed and developed, in collaboration with A. C. Morris and X. Rodet, the software ircamAlign [ C4 ]. This system of segmentation of speech signals into phones is based on the HTK library. It is based on a particular modeling of speech signals by hidden Markov chains used in speech recognition. This modeling can be viewed as a special case of Markov chain triplet T = (U, X, Y) where U is the language model, X is the process of evolution of spectral features in time (sub-states of the HMM of each phoneme) and Y is the process of the observations (cepstral coefficients).

April 27, 2011, at 10:08 PM by 2001:660:3004:64:66b9:e8ff:feb8:c51e -
Changed lines 51-52 from:

Under the ANR Vivos, I proposed and developed in collaboration with A. C. Morris and X. Rodet, the software ircamAlign [ C4 ]. This system of segmentation of speech signals into phones is based on the HTK library. The system is based on modeling by hidden Markov chains used in particular in speech recognition. This modeling, specific to speech processing, can be viewed as a special case of Markov chain triplet T = (U, X, Y) where U is the language model, X is the process of evolution of spectral features in time (sub-states of the HMM of each phoneme) and Y is the process of the observations (cepstral coefficients).

to:

During the ANR project Vivos, I proposed and developed, in collaboration with A. C. Morris and X. Rodet, the software ircamAlign [ C4 ]. This system of segmentation of speech signals into phones is based on the HTK library. It is based on particular modeling of speech signals by hidden Markov chains used in speech recognition. This modeling can be viewed as a special case of Markov chain triplet T = (U, X, Y) where U is the language model, X is the process of evolution of spectral features in time (sub-states of the HMM of each phoneme) and Y is the process of the observations (cepstral coefficients).

April 27, 2011, at 10:07 PM by 2001:660:3004:64:66b9:e8ff:feb8:c51e -
Changed lines 49-50 from:

Automatic Speech Segmentation into phones

to:

HMM-based Speech Segmentation into phones

Changed lines 69-70 from:

HMM based speech synthesis

to:

HMM-based Speech Synthesis

April 27, 2011, at 10:06 PM by 2001:660:3004:64:66b9:e8ff:feb8:c51e -
Changed lines 51-52 from:

Under the ANR Vivos, I proposed and developed in collaboration with A. C. Morris and X. Rodet, the software ircamAlign [ C4 ]. This is a system of segmentation of speech signals into phones based largely on the HTK library. The system is based on modeling by hidden Markov chains used in particular in speech recognition. This modeling, specific to speech processing, can be viewed as a special case of Markov chain triplet T = (U, X, Y) where U is the language model, X is the process of evolution of spectral features in time (sub-states of the HMM of each phoneme) and Y is the process of the observations (cepstral coefficients).

to:

Under the ANR Vivos, I proposed and developed in collaboration with A. C. Morris and X. Rodet, the software ircamAlign [ C4 ]. This system of segmentation of speech signals into phones is based on the HTK library. The system is based on modeling by hidden Markov chains used in particular in speech recognition. This modeling, specific to speech processing, can be viewed as a special case of Markov chain triplet T = (U, X, Y) where U is the language model, X is the process of evolution of spectral features in time (sub-states of the HMM of each phoneme) and Y is the process of the observations (cepstral coefficients).

April 27, 2011, at 10:05 PM by 2001:660:3004:64:66b9:e8ff:feb8:c51e -
Changed lines 71-72 from:

The principle of HMM-based speech synthesis developed by the Nagoya Institute of Technology (Nitech) is the joint modeling of the spectrum (vocal tract), the fundamental frequency (source) and durations for each phoneme in context by a hidden Markov chain. During the synthesis, a macro-model is built from the concatenation of the HMMs corresponding to the phones in the context of the phonetic sequence to synthesize. The durations of the states are initially generated and then the trajectory of spectral parameters is estimated from a specific algorithm for spectral parameters generation taking into account the dependency between static and dynamic parameters. One advantage of this method compared to the synthesis of speech by units selection is that it only requires the storage of model parameters. It also allows precise control of the characteristics of the synthesis. The disadvantages of this type of synthesis are artifacts in the synthesized voice due to the glottal source modeling and the lack of natural due to the low variability of the prosody. To overcome these shortcomings we used the separation of vocal tract and glottal source separation method proposed by G. Degottex in [ C5 ]. On the other hand, we have shown with N. Obin the improvement made by using high-level syntactical features [ C6 ] and the possibilities for the synthesis of speaking style for different types of discourse genres in [ C11 ].

to:

The principle of HMM-based speech synthesis, proposed initially by the Nagoya Institute of Technology (Nitech), is the joint modeling of the spectrum (vocal tract), the fundamental frequency (source) and the durations of each phoneme in context by a hidden Markov chain. During synthesis, a macro-model is built from the concatenation of the HMMs corresponding to the phones in context of the phonetic sequence to synthesize. The state durations are generated first, and the trajectory of spectral parameters is then estimated with a specific parameter generation algorithm that takes into account the dependency between static and dynamic parameters. One advantage of this method compared to speech synthesis by unit selection is that it only requires the storage of the model parameters. It also allows precise control of the characteristics of the synthesis. The disadvantages of this type of synthesis are artifacts in the synthesized voice due to the glottal source modeling and a lack of naturalness due to the low variability of the prosody. To overcome these shortcomings we used the vocal-tract and glottal-source separation method proposed by G. Degottex in [ C5 ]. We have also shown with N. Obin the improvement brought by using high-level syntactical features [ C6 ] and the possibilities for synthesizing the speaking style of different discourse genres [ C11 ].
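A short formal sketch of the parameter-generation step mentioned above may help; the formulation is the standard one in the Nitech approach, but the notation is mine. With c the sequence of static spectral parameters, o = W c the augmented observation including the dynamic (delta) features obtained through the window matrix W, and \mu and U the concatenated means and covariances of the selected HMM state sequence, the generated trajectory maximizes the Gaussian likelihood of o:

\hat{c} = \arg\max_{c} \, \mathcal{N}(W c ;\, \mu,\, U) = (W^{\top} U^{-1} W)^{-1} W^{\top} U^{-1} \mu

It is this coupling of static and dynamic parameters through W that produces smooth spectral trajectories.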

April 27, 2011, at 10:03 PM by 2001:660:3004:64:66b9:e8ff:feb8:c51e -
Changed lines 49-50 from:

Automatic Speech segmentation into phones

to:

Automatic Speech Segmentation into phones

April 27, 2011, at 10:02 PM by 2001:660:3004:64:66b9:e8ff:feb8:c51e -
Changed lines 37-38 from:

A last of my contributions during my PhD thesis was to extend the fuzzy Markov chains previously studied by F. Salzenstein, to fuzzy Markov trees. The fuzzy segmentation was initially proposed to take account of imprecision on a site belonging to a thematic area. Thus, in a fuzzy signal cohabit homogeneous areas ("hard" clusters ) with fuzzy areas representing intermediate sites which may belong several hard clusters. The originality of these models is characterized by the fact that their distribution has both a discrete and a continuous component, the component being formed by discrete Dirac masses representing the weight assigned to each cluster lasts and the continuous component corresponding to the fuzzy classes (Lebesgue measure). We have proposed a multisensor fuzzy hidden Markov tree that we applied to the segmentation of astronomical images [ CF3 ].

to:

A last contribution of my PhD thesis was to extend the fuzzy Markov chains, previously studied by F. Salzenstein, to fuzzy Markov trees. Fuzzy segmentation was initially proposed to take into account imprecision in the membership of a site to a thematic class. Thus, in a fuzzy signal, homogeneous areas ("hard" clusters) coexist with fuzzy areas representing intermediate sites which may belong to several hard clusters. The originality of these models is that their distribution has both discrete and continuous components: the discrete components are formed by Dirac masses representing the weight assigned to each hard cluster, and the continuous components (with respect to the Lebesgue measure) correspond to the fuzzy classes. We have proposed a multisensor fuzzy hidden Markov tree that we applied to the segmentation of astronomical images [ CF3 ].
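As a sketch of the mixed distribution described here (my notation, assuming two hard clusters), the prior of a fuzzy class variable \varepsilon in [0,1] can be written as a density with respect to the measure \delta_0 + \delta_1 + Lebesgue on (0,1):

h(\varepsilon) = \pi_0 \, \delta_0(\varepsilon) + \pi_1 \, \delta_1(\varepsilon) + (1 - \pi_0 - \pi_1) \, f(\varepsilon) \, \mathbf{1}_{(0,1)}(\varepsilon)

where \pi_0 and \pi_1 are the weights of the two hard clusters and f is a density on (0,1) describing the fuzzy memberships.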

April 27, 2011, at 10:01 PM by 2001:660:3004:64:66b9:e8ff:feb8:c51e -
Changed lines 35-36 from:

We have also studied with W. Pieczynski, as part of the study of triplet Markov models, the possibilities for extending the classical probabilistic model to an "evidential" model, with the posterior probability of the hidden process given by the Dempster-Shafer fusion [ A1, AF1, CF2 ] . We then applied this evidential model to the segmentation of nonstationary processes. The main interest of our approach was to show that although the Dempster-Shafer fusion destroys Markovianity in the context of the hidden evidential chain, Bayesian segmentation is still possible via the triplet Markov chain approach.

to:

We have also studied with W. Pieczynski, as part of the study of triplet Markov chains, the possibilities for extending the classical probabilistic model to an "evidential" model, with the posterior probability of the hidden process given by the Dempster-Shafer fusion [ A1, AF1, CF2 ] . We then applied this evidential model to the segmentation of nonstationary processes. The main interest of our approach was to show that although the Dempster-Shafer fusion destroys Markovianity in the context of the hidden evidential chain, Bayesian segmentation is still possible via the triplet Markov chain approach.

April 27, 2011, at 10:00 PM by 2001:660:3004:64:66b9:e8ff:feb8:c51e -
Changed lines 23-24 from:

Pairwise Markov chains can be extended to triplet Markov chains [ C1 ]. The principle is to add one, or even several, auxiliary process(es) as the joint distribution of the triplet "hidden process, auxiliary processes, observed process" is that of a Markov chain. These very general models allow to palliate another limitation of conventional models which is to assume that the joint distribution is stationary. Indeed, by introducing an auxiliary process controlling changes in transition matrices of the process, we have shown the effectiveness of such a model in situations where the joint distribution of the hidden process and observation is nonstationary [ C2 ] and we proposed algorithms for estimating parameters of a the considered Markov chain. This model was applied to the segmentation of synthetic and real images. A first observation is that this model does allow the consideration of different regimes, resulting in improved quality of segmentation in the case of images with both extensive homogeneous areas and areas with fine details. A second observation is that it is also possible to obtain a realization of the auxiliary process by the MPM estimator. This type of representation can be very useful, especially in segmentation of textures that can be precisely modeled by auxiliary processes.

to:

Pairwise Markov chains can be extended to triplet Markov chains [ C1 ]. The principle is to add one, or even several, auxiliary process(es) such that the joint distribution of the triplet "hidden process, auxiliary processes, observed process" is that of a Markov chain. These very general models make it possible to overcome another limitation of conventional models, which is to assume that the joint distribution is stationary. Indeed, by introducing an auxiliary process controlling the changes in the transition matrices of the process, we have shown the effectiveness of such a model in situations where the joint distribution of the hidden process and the observations is nonstationary [ C2, A3 ], and we proposed algorithms for estimating the parameters of the considered Markov chain. This model was applied to the segmentation of synthetic and real images. A first observation is that this model does allow the consideration of different regimes, resulting in improved segmentation quality for images with both extensive homogeneous areas and areas with fine details. A second observation is that it is also possible to obtain a realization of the auxiliary process by the MPM estimator. This type of representation can be very useful, especially for the segmentation of textures that can be precisely modeled by auxiliary processes.

Added lines 29-30:

[ A3 ] P. Lanchantin, J. Lapuyade-Lahorgue and W. Pieczynski, Unsupervised segmentation of randomly switching data hidden with non-Gaussian correlated noise, Signal Processing, Vol. 91, No. 2, pp. 163-175, February 2011.
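A compact way to write the triplet model discussed in this entry (my notation): with X the hidden classes, U the auxiliary regime process and Y the observations, the triplet T = (X, U, Y) is assumed to be a Markov chain,

p(t_{1:N}) = p(t_1) \prod_{n=2}^{N} p(t_n \mid t_{n-1}), \qquad t_n = (x_n, u_n, y_n)

and the MPM restoration of the hidden class at site n marginalizes out the auxiliary process,

\hat{x}_n = \arg\max_{x} \sum_{u} p(x_n = x, u_n = u \mid y_{1:N})

these posterior marginals being computable with forward-backward recursions on the pair (X, U).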

April 27, 2011, at 09:57 PM by 2001:660:3004:64:66b9:e8ff:feb8:c51e -
Changed lines 23-24 from:

Pairwise Markov models can be extended to triplet Markov models [ C1 ]. The principle is to add one, or even several, auxiliary process(es) as the joint distribution of the triplet "hidden process, auxiliary processes, observed process" is that of a Markov chain. These very general models allow to palliate another limitation of conventional models which is to assume that the joint distribution is stationary. Indeed, by introducing an auxiliary process controlling changes in transition matrices of the process, we have shown the effectiveness of such a model in situations where the joint distribution of the hidden process and observation is nonstationary [ C2 ] and we proposed algorithms for estimating parameters of a the considered Markov chain. This model was applied to the segmentation of synthetic and real images. A first observation is that this model does allow the consideration of different regimes, resulting in improved quality of segmentation in the case of images with both extensive homogeneous areas and areas with fine details. A second observation is that it is also possible to obtain a realization of the auxiliary process by the MPM estimator. This type of representation can be very useful, especially in segmentation of textures that can be precisely modeled by auxiliary processes.

to:

Pairwise Markov chains can be extended to triplet Markov chains [ C1 ]. The principle is to add one, or even several, auxiliary process(es) as the joint distribution of the triplet "hidden process, auxiliary processes, observed process" is that of a Markov chain. These very general models allow to palliate another limitation of conventional models which is to assume that the joint distribution is stationary. Indeed, by introducing an auxiliary process controlling changes in transition matrices of the process, we have shown the effectiveness of such a model in situations where the joint distribution of the hidden process and observation is nonstationary [ C2 ] and we proposed algorithms for estimating parameters of a the considered Markov chain. This model was applied to the segmentation of synthetic and real images. A first observation is that this model does allow the consideration of different regimes, resulting in improved quality of segmentation in the case of images with both extensive homogeneous areas and areas with fine details. A second observation is that it is also possible to obtain a realization of the auxiliary process by the MPM estimator. This type of representation can be very useful, especially in segmentation of textures that can be precisely modeled by auxiliary processes.

April 27, 2011, at 09:57 PM by 2001:660:3004:64:66b9:e8ff:feb8:c51e -
Changed lines 11-12 from:

Pairwise Markov Models and pairwise partially Markov models

to:

Pairwise Markov Chains and Pairwise Partially Markov Chains

Changed lines 21-22 from:

Triplet Markov models

to:

Triplet Markov Chains

April 27, 2011, at 09:55 PM by 2001:660:3004:64:66b9:e8ff:feb8:c51e -
Changed lines 15-16 from:

[ C3 ] W. Pieczynski, P. Lanchantin, Restoring hidden non stationary process using triplet partially Markov chain with long memory noise, IEEE Workshop on Statistical Signal Processing (SSP 05), July 17-20, Bordeaux, France, 2005.

to:

[ C3 ] W. Pieczynski, P. Lanchantin, Restoring hidden non stationary process using triplet partially Markov chain with long memory noise, IEEE Workshop on Statistical Signal Processing (SSP 05), July 17-20, Bordeaux, France, 2005.

April 27, 2011, at 09:54 PM by 2001:660:3004:64:66b9:e8ff:feb8:c51e -
Changed lines 13-14 from:

The principle of a pairwise Markov chains is to assume that the joint distribution of observed and hidden processes is that of a Markov chain. In the case of a partially pairwise Markov chain, one assumes directly the Markovianity of the distribution of the hidden process conditionally to the observations. In this context, one of my contributions was to propose the expectation-maximization (EM) algorithm in the case of pairwise Markov chains. I also studied with W. Pieczynski a special case of partially pairwise Markov chain applied to the segmentation of Gaussian processes with long correlation [ C3 ]. Experiments on synthetic data gave significant improvements compared to conventional models where the noise is a long correlated one while giving similar performance when the noise was independent. Nevertheless, the proposed method of parameters estimation was only valid for the centered case, which prevented us from testing the model on real images. So we continued our work with J. Lapuyade to refine our methods and make possible the unsupervised segmentation of Gaussian processes which are not necessarily centered [ A2 ].

to:

The principle of pairwise Markov chains is to assume that the joint distribution of the observed and hidden processes is that of a Markov chain. In the case of a partially pairwise Markov chain, one assumes directly the Markovianity of the distribution of the hidden process conditionally to the observations. In this context, one of my contributions was to propose the Expectation-Maximization (EM) algorithm for pairwise Markov chains. I also studied with W. Pieczynski a special case of partially pairwise Markov chains applied to the segmentation of Gaussian processes with long-range correlation [ C3 ]. Experiments on synthetic data gave significant improvements compared to conventional models when the noise is long-range correlated, while giving similar performance when the noise is independent. Nevertheless, the proposed parameter estimation method was only valid in the centered case, which prevented us from testing the model on real images. We therefore continued this work with J. Lapuyade to refine our methods and make possible the unsupervised segmentation of Gaussian processes which are not necessarily centered [ A2 ].
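For reference, a minimal sketch of the pairwise assumption discussed here (my notation): writing Z = (X, Y) for the pair of hidden and observed processes, one assumes directly

p(z_{1:N}) = p(z_1) \prod_{n=2}^{N} p(z_n \mid z_{n-1}), \qquad z_n = (x_n, y_n)

which contains the classical HMM as the particular case p(z_n \mid z_{n-1}) = p(x_n \mid x_{n-1}) \, p(y_n \mid x_n). Since Z is Markov, the posterior marginals p(x_n \mid y_{1:N}) remain computable by forward-backward recursions, which is what makes an EM-type estimation of the parameters possible.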

April 27, 2011, at 09:53 PM by 2001:660:3004:64:66b9:e8ff:feb8:c51e -
Changed lines 3-4 from:

My research area is Statistical Signal Processing. My main topics of research include statistical modeling of signals, speech processing and their applications to Music. My research has focused initially on the generalization of statistical models for signals, especially Hidden Markov Models (HMM). I studied for my PhD [ T ] models called Triplet Markov models that generalize the classical HMM (Hidden Markov Models (HMM)), with applications in image segmentation. I then directed my research toward speech processing, working on the segmentation of speech signals into phones, voice conversion, language models and HMM-based speech synthesis. My research studies on speech are interdisciplinary as they combine statistical modeling of signals, natural language processing and their application to music.

to:

My research area is Statistical Signal Processing. My main topics of research include statistical modeling of signals, speech processing and their applications to Music. My research has focused initially on the generalization of statistical models for signals, especially Hidden Markov Models (HMM). I studied for my PhD [ T ] models called Triplet Markov models that generalize the classical HMM (Hidden Markov Models (HMM)), with applications in image segmentation. I then directed my research toward speech processing, working on the segmentation of speech signals into phones, voice conversion, language models and HMM-based speech synthesis. My research studies on speech are interdisciplinary as they combine statistical modeling of signals, natural language processing and their application to music.

April 27, 2011, at 09:52 PM by 2001:660:3004:64:66b9:e8ff:feb8:c51e -
Changed lines 3-4 from:

My research area is Statistical Signal Processing. My main topics of research include statistical modeling of signals, speech processing and their applications to Music. My research has focused initially on the generalization of statistical models for signals, especially Hidden Markov Models (HMM). I studied for my PhD [ T ] models called Triplet Markov models that generalize the classical HMM (Hidden Markov Models (HMM)), with applications in image segmentation. I then directed my research toward speech processing, working on the segmentation of speech signals into phones, voice conversion, language models and HMM-based speech synthesis. My research studies on speech are interdisciplinary as they combine statistical modeling of signals, natural language processing and their application to music.

to:

My research area is Statistical Signal Processing. My main topics of research include statistical modeling of signals, speech processing and their applications to Music. My research has focused initially on the generalization of statistical models for signals, especially Hidden Markov Models (HMM). I studied for my PhD [ T ] models called Triplet Markov models that generalize the classical HMM (Hidden Markov Models (HMM)), with applications in image segmentation. I then directed my research toward speech processing, working on the segmentation of speech signals into phones, voice conversion, language models and HMM-based speech synthesis. My research studies on speech are interdisciplinary as they combine statistical modeling of signals, natural language processing and their application to music.

April 27, 2011, at 09:52 PM by 2001:660:3004:64:66b9:e8ff:feb8:c51e -
Changed lines 3-4 from:

My research area is Statistical Signal Processing. My main topics of research include statistical modeling of signals, speech processing and their applications to Music. My research has focused initially on the generalization of statistical models for signals, especially hidden Markov models. I proposed and studied for my PhD [ T ] models called Triplet Markov models that generalize the classical hidden Markov models (Hidden Markov Models (HMM)), with applications in image segmentation. I then directed my research toward speech processing, working on the segmentation of speech signals into phones, voice conversion, language models and HMM-based speech synthesis. My research studies on speech are interdisciplinary as they combine statistical modeling of signals, natural language processing and their application to music.

to:

My research area is Statistical Signal Processing. My main topics of research include statistical modeling of signals, speech processing and their applications to Music. My research has focused initially on the generalization of statistical models for signals, especially Hidden Markov Models (HMM). I studied for my PhD [ T ] models called Triplet Markov models that generalize the classical HMM (Hidden Markov Models (HMM)), with applications in image segmentation. I then directed my research toward speech processing, working on the segmentation of speech signals into phones, voice conversion, language models and HMM-based speech synthesis. My research studies on speech are interdisciplinary as they combine statistical modeling of signals, natural language processing and their application to music.

April 27, 2011, at 09:50 PM by 2001:660:3004:64:66b9:e8ff:feb8:c51e -
Changed lines 3-4 from:

My research area is statistical signal processing. My main topics of research include statistical modeling of signals, speech processing and their applications to music. My research has focused initially on the generalization of statistical models for signals, especially hidden Markov models. I proposed and studied for my PhD [ T ] models called Triplet Markov models that generalize the classical hidden Markov models (Hidden Markov Models (HMM)), with applications in image segmentation. I then directed my research toward speech processing, working on the segmentation of speech signals into phones, voice conversion, language models and HMM-based speech synthesis. My research studies on speech are interdisciplinary as they combine statistical modeling of signals, natural language processing and their application to music.

to:

My research area is Statistical Signal Processing. My main topics of research include statistical modeling of signals, speech processing and their applications to Music. My research has focused initially on the generalization of statistical models for signals, especially hidden Markov models. I proposed and studied for my PhD [ T ] models called Triplet Markov models that generalize the classical hidden Markov models (Hidden Markov Models (HMM)), with applications in image segmentation. I then directed my research toward speech processing, working on the segmentation of speech signals into phones, voice conversion, language models and HMM-based speech synthesis. My research studies on speech are interdisciplinary as they combine statistical modeling of signals, natural language processing and their application to music.

April 27, 2011, at 09:46 PM by 2001:660:3004:21:2e0:81ff:fe57:ac36 -
Changed lines 13-14 from:

The principle of a pairwise Markov chains is to assume that the joint distribution of observed and hidden processes is that of a Markov chain. In the case of a partially pairwise Markov chain, one assumes directly the Markovianity of the law of the hidden process conditionally to the observations. In this context, one of my contributions was to propose the expectation-maximization (EM) algorithm in the case of pairwise Markov chains. I also studied with W. Pieczynski a special case of partially pairwise Markov chain applied to the segmentation of Gaussian processes with long correlation [ C3 ]. Experiments on synthetic data gave significant improvements compared to conventional models where the noise is a long correlated one while giving similar performance when the noise was independent. Nevertheless, the proposed method of parameters estimation was only valid for the centered case, which prevented us from testing the model on real images. So we continued our work with J. Lapuyade to refine our methods and make possible the unsupervised segmentation of Gaussian processes which are not necessarily centered [ A2 ].

to:

The principle of a pairwise Markov chains is to assume that the joint distribution of observed and hidden processes is that of a Markov chain. In the case of a partially pairwise Markov chain, one assumes directly the Markovianity of the distribution of the hidden process conditionally to the observations. In this context, one of my contributions was to propose the expectation-maximization (EM) algorithm in the case of pairwise Markov chains. I also studied with W. Pieczynski a special case of partially pairwise Markov chain applied to the segmentation of Gaussian processes with long correlation [ C3 ]. Experiments on synthetic data gave significant improvements compared to conventional models where the noise is a long correlated one while giving similar performance when the noise was independent. Nevertheless, the proposed method of parameters estimation was only valid for the centered case, which prevented us from testing the model on real images. So we continued our work with J. Lapuyade to refine our methods and make possible the unsupervised segmentation of Gaussian processes which are not necessarily centered [ A2 ].

April 27, 2011, at 09:41 PM by 2001:660:3004:21:2e0:81ff:fe57:ac36 -
Changed lines 73-79 from:

demo 2: demo: A HMM-Based Speech Synthesis System using a New Glottal Source and Vocal-Tract Separation Methods (with G. Degottex)

demo 3: demo: Toward Improved HMM-based Speech Synthesis using High-Level Syntactical Features (with N. Obin)

demo 4:demo: Speaking Style Modeling of Various Discourse Genres in HMM-Based Speech Synthesis (with N. Obin)

to:

demo 2: A HMM-Based Speech Synthesis System using a New Glottal Source and Vocal-Tract Separation Methods (with G. Degottex)

demo 3: Toward Improved HMM-based Speech Synthesis using High-Level Syntactical Features (with N. Obin)

demo 4: Speaking Style Modeling of Various Discourse Genres in HMM-Based Speech Synthesis (with N. Obin)

Added lines 92-94:

demo: Dynamic Model Selection for spectral Voice Conversion

April 27, 2011, at 09:37 PM by 2001:660:3004:21:2e0:81ff:fe57:ac36 -
Added lines 71-79:

demo 1: Baseline system using STRAIGHT for French speech synthesis

demo 2: demo: A HMM-Based Speech Synthesis System using a New Glottal Source and Vocal-Tract Separation Methods (with G. Degottex)

demo 3: demo: Toward Improved HMM-based Speech Synthesis using High-Level Syntactical Features (with N. Obin)

demo 4:demo: Speaking Style Modeling of Various Discourse Genres in HMM-Based Speech Synthesis (with N. Obin)

April 27, 2011, at 09:35 PM by 2001:660:3004:21:2e0:81ff:fe57:ac36 -
Added lines 57-58:

demo: ircamAlign

April 27, 2011, at 09:28 PM by 2001:660:3004:21:2e0:81ff:fe57:ac36 -
Changed lines 67-74 from:

The principle of HMM-based speech synthesis developed by the Nagoya Institute of Technology (Nitech) is the joint modeling of the spectrum (vocal tract), the fundamental frequency (source) and durations for each phoneme in context by a hidden Markov chain. During the synthesis, a macro-model is built from the concatenation of the HMMs corresponding to the phones in the context of the phonetic sequence to synthesize. The durations of the states are initially generated and then the trajectory of spectral parameters is estimated from a specific algorithm for spectral parameters generation taking into account the dependency between static and dynamic parameters. One advantage of this method compared to the synthesis of speech by units selection is that it only requires the storage of model parameters. It also allows precise control of the characteristics of the synthesis. The disadvantages of this type of synthesis are artifacts in the synthesized voice due to the glottal source modeling and the lack of natural due to the low variability of the prosody. To overcome these shortcomings we used the separation of vocal tract and glottal source separation method proposed by G. Degottex in [C5]. On the other hand, we have shown with N. Obin the improvement made by using high-level syntactical features [C6] and the possibilities for the synthesis of speaking style for different types of discourse genres in [C11].

[C5] P. Lanchantin, G. Degottex and X. Rodet, A HMM-Based Synthesis System Using a New Glottal Source and Vocal-Tract Separation Method, ICASSP'10, Dallas, USA 2010,

[C6] N. Obin, P. Lanchantin, M. Avanzi, A. Lacheret-Dujour and X. Rodet, Toward Improved HMM-Based Speech Synthesis Using High-Level Syntactical Features, Speech Prosody, Chicago, USA, 2010

[C11] N. Obin, P. Lanchantin, A. Lacheret-Dujour and X. Rodet, Discrete/Continuous Modelling of Speaking Style in HMM-based Speech Synthesis: Design and Evaluation, soumis à Interspeech 2011, Florence, Italy, 2011

to:

The principle of HMM-based speech synthesis developed by the Nagoya Institute of Technology (Nitech) is the joint modeling of the spectrum (vocal tract), the fundamental frequency (source) and durations for each phoneme in context by a hidden Markov chain. During the synthesis, a macro-model is built from the concatenation of the HMMs corresponding to the phones in the context of the phonetic sequence to synthesize. The durations of the states are initially generated and then the trajectory of spectral parameters is estimated from a specific algorithm for spectral parameters generation taking into account the dependency between static and dynamic parameters. One advantage of this method compared to the synthesis of speech by units selection is that it only requires the storage of model parameters. It also allows precise control of the characteristics of the synthesis. The disadvantages of this type of synthesis are artifacts in the synthesized voice due to the glottal source modeling and the lack of natural due to the low variability of the prosody. To overcome these shortcomings we used the separation of vocal tract and glottal source separation method proposed by G. Degottex in [ C5 ]. On the other hand, we have shown with N. Obin the improvement made by using high-level syntactical features [ C6 ] and the possibilities for the synthesis of speaking style for different types of discourse genres in [ C11 ].

[ C5 ] P. Lanchantin, G. Degottex and X. Rodet, A HMM-Based Synthesis System Using a New Glottal Source and Vocal-Tract Separation Method, ICASSP'10, Dallas, USA, 2010.

[ C6 ] N. Obin, P. Lanchantin, M. Avanzi, A. Lacheret-Dujour and X. Rodet, Toward Improved HMM-Based Speech Synthesis Using High-Level Syntactical Features, Speech Prosody, Chicago, USA, 2010

[ C11 ] N. Obin, P. Lanchantin, A. Lacheret-Dujour and X. Rodet, Discrete/Continuous Modelling of Speaking Style in HMM-based Speech Synthesis: Design and Evaluation, submitted to Interspeech 2011, Florence, Italy, 2011

Changed lines 79-86 from:

The principle of Voice conversion is to transform the signal from the voice of a source speaker, so it seems to have been issued by a target speaker. Conversion techniques studied at IRCAM by F. Villavicencio and then by myself under the ANR Affective Avatars are based on Gaussian Mixture Models (GMM). Typically, the joint distribution of acoustic source and target characteristics, modeled by a GMM, is estimated from a parallel corpus consisting of synchronous recordings of source and target speakers. The conversion function is then given by the conditional expectation to the acoustic characteristics of the source. My studies have focused both on the definition of the transformation function on its application to improve the quality of converted speech. Thus, all-pole modeling of the spectral envelope has been improved by the True-Envelope technics that enhances the quality of the synthesis and the characterization of the residual from the speaker. On the other hand, the use of the covariance matrix of the conditional distribution to the acoustic characteristics of the source allows a renormalization of the transformed characteristics in order to improve the quality of the converted signal. Finally, during the AngelStudio project, I proposed a method for Dynamic Model Selection (DMS [C8, C10]) which consits in using several models of different complexity and to select the most appropriate model for each frame of analysis during the conversion. The results of voice conversion obtained are very encouraging. Thus, it appears that the "personality" of the target speaker is well reproduced after processing and that the source speaker has largely disappeared. The main difficulty that remains is some degradation of sound quality of voice. However, other ways of improvements we are currently investigating [C13] are expected to arrive at a usable quality, real-time, even for very demanding applications, such as artistic applications.

[C8] P. Lanchantin and X. Rodet, Dynamic Model Selection for Spectral Voice Conversion, Interspeech'10, Makuhari, Japan, 2010

[C10] P. Lanchantin and X. Rodet, Objective Evaluation of the Dynamic Model Selection for Spectral Voice Conversion, ICASSP2011, accepted, Prague, Czech Republic, 2011

[C13] P. Lanchantin, N. Obin and X. Rodet, Extended Conditional GMM and Covariance Matrix Correction for Real-Time Spectral Voice Conversion, soumis à Interspeech 2011, Florence, Italy, 2011

to:

The principle of voice conversion is to transform the signal of the voice of a source speaker so that it seems to have been uttered by a target speaker. The conversion techniques studied at IRCAM by F. Villavicencio and then by myself during the ANR project Affective Avatars are based on Gaussian Mixture Models (GMM). Typically, the joint distribution of the source and target acoustic characteristics, modeled by a GMM, is estimated from a parallel corpus consisting of synchronous recordings of the source and target speakers. The conversion function is then given by the conditional expectation given the acoustic characteristics of the source. My studies have focused both on the definition of the transformation function and on its application to improve the quality of the converted speech. Thus, the all-pole modeling of the spectral envelope has been improved by the True-Envelope technique, which enhances the quality of the synthesis and the characterization of the speaker residual. On the other hand, the use of the covariance matrix of the distribution conditional on the acoustic characteristics of the source allows a renormalization of the transformed characteristics in order to improve the quality of the converted signal. Finally, during the AngelStudio project, I proposed a method for Dynamic Model Selection (DMS) [ C8, C10 ], which consists in using several models of different complexity and selecting the most appropriate model for each analysis frame during the conversion. The voice conversion results obtained are very encouraging. It appears that the "personality" of the target speaker is well reproduced after processing and that the source speaker has largely disappeared. The main remaining difficulty is some degradation of the sound quality of the voice. However, other improvements that we are currently investigating [ C13 ] are expected to reach a usable quality, in real time, even for very demanding applications such as artistic applications.

[ C8 ] P. Lanchantin and X. Rodet, Dynamic Model Selection for Spectral Voice Conversion, Interspeech'10, Makuhari, Japan, 2010

[ C10 ] P. Lanchantin and X. Rodet, Objective Evaluation of the Dynamic Model Selection for Spectral Voice Conversion, ICASSP2011, accepted, Prague, Czech Republic, 2011

[ C13 ] P. Lanchantin, N. Obin and X. Rodet, Extended Conditional GMM and Covariance Matrix Correction for Real-Time Spectral Voice Conversion, submitted to Interspeech 2011, Florence, Italy, 2011
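For clarity, the conversion function mentioned in the paragraph above can be written in the standard joint-GMM form (the notation is mine). With a GMM of M components fitted on joint vectors (x, y) of source and target features, the conditional expectation is

F(x) = \mathbb{E}[y \mid x] = \sum_{i=1}^{M} P(i \mid x) \left[ \mu_i^{y} + \Sigma_i^{yx} (\Sigma_i^{xx})^{-1} (x - \mu_i^{x}) \right]

where P(i \mid x) is the posterior probability of component i given the source vector and \mu_i, \Sigma_i are the blocks of the component means and covariances; the covariance of the conditional distribution, \Sigma_i^{yy} - \Sigma_i^{yx} (\Sigma_i^{xx})^{-1} \Sigma_i^{xy}, is what the renormalization step described above relies on.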

April 27, 2011, at 09:19 PM by 2001:660:3004:21:2e0:81ff:fe57:ac36 -
Deleted lines 46-49:

Speech processing


Changed lines 49-50 from:

Under the ANR Vivos project, I proposed and developed, in collaboration with A. C. Morris and X. Rodet, the software ircamAlign [C4]. It is a system for the segmentation of speech signals into phones, based largely on the HTK library. The system relies on the hidden Markov chain modeling used in particular in speech recognition. This modeling, specific to speech processing, can be viewed as a special case of a triplet Markov chain T = (U, X, Y), where U is the language model, X is the process describing the evolution of the spectral features over time (the sub-states of the HMM of each phoneme) and Y is the observation process (cepstral coefficients).

to:

Under the ANR Vivos project, I proposed and developed, in collaboration with A. C. Morris and X. Rodet, the software ircamAlign [ C4 ]. It is a system for the segmentation of speech signals into phones, based largely on the HTK library. The system relies on the hidden Markov chain modeling used in particular in speech recognition. This modeling, specific to speech processing, can be viewed as a special case of a triplet Markov chain T = (U, X, Y), where U is the language model, X is the process describing the evolution of the spectral features over time (the sub-states of the HMM of each phoneme) and Y is the observation process (cepstral coefficients).
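
As a rough sketch of the alignment machinery behind such a system (a toy example, not the actual ircamAlign/HTK implementation), the following Python function performs a Viterbi forced alignment of T frames against the S states of a left-to-right macro-model, assuming per-state emission log-likelihoods have already been computed:

    import numpy as np

    def forced_align(log_emission, log_stay):
        # log_emission: (T, S) array of log p(y_t | state s)
        # log_stay: (S,) log self-loop probabilities of the left-to-right model
        # Returns the frame-level state sequence (assumes T >= S).
        T, S = log_emission.shape
        log_move = np.log1p(-np.exp(log_stay))          # log(1 - stay)
        delta = np.full((T, S), -np.inf)
        psi = np.zeros((T, S), dtype=int)
        delta[0, 0] = log_emission[0, 0]                # must start in the first state
        for t in range(1, T):
            for s in range(S):
                stay = delta[t - 1, s] + log_stay[s]
                move = delta[t - 1, s - 1] + log_move[s - 1] if s > 0 else -np.inf
                psi[t, s] = s if stay >= move else s - 1
                delta[t, s] = max(stay, move) + log_emission[t, s]
        path = [S - 1]                                  # must end in the last state
        for t in range(T - 1, 0, -1):
            path.append(psi[t, path[-1]])
        return path[::-1]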

Changed lines 55-62 from:

From this segmentation into phones, the structure of the speech (syllables, words, breath groups) is extracted from the transcription and aligned to the speech signal in order to build the unit databases needed for the development of a concatenative text-to-speech (TTS) synthesizer [C7]. ircamAlign is used by ircamTTS and by ircamCorpusTools [AF2], which is a database management system for speech units. ircamAlign is also used in the ANR Rhapsody project to develop a reference corpus of spontaneous French speech. Finally, ircamAlign has been used by composers at IRCAM. Note that a real-time version has subsequently been developed by J. Bloit and implemented in MaxMSP.

[C4] P. Lanchantin, A. C. Morris, X. Rodet, C. Veaux, Automatic Phoneme Segmentation with Relaxed Textual Constraints, in E. L. R. A. (ELRA) (ed.), Proceedings of the Sixth International Language Resources and Evaluation (LREC'08), Marrakech, Morocco, 2008.

[C7] C. Veaux, P. Lanchantin and X. Rodet, Joint Prosodic and Segmental Unit Selection for Expressive Speech Synthesis, 7th Speech Synthesis Workshop (SSW7), Kyoto, Japan, 2010

[AF2] G. Beller, C. Veaux, G. Degottex, N. Obin, P. Lanchantin et X. Rodet, IrcamCorpusTools : Plateforme Pour Les Corpus de Parole, Traitement Automatique des Langues, Vol. 49, No. 3, 2008

to:

From this segmentation into phones, the structure of the speech (syllables, words, breath groups) is extracted from the transcription and aligned to the speech signal in order to build the unit databases needed for the development of a concatenative text-to-speech (TTS) synthesizer [ C7 ]. ircamAlign is used by ircamTTS and by ircamCorpusTools [ AF2 ], which is a database management system for speech units. ircamAlign is also used in the ANR Rhapsody project to develop a reference corpus of spontaneous French speech. Finally, ircamAlign has been used by composers at IRCAM. Note that a real-time version has subsequently been developed by J. Bloit and implemented in MaxMSP.

[ C4 ] P. Lanchantin, A. C. Morris, X. Rodet, C. Veaux, Automatic Phoneme Segmentation with Relaxed Textual Constraints, in E. L. R. A. (ELRA) (ed.), Proceedings of the Sixth International Language Resources and Evaluation (LREC08), Marrakech, Morocco, 2008.

[ C7 ] C. Veaux, P. Lanchantin and X. Rodet, Joint Prosodic and Segmental Unit Selection for Expressive Speech Synthesis, 7th Speech Synthesis Workshop (SSW7), Kyoto, Japan, 2010

[ AF2 ] G. Beller, C. Veaux, G. Degottex, N. Obin, P. Lanchantin et X. Rodet, IrcamCorpusTools : Plateforme Pour Les Corpus de Parole, Traitement Automatique des Langues, Vol. 49, No. 3, 2008

April 27, 2011, at 09:15 PM by 2001:660:3004:21:2e0:81ff:fe57:ac36 -
Changed lines 33-44 from:

As part of the study of triplet Markov models, we have also studied with W. Pieczynski the possibility of extending the classical probabilistic model to an "evidential" model, in which the posterior probability of the hidden process is given by the Dempster-Shafer fusion [A1, AF1, CF2]. We then applied this evidential model to the segmentation of nonstationary processes. The main interest of our approach was to show that although the Dempster-Shafer fusion destroys Markovianity in the context of the hidden evidential chain, Bayesian segmentation is still possible via the triplet Markov chain approach.

One of my last contributions during my PhD thesis was to extend the fuzzy Markov chains previously studied by F. Salzenstein to fuzzy Markov trees. Fuzzy segmentation was initially proposed to take into account the imprecision of a site's membership in a thematic class. Thus, in a fuzzy signal, homogeneous areas ("hard" clusters) coexist with fuzzy areas representing intermediate sites that may belong to several hard clusters. The originality of these models lies in the fact that their distribution has both a discrete and a continuous component, the discrete component being formed by Dirac masses representing the weight assigned to each hard cluster and the continuous component corresponding to the fuzzy classes (Lebesgue measure). We have proposed a multisensor fuzzy hidden Markov tree that we applied to the segmentation of astronomical images [CF3].

[A1] P. Lanchantin and W. Pieczynski, Unsupervised restoration of hidden non stationary Markov chains using evidential priors, IEEE Transactions on Signal Processing, Vol. 53, No. 8, pp 3091-3098, 2005.

[AF1] P. Lanchantin et W. Pieczynski, Chaînes et arbres de Markov évidentiels avec applications à la segmentation des processus non stationnaires, Traitement du Signal, Vol. 22, No. 2, 2005.

[CF2] P. Lanchantin et W. Pieczynski, Arbres de Markov Triplet et théorie de l'évidence, Actes du Colloque GRETSI'03, 8-11 septembre, Paris, France, 2003.

[CF3] P. Lanchantin, F. Salzenstein, Segmentation d'Images Multispectrales par Arbre de Markov caché Flou, Actes du Colloque GRETSI'05, 6-9 septembre, Louvain-la-Neuve, Belgique, 2005.

to:

As part of the study of triplet Markov models, we have also studied with W. Pieczynski the possibility of extending the classical probabilistic model to an "evidential" model, in which the posterior probability of the hidden process is given by the Dempster-Shafer fusion [ A1, AF1, CF2 ]. We then applied this evidential model to the segmentation of nonstationary processes. The main interest of our approach was to show that although the Dempster-Shafer fusion destroys Markovianity in the context of the hidden evidential chain, Bayesian segmentation is still possible via the triplet Markov chain approach.
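
As an illustration of the fusion rule involved, here is a minimal sketch of the Dempster-Shafer combination of two mass functions over a finite frame of discernment; it shows only the generic rule, not the evidential-chain construction described above:

    from itertools import product

    def dempster_combine(m1, m2):
        # m1, m2: dicts mapping frozenset focal elements to masses summing to 1
        combined, conflict = {}, 0.0
        for (a, wa), (b, wb) in product(m1.items(), m2.items()):
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + wa * wb
            else:
                conflict += wa * wb                 # mass sent to the empty set
        if conflict >= 1.0:
            raise ValueError("total conflict: fusion undefined")
        return {k: v / (1.0 - conflict) for k, v in combined.items()}

    # one precise source and one partially ignorant source over classes {1, 2}
    m1 = {frozenset({1}): 0.6, frozenset({1, 2}): 0.4}
    m2 = {frozenset({2}): 0.5, frozenset({1, 2}): 0.5}
    print(dempster_combine(m1, m2))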

One of my last contributions during my PhD thesis was to extend the fuzzy Markov chains previously studied by F. Salzenstein to fuzzy Markov trees. Fuzzy segmentation was initially proposed to take into account the imprecision of a site's membership in a thematic class. Thus, in a fuzzy signal, homogeneous areas ("hard" clusters) coexist with fuzzy areas representing intermediate sites that may belong to several hard clusters. The originality of these models lies in the fact that their distribution has both a discrete and a continuous component, the discrete component being formed by Dirac masses representing the weight assigned to each hard cluster and the continuous component corresponding to the fuzzy classes (Lebesgue measure). We have proposed a multisensor fuzzy hidden Markov tree that we applied to the segmentation of astronomical images [ CF3 ].

[ A1 ] P. Lanchantin and W. Pieczynski, Unsupervised restoration of hidden non stationary Markov chains using evidential priors, IEEE Transactions on Signal Processing, Vol. 53, No. 8, pp 3091-3098, 2005.

[ AF1 ] P. Lanchantin et W. Pieczynski, Chaînes et arbres de Markov évidentiels avec applications à la segmentation des processus non stationnaires, Traitement du Signal, Vol. 22, No. 2, 2005.

[ CF2 ] P. Lanchantin et W. Pieczynski, Arbres de Markov Triplet et théorie de l'évidence, Actes du Colloque GRETSI 03, 8-11 septembre, Paris, France, 2003.

[ CF3 ] P. Lanchantin, F. Salzenstein, Segmentation d'Images Multispectrales par Arbre de Markov caché Flou, Actes du Colloque GRETSI 05, 6-9 septembre, Louvain-la-Neuve, Belgique, 2005.

April 27, 2011, at 09:00 PM by 2001:660:3004:21:2e0:81ff:fe57:ac36 -
Changed lines 23-28 from:

Pairwise Markov models can be extended to triplet Markov models [C1]. The principle is to add one, or even several, auxiliary process(es) such that the joint distribution of the triplet "hidden process, auxiliary processes, observed process" is that of a Markov chain. These very general models make it possible to overcome another limitation of conventional models, namely the assumption that the joint distribution is stationary. Indeed, by introducing an auxiliary process that controls the changes in the transition matrices of the hidden process, we have shown the effectiveness of such a model in situations where the joint distribution of the hidden process and the observations is nonstationary [C2], and we proposed algorithms for estimating the parameters of the considered Markov chain. This model was applied to the segmentation of synthetic and real images. A first observation is that the model does allow different regimes to be taken into account, resulting in improved segmentation quality for images containing both extensive homogeneous areas and areas with fine details. A second observation is that it is also possible to obtain a realization of the auxiliary process with the MPM estimator. This type of representation can be very useful, especially for the segmentation of textures, which can be precisely modeled by auxiliary processes.

[C1] W. Pieczynski, D. Benboudjema and P. Lanchantin, Statistical image segmentation using Triplet Markov Fields, SPIE's International Symposium on Remote Sensing, September 22-27, Crete, Greece, 2002.

[C2] P. Lanchantin and W. Pieczynski, Unsupervised non stationary image segmentation using triplet Markov chains, Advanced Concepts for Intelligent Vision Systems (ACIVS 04), Aug. 31-Sept. 3, Brussels, Belgium, 2004.

to:

Pairwise Markov models can be extended to triplet Markov models [ C1 ]. The principle is to add one, or even several, auxiliary process(es) such that the joint distribution of the triplet "hidden process, auxiliary processes, observed process" is that of a Markov chain. These very general models make it possible to overcome another limitation of conventional models, namely the assumption that the joint distribution is stationary. Indeed, by introducing an auxiliary process that controls the changes in the transition matrices of the hidden process, we have shown the effectiveness of such a model in situations where the joint distribution of the hidden process and the observations is nonstationary [ C2 ], and we proposed algorithms for estimating the parameters of the considered Markov chain. This model was applied to the segmentation of synthetic and real images. A first observation is that the model does allow different regimes to be taken into account, resulting in improved segmentation quality for images containing both extensive homogeneous areas and areas with fine details. A second observation is that it is also possible to obtain a realization of the auxiliary process with the MPM estimator. This type of representation can be very useful, especially for the segmentation of textures, which can be precisely modeled by auxiliary processes.

[ C1 ] W. Pieczynski, D. Benboudjema and P. Lanchantin, Statistical image segmentation using Triplet Markov Fields, SPIEs International Symposium on Remote Sensing, September 22-27, Crete, Greece, 2002.

[ C2 ] P. Lanchantin and W. Pieczynski, Unsupervised non stationary image segmentation using triplet Markov chains, Advanced Concepts for Intelligent Vision Systems (ACIVS 04), Aug. 31-Sept. 3, Brussels, Belgium, 2004.
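
The switching mechanism described above can be illustrated with a toy sampler in which an auxiliary chain U selects the transition matrix driving the hidden chain X, and Y observes X in Gaussian noise; all matrices and noise levels below are illustrative placeholders, not estimated parameters:

    import numpy as np

    rng = np.random.default_rng(0)

    def sample_switching_chain(T):
        A_u = np.array([[0.99, 0.01], [0.01, 0.99]])          # slow regime changes
        A_x = [np.array([[0.9, 0.1], [0.1, 0.9]]),            # regime 0: smooth areas
               np.array([[0.6, 0.4], [0.4, 0.6]])]            # regime 1: fine details
        u = np.zeros(T, dtype=int)
        x = np.zeros(T, dtype=int)
        for t in range(1, T):
            u[t] = rng.choice(2, p=A_u[u[t - 1]])             # auxiliary process
            x[t] = rng.choice(2, p=A_x[u[t]][x[t - 1]])       # regime-dependent hidden process
        y = x + rng.normal(0.0, 0.5, size=T)                  # observed process
        return u, x, y

    u, x, y = sample_switching_chain(500)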

April 27, 2011, at 08:57 PM by 2001:660:3004:21:2e0:81ff:fe57:ac36 -
Changed lines 13-18 from:

The principle of pairwise Markov chains is to assume that the joint distribution of the observed and hidden processes is that of a Markov chain. In the case of a partially pairwise Markov chain, one directly assumes the Markovianity of the distribution of the hidden process conditionally on the observations. In this context, one of my contributions was to propose the expectation-maximization (EM) algorithm for pairwise Markov chains. I also studied with W. Pieczynski a special case of partially pairwise Markov chain applied to the segmentation of Gaussian processes with long correlation [C3]. Experiments on synthetic data showed significant improvements over conventional models when the noise is long-correlated, while giving similar performance when the noise is independent. Nevertheless, the proposed parameter estimation method was only valid in the centered case, which prevented us from testing the model on real images. We therefore continued this work with J. Lapuyade to refine our methods and make possible the unsupervised segmentation of Gaussian processes that are not necessarily centered [A2].

[C3] W. Pieczynski, P. Lanchantin, Restoring hidden non stationary process using triplet partially Markov chain with long memory noise, IEEE Workshop on Statistical Signal Processing (SSP 05), July 17-20, Bordeaux, France, 2005.

[A2] P. Lanchantin, J. Lapuyade-Lahorgue and W. Pieczynski, Unsupervised segmentation of Triplet Markov chains hidden with long memory noise, Signal Processing, No. 88, Vol. 5, pp 1134-1151, May 2008.

to:

The principle of pairwise Markov chains is to assume that the joint distribution of the observed and hidden processes is that of a Markov chain. In the case of a partially pairwise Markov chain, one directly assumes the Markovianity of the distribution of the hidden process conditionally on the observations. In this context, one of my contributions was to propose the expectation-maximization (EM) algorithm for pairwise Markov chains. I also studied with W. Pieczynski a special case of partially pairwise Markov chain applied to the segmentation of Gaussian processes with long correlation [ C3 ]. Experiments on synthetic data showed significant improvements over conventional models when the noise is long-correlated, while giving similar performance when the noise is independent. Nevertheless, the proposed parameter estimation method was only valid in the centered case, which prevented us from testing the model on real images. We therefore continued this work with J. Lapuyade to refine our methods and make possible the unsupervised segmentation of Gaussian processes that are not necessarily centered [ A2 ].

[ C3 ] W. Pieczynski, P. Lanchantin, Restoring hidden non stationary process using triplet partially Markov chain with long memory noise, IEEE Workshop on Statistical Signal Processing (SSP 05), July 17-20, Bordeaux, France, 2005.

[ A2 ] P. Lanchantin, J. Lapuyade-Lahorgue and W. Pieczynski, Unsupervised segmentation of Triplet Markov chains hidden with long memory noise, Signal Processing, No. 88, Vol. 5, pp 1134-1151, May 2008.
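
To illustrate what a pairwise chain can represent that a classical HMM cannot, here is a toy sampler in which the pair (X, Y) is jointly Markov and the observation noise is correlated over time; the AR(1) construction is chosen for illustration only and is not the long-memory model of [ C3 ]:

    import numpy as np

    rng = np.random.default_rng(1)

    def sample_pairwise_chain(T, rho=0.8):
        # (X, Y) is jointly Markov: y[t] depends on y[t-1] through the correlated
        # noise, which violates the conditional-independence assumption of an HMM.
        A = np.array([[0.95, 0.05], [0.05, 0.95]])
        means = np.array([0.0, 2.0])
        x = np.zeros(T, dtype=int)
        y = np.zeros(T)
        y[0] = means[x[0]] + rng.normal(0.0, 0.5)
        for t in range(1, T):
            x[t] = rng.choice(2, p=A[x[t - 1]])
            y[t] = means[x[t]] + rho * (y[t - 1] - means[x[t - 1]]) + rng.normal(0.0, 0.5)
        return x, y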

April 27, 2011, at 08:54 PM by 2001:660:3004:21:2e0:81ff:fe57:ac36 -
Changed lines 3-4 from:

My research area is statistical signal processing. My main research topics include statistical modeling of signals, speech processing and their applications to music. My research initially focused on the generalization of statistical models for signals, especially hidden Markov models. During my PhD [T], I proposed and studied models called triplet Markov models, which generalize the classical Hidden Markov Models (HMM), with applications in image segmentation. I then directed my research toward speech processing, working on the segmentation of speech signals into phones, voice conversion, language models and HMM-based speech synthesis. My research on speech is interdisciplinary, as it combines statistical modeling of signals, natural language processing and their application to music.

to:

My research area is statistical signal processing. My main research topics include statistical modeling of signals, speech processing and their applications to music. My research initially focused on the generalization of statistical models for signals, especially hidden Markov models. During my PhD [T], I proposed and studied models called triplet Markov models, which generalize the classical Hidden Markov Models (HMM), with applications in image segmentation. I then directed my research toward speech processing, working on the segmentation of speech signals into phones, voice conversion, language models and HMM-based speech synthesis. My research on speech is interdisciplinary, as it combines statistical modeling of signals, natural language processing and their application to music.

Changed lines 7-8 from:

[T] P. Lanchantin, Chaînes de Markov triplets et segmentation non supervisée des signaux/Unsupervised Signal Segmentation using Triplet Markov chains, PhD thesis, Institut National des Télécommunications, December 2006

to:

[ T ] P. Lanchantin, Chaînes de Markov triplets et segmentation non supervisée des signaux/Unsupervised Signal Segmentation using Triplet Markov chains, PhD thesis, Institut National des Télécommunications, December 2006

April 27, 2011, at 08:53 PM by 2001:660:3004:21:2e0:81ff:fe57:ac36 -
Changed lines 7-8 from:

[T] P. Lanchantin, Chaînes de Markov Triplets et Segmentation Non Supervisée de Signaux, thèse de doctorat de l'Institut National des Télécommunications, décembre 2006

to:

[T] P. Lanchantin, Chaînes de Markov triplets et segmentation non supervisée des signaux/Unsupervised Signal Segmentation using Triplet Markov chains, PhD thesis, Institut National des Télécommunications, December 2006

Deleted lines 10-13:

Triplet Markov Chains and unsupervised signal segmentation


April 27, 2011, at 08:50 PM by 2001:660:3004:21:2e0:81ff:fe57:ac36 -
Added lines 13-14:

Added lines 23-24:

Added lines 33-34:

Added lines 49-50:

Added lines 53-54:

Added lines 71-72:

Changed lines 83-84 from:
to:

April 27, 2011, at 08:48 PM by 2001:660:3004:21:2e0:81ff:fe57:ac36 -
Changed lines 9-10 from:

---

to:

April 27, 2011, at 08:48 PM by 2001:660:3004:21:2e0:81ff:fe57:ac36 -
Added lines 9-10:

---

April 27, 2011, at 08:47 PM by 2001:660:3004:21:2e0:81ff:fe57:ac36 -
Changed lines 29-40 from:

As part of the study of triplet Markov models, we have also studied with W. Pieczynski the possibility of extending the classical probabilistic model to an "evidential" model, in which the posterior probability of the hidden process is given by the Dempster-Shafer fusion [A1, AF1, CF2]. We then applied this evidential model to the segmentation of nonstationary processes. The main interest of our approach was to show that although the Dempster-Shafer fusion destroys Markovianity in the context of the hidden evidential chain, Bayesian segmentation is still possible via the triplet Markov chain approach.

One of my last contributions during my PhD thesis was to extend the fuzzy Markov chains previously studied by F. Salzenstein to fuzzy Markov trees. Fuzzy segmentation was initially proposed to take into account the imprecision of a site's membership in a thematic class. Thus, in a fuzzy signal, homogeneous areas ("hard" clusters) coexist with fuzzy areas representing intermediate sites that may belong to several hard clusters. The originality of these models lies in the fact that their distribution has both a discrete and a continuous component, the discrete component being formed by Dirac masses representing the weight assigned to each hard cluster and the continuous component corresponding to the fuzzy classes (Lebesgue measure). We have proposed a multisensor fuzzy hidden Markov tree that we applied to the segmentation of astronomical images [CF3].

[A1] P. Lanchantin and W. Pieczynski, Unsupervised restoration of hidden non stationary Markov chains using evidential priors, IEEE Transactions on Signal Processing, Vol. 53, No. 8, pp 3091-3098, 2005.

[AF1] P. Lanchantin et W. Pieczynski, Chaînes et arbres de Markov évidentiels avec applications à la segmentation des processus non stationnaires, Traitement du Signal, Vol. 22, No. 2, 2005.

[CF2] P. Lanchantin et W. Pieczynski, Arbres de Markov Triplet et théorie de l'évidence, Actes du Colloque GRETSI'03, 8-11 septembre, Paris, France, 2003.

[CF3] P. Lanchantin, F. Salzenstein, Segmentation d'Images Multispectrales par Arbre de Markov caché Flou, Actes du Colloque GRETSI'05, 6-9 septembre, Louvain-la-Neuve, Belgique, 2005.

to:

As part of the study of triplet Markov models, we have also studied with W. Pieczynski the possibility of extending the classical probabilistic model to an "evidential" model, in which the posterior probability of the hidden process is given by the Dempster-Shafer fusion [A1, AF1, CF2]. We then applied this evidential model to the segmentation of nonstationary processes. The main interest of our approach was to show that although the Dempster-Shafer fusion destroys Markovianity in the context of the hidden evidential chain, Bayesian segmentation is still possible via the triplet Markov chain approach.

One of my last contributions during my PhD thesis was to extend the fuzzy Markov chains previously studied by F. Salzenstein to fuzzy Markov trees. Fuzzy segmentation was initially proposed to take into account the imprecision of a site's membership in a thematic class. Thus, in a fuzzy signal, homogeneous areas ("hard" clusters) coexist with fuzzy areas representing intermediate sites that may belong to several hard clusters. The originality of these models lies in the fact that their distribution has both a discrete and a continuous component, the discrete component being formed by Dirac masses representing the weight assigned to each hard cluster and the continuous component corresponding to the fuzzy classes (Lebesgue measure). We have proposed a multisensor fuzzy hidden Markov tree that we applied to the segmentation of astronomical images [CF3].

[A1] P. Lanchantin and W. Pieczynski, Unsupervised restoration of hidden non stationary Markov chains using evidential priors, IEEE Transactions on Signal Processing, Vol. 53, No. 8, pp 3091-3098, 2005.

[AF1] P. Lanchantin et W. Pieczynski, Chaînes et arbres de Markov évidentiels avec applications à la segmentation des processus non stationnaires, Traitement du Signal, Vol. 22, No. 2, 2005.

[CF2] P. Lanchantin et W. Pieczynski, Arbres de Markov Triplet et théorie de l'évidence, Actes du Colloque GRETSI'03, 8-11 septembre, Paris, France, 2003.

[CF3] P. Lanchantin, F. Salzenstein, Segmentation d'Images Multispectrales par Arbre de Markov caché Flou, Actes du Colloque GRETSI'05, 6-9 septembre, Louvain-la-Neuve, Belgique, 2005.

Changed lines 45-46 from:

Under the ANR Vivos project, I proposed and developed, in collaboration with A. C. Morris and X. Rodet, the software ircamAlign [C4]. It is a system for the segmentation of speech signals into phones, based largely on the HTK library. The system relies on the hidden Markov chain modeling used in particular in speech recognition. This modeling, specific to speech processing, can be viewed as a special case of a triplet Markov chain T = (U, X, Y), where U is the language model, X is the process describing the evolution of the spectral features over time (the sub-states of the HMM of each phoneme) and Y is the observation process (cepstral coefficients).

to:

Under the ANR Vivos project, I proposed and developed, in collaboration with A. C. Morris and X. Rodet, the software ircamAlign [C4]. It is a system for the segmentation of speech signals into phones, based largely on the HTK library. The system relies on the hidden Markov chain modeling used in particular in speech recognition. This modeling, specific to speech processing, can be viewed as a special case of a triplet Markov chain T = (U, X, Y), where U is the language model, X is the process describing the evolution of the spectral features over time (the sub-states of the HMM of each phoneme) and Y is the observation process (cepstral coefficients).

Changed lines 51-58 from:

From this segmentation into phones, the structure of the speech (syllables, words, breath groups) is extracted from the transcription and aligned to the speech signal in order to build the unit databases needed for the development of a concatenative text-to-speech (TTS) synthesizer (C7). ircamAlign is used by ircamTTS and by ircamCorpusTools [AF2], which is a database management system for speech units. ircamAlign is also used in the ANR Rhapsody project to develop a reference corpus of spontaneous French speech. Finally, ircamAlign has been used by composers at IRCAM. Note that a real-time version has subsequently been developed by J. Bloit and implemented in MaxMSP.

[C4] P. Lanchantin, A. C. Morris, X. Rodet, C. Veaux, Automatic Phoneme Segmentation with Relaxed Textual Constraints, in E. L. R. A. (ELRA) (ed.), Proceedings of the Sixth International Language Resources and Evaluation (LREC'08), Marrakech, Morocco, 2008.

[C7] C. Veaux, P. Lanchantin and X. Rodet, Joint Prosodic and Segmental Unit Selection for Expressive Speech Synthesis, 7th Speech Synthesis Workshop (SSW7), Kyoto, Japan, 2010

[AF2] G. Beller, C. Veaux, G. Degottex, N. Obin, P. Lanchantin et X. Rodet, IrcamCorpusTools : Plateforme Pour Les Corpus de Parole, Traitement Automatique des Langues, Vol. 49, No. 3, 2008

to:

From this segmentation into phones, the structure of the speech (syllables, words, breath groups) is extracted from the transcription and aligned to the speech signal in order to build the unit databases needed for the development of a concatenative text-to-speech (TTS) synthesizer [C7]. ircamAlign is used by ircamTTS and by ircamCorpusTools [AF2], which is a database management system for speech units. ircamAlign is also used in the ANR Rhapsody project to develop a reference corpus of spontaneous French speech. Finally, ircamAlign has been used by composers at IRCAM. Note that a real-time version has subsequently been developed by J. Bloit and implemented in MaxMSP.

[C4] P. Lanchantin, A. C. Morris, X. Rodet, C. Veaux, Automatic Phoneme Segmentation with Relaxed Textual Constraints, in E. L. R. A. (ELRA) (ed.), Proceedings of the Sixth International Language Resources and Evaluation (LREC'08), Marrakech, Morocco, 2008.

[C7] C. Veaux, P. Lanchantin and X. Rodet, Joint Prosodic and Segmental Unit Selection for Expressive Speech Synthesis, 7th Speech Synthesis Workshop (SSW7), Kyoto, Japan, 2010

[AF2] G. Beller, C. Veaux, G. Degottex, N. Obin, P. Lanchantin et X. Rodet, IrcamCorpusTools : Plateforme Pour Les Corpus de Parole, Traitement Automatique des Langues, Vol. 49, No. 3, 2008

Changed lines 61-69 from:

The principle of the HMM-based speech synthesis developed by the Nagoya Institute of Technology (Nitech) is the joint modeling, by a hidden Markov chain, of the spectrum (vocal tract), the fundamental frequency (source) and the durations of each phoneme in context. During synthesis, a macro-model is built by concatenating the HMMs corresponding to the phones in context of the phonetic sequence to be synthesized. The state durations are generated first, and the trajectory of the spectral parameters is then estimated with a specific parameter generation algorithm that takes into account the dependency between static and dynamic parameters. One advantage of this method over unit-selection synthesis is that it only requires the storage of the model parameters. It also allows precise control of the characteristics of the synthesis. The disadvantages of this type of synthesis are artifacts in the synthesized voice due to the glottal source modeling and a lack of naturalness due to the low variability of the prosody. To overcome these shortcomings, we used the vocal tract and glottal source separation method proposed by G. Degottex in [C5]. On the other hand, we have shown with N. Obin the improvement brought by high-level syntactical features [C6] and the possibilities offered for the synthesis of speaking styles for different discourse genres [C11].

[C5] P. Lanchantin, G. Degottex and X. Rodet, A HMM-Based Synthesis System Using a New Glottal Source and Vocal-Tract Separation Method, ICASSP'10, Dallas, USA 2010,

[C6] N. Obin, P. Lanchantin, M. Avanzi, A. Lacheret-Dujour and X. Rodet, Toward Improved HMM-Based Speech Synthesis Using High-Level Syntactical Features, Speech Prosody, Chicago, USA, 2010

[C11] N. Obin, P. Lanchantin, A. Lacheret-Dujour and X. Rodet, Discrete/Continuous Modelling of Speaking Style in HMM-based Speech Synthesis: Design and Evaluation, submitted to Interspeech 2011, Florence, Italy, 2011

to:

The principle of the HMM-based speech synthesis developed by the Nagoya Institute of Technology (Nitech) is the joint modeling, by a hidden Markov chain, of the spectrum (vocal tract), the fundamental frequency (source) and the durations of each phoneme in context. During synthesis, a macro-model is built by concatenating the HMMs corresponding to the phones in context of the phonetic sequence to be synthesized. The state durations are generated first, and the trajectory of the spectral parameters is then estimated with a specific parameter generation algorithm that takes into account the dependency between static and dynamic parameters. One advantage of this method over unit-selection synthesis is that it only requires the storage of the model parameters. It also allows precise control of the characteristics of the synthesis. The disadvantages of this type of synthesis are artifacts in the synthesized voice due to the glottal source modeling and a lack of naturalness due to the low variability of the prosody. To overcome these shortcomings, we used the vocal tract and glottal source separation method proposed by G. Degottex in [C5]. On the other hand, we have shown with N. Obin the improvement brought by high-level syntactical features [C6] and the possibilities offered for the synthesis of speaking styles for different discourse genres [C11].

[C5] P. Lanchantin, G. Degottex and X. Rodet, A HMM-Based Synthesis System Using a New Glottal Source and Vocal-Tract Separation Method, ICASSP'10, Dallas, USA 2010,

[C6] N. Obin, P. Lanchantin, M. Avanzi, A. Lacheret-Dujour and X. Rodet, Toward Improved HMM-Based Speech Synthesis Using High-Level Syntactical Features, Speech Prosody, Chicago, USA, 2010

[C11] N. Obin, P. Lanchantin, A. Lacheret-Dujour and X. Rodet, Discrete/Continuous Modelling of Speaking Style in HMM-based Speech Synthesis: Design and Evaluation, submitted to Interspeech 2011, Florence, Italy, 2011
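
As an illustration of the parameter generation step mentioned above, here is a simplified one-dimensional sketch that solves for the maximum-likelihood static trajectory given stacked static and delta means and variances; it is a toy version (single stream, diagonal covariances, an assumed delta window, no global variance), not the Nitech implementation:

    import numpy as np

    def generate_trajectory(mu, var, delta_win=(-0.5, 0.0, 0.5)):
        # mu, var: (T, 2) arrays of per-frame [static, delta] means and variances.
        # Solves (W' S^-1 W) c = W' S^-1 mu for the static trajectory c, where W
        # stacks the identity (static) rows and the delta window (dynamic) rows.
        T = mu.shape[0]
        W = np.zeros((2 * T, T))
        for t in range(T):
            W[2 * t, t] = 1.0                              # static row
            for offset, w in zip((-1, 0, 1), delta_win):   # delta row
                if 0 <= t + offset < T:
                    W[2 * t + 1, t + offset] = w
        m = mu.reshape(-1)                                 # interleaved static/delta means
        p = 1.0 / var.reshape(-1)                          # diagonal precisions
        A = W.T @ (p[:, None] * W)
        b = W.T @ (p * m)
        return np.linalg.solve(A, b)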

Changed lines 72-79 from:

The principle of voice conversion is to transform the signal of a source speaker's voice so that it seems to have been uttered by a target speaker. The conversion techniques studied at IRCAM by F. Villavicencio and then by myself under the ANR Affective Avatars project are based on Gaussian Mixture Models (GMM). Typically, the joint distribution of the source and target acoustic characteristics, modeled by a GMM, is estimated from a parallel corpus consisting of synchronous recordings of the source and target speakers. The conversion function is then given by the conditional expectation given the acoustic characteristics of the source. My studies have focused both on the definition of the transformation function and on its application to improve the quality of the converted speech. Thus, the all-pole modeling of the spectral envelope has been improved by the True-Envelope technique, which enhances the quality of the synthesis and the characterization of the speaker's residual. On the other hand, the use of the covariance matrix of the conditional distribution given the acoustic characteristics of the source allows a renormalization of the transformed characteristics in order to improve the quality of the converted signal. Finally, during the AngelStudio project, I proposed a Dynamic Model Selection method (DMS [C8, C10]), which consists in using several models of different complexity and selecting the most appropriate one for each analysis frame during the conversion. The voice conversion results obtained are very encouraging: the "personality" of the target speaker is well reproduced after processing and that of the source speaker has largely disappeared. The main remaining difficulty is some degradation of the sound quality of the voice. However, other improvements we are currently investigating [C13] are expected to reach a usable quality, in real time, even for very demanding applications such as artistic ones.

[C8] P. Lanchantin and X. Rodet, Dynamic Model Selection for Spectral Voice Conversion, Interspeech'10, Makuhari, Japan, 2010

[C10] P. Lanchantin and X. Rodet, Objective Evaluation of the Dynamic Model Selection for Spectral Voice Conversion, ICASSP2011, accepted, Prague, Czech Republic, 2011

[C13] P. Lanchantin, N. Obin and X. Rodet, Extended Conditional GMM and Covariance Matrix Correction for Real-Time Spectral Voice Conversion, submitted to Interspeech 2011, Florence, Italy, 2011

to:

The principle of voice conversion is to transform the signal of a source speaker's voice so that it seems to have been uttered by a target speaker. The conversion techniques studied at IRCAM by F. Villavicencio and then by myself under the ANR Affective Avatars project are based on Gaussian Mixture Models (GMM). Typically, the joint distribution of the source and target acoustic characteristics, modeled by a GMM, is estimated from a parallel corpus consisting of synchronous recordings of the source and target speakers. The conversion function is then given by the conditional expectation given the acoustic characteristics of the source. My studies have focused both on the definition of the transformation function and on its application to improve the quality of the converted speech. Thus, the all-pole modeling of the spectral envelope has been improved by the True-Envelope technique, which enhances the quality of the synthesis and the characterization of the speaker's residual. On the other hand, the use of the covariance matrix of the conditional distribution given the acoustic characteristics of the source allows a renormalization of the transformed characteristics in order to improve the quality of the converted signal. Finally, during the AngelStudio project, I proposed a Dynamic Model Selection method (DMS [C8, C10]), which consists in using several models of different complexity and selecting the most appropriate one for each analysis frame during the conversion. The voice conversion results obtained are very encouraging: the "personality" of the target speaker is well reproduced after processing and that of the source speaker has largely disappeared. The main remaining difficulty is some degradation of the sound quality of the voice. However, other improvements we are currently investigating [C13] are expected to reach a usable quality, in real time, even for very demanding applications such as artistic ones.

[C8] P. Lanchantin and X. Rodet, Dynamic Model Selection for Spectral Voice Conversion, Interspeech'10, Makuhari, Japan, 2010

[C10] P. Lanchantin and X. Rodet, Objective Evaluation of the Dynamic Model Selection for Spectral Voice Conversion, ICASSP2011, accepted, Prague, Czech Republic, 2011

[C13] P. Lanchantin, N. Obin and X. Rodet, Extended Conditional GMM and Covariance Matrix Correction for Real-Time Spectral Voice Conversion, submitted to Interspeech 2011, Florence, Italy, 2011

April 27, 2011, at 08:44 PM by 2001:660:3004:21:2e0:81ff:fe57:ac36 -
Changed lines 3-4 from:

My research area is statistical signal processing. My main research topics include statistical modeling of signals, speech processing and their applications to music. My research initially focused on the generalization of statistical models for signals, especially hidden Markov models. During my PhD [T], I proposed and studied models called triplet Markov models, which generalize the classical Hidden Markov Models (HMM), with applications in image segmentation. I then directed my research toward speech processing, working on the segmentation of speech signals into phones, voice conversion, language models and HMM-based speech synthesis. My research on speech is interdisciplinary, as it combines statistical modeling of signals, natural language processing and their application to music.

to:

My research area is statistical signal processing. My main research topics include statistical modeling of signals, speech processing and their applications to music. My research initially focused on the generalization of statistical models for signals, especially hidden Markov models. During my PhD [T], I proposed and studied models called triplet Markov models, which generalize the classical Hidden Markov Models (HMM), with applications in image segmentation. I then directed my research toward speech processing, working on the segmentation of speech signals into phones, voice conversion, language models and HMM-based speech synthesis. My research on speech is interdisciplinary, as it combines statistical modeling of signals, natural language processing and their application to music.

Changed lines 7-8 from:

[T] P. Lanchantin, Chaînes de Markov Triplets et Segmentation Non Supervisée de Signaux, thèse de doctorat de l'Institut National des Télécommunications, décembre 2006

to:

[T] P. Lanchantin, Chaînes de Markov Triplets et Segmentation Non Supervisée de Signaux, thèse de doctorat de l'Institut National des Télécommunications, décembre 2006

Changed lines 13-18 from:

The principle of pairwise Markov chains is to assume that the joint distribution of the observed and hidden processes is that of a Markov chain. In the case of a partially pairwise Markov chain, one directly assumes the Markovianity of the distribution of the hidden process conditionally on the observations. In this context, one of my contributions was to propose the expectation-maximization (EM) algorithm for pairwise Markov chains. I also studied with W. Pieczynski a special case of partially pairwise Markov chain applied to the segmentation of Gaussian processes with long correlation [C3]. Experiments on synthetic data showed significant improvements over conventional models when the noise is long-correlated, while giving similar performance when the noise is independent. Nevertheless, the proposed parameter estimation method was only valid in the centered case, which prevented us from testing the model on real images. We therefore continued this work with J. Lapuyade to refine our methods and make possible the unsupervised segmentation of Gaussian processes that are not necessarily centered [A2].

[C3] W. Pieczynski, P. Lanchantin, Restoring hidden non stationary process using triplet partially Markov chain with long memory noise, IEEE Workshop on Statistical Signal Processing (SSP 05), July 17-20, Bordeaux, France, 2005.

[A2] P. Lanchantin, J. Lapuyade-Lahorgue and W. Pieczynski, Unsupervised segmentation of Triplet Markov chains hidden with long memory noise, Signal Processing, No. 88, Vol. 5, pp 1134-1151, May 2008.

to:

The principle of pairwise Markov chains is to assume that the joint distribution of the observed and hidden processes is that of a Markov chain. In the case of a partially pairwise Markov chain, one directly assumes the Markovianity of the distribution of the hidden process conditionally on the observations. In this context, one of my contributions was to propose the expectation-maximization (EM) algorithm for pairwise Markov chains. I also studied with W. Pieczynski a special case of partially pairwise Markov chain applied to the segmentation of Gaussian processes with long correlation [C3]. Experiments on synthetic data showed significant improvements over conventional models when the noise is long-correlated, while giving similar performance when the noise is independent. Nevertheless, the proposed parameter estimation method was only valid in the centered case, which prevented us from testing the model on real images. We therefore continued this work with J. Lapuyade to refine our methods and make possible the unsupervised segmentation of Gaussian processes that are not necessarily centered [A2].

[C3] W. Pieczynski, P. Lanchantin, Restoring hidden non stationary process using triplet partially Markov chain with long memory noise, IEEE Workshop on Statistical Signal Processing (SSP 05), July 17-20, Bordeaux, France, 2005.

[A2] P. Lanchantin, J. Lapuyade-Lahorgue and W. Pieczynski, Unsupervised segmentation of Triplet Markov chains hidden with long memory noise, Signal Processing, No. 88, Vol. 5, pp 1134-1151, May 2008.

Changed lines 21-26 from:

Pairwise Markov models can be extended to triplet Markov models [C1]. The principle is to add one, or even several, auxiliary process(es) such that the joint distribution of the triplet "hidden process, auxiliary processes, observed process" is that of a Markov chain. These very general models make it possible to overcome another limitation of conventional models, namely the assumption that the joint distribution is stationary. Indeed, by introducing an auxiliary process that controls the changes in the transition matrices of the hidden process, we have shown the effectiveness of such a model in situations where the joint distribution of the hidden process and the observations is nonstationary [C2], and we proposed algorithms for estimating the parameters of the considered Markov chain. This model was applied to the segmentation of synthetic and real images. A first observation is that the model does allow different regimes to be taken into account, resulting in improved segmentation quality for images containing both extensive homogeneous areas and areas with fine details. A second observation is that it is also possible to obtain a realization of the auxiliary process with the MPM estimator. This type of representation can be very useful, especially for the segmentation of textures, which can be precisely modeled by auxiliary processes.

[C1] W. Pieczynski, D. Benboudjema and P. Lanchantin, Statistical image segmentation using Triplet Markov Fields, SPIE's International Symposium on Remote Sensing, September 22-27, Crete, Greece, 2002.

[C2] P. Lanchantin and W. Pieczynski, Unsupervised non stationary image segmentation using triplet Markov chains, Advanced Concepts for Intelligent Vision Systems (ACIVS 04), Aug. 31-Sept. 3, Brussels, Belgium, 2004.

to:

Pairwise Markov models can be extended to triplet Markov models [C1]. The principle is to add one, or even several, auxiliary process(es) such that the joint distribution of the triplet "hidden process, auxiliary processes, observed process" is that of a Markov chain. These very general models make it possible to overcome another limitation of conventional models, namely the assumption that the joint distribution is stationary. Indeed, by introducing an auxiliary process that controls the changes in the transition matrices of the hidden process, we have shown the effectiveness of such a model in situations where the joint distribution of the hidden process and the observations is nonstationary [C2], and we proposed algorithms for estimating the parameters of the considered Markov chain. This model was applied to the segmentation of synthetic and real images. A first observation is that the model does allow different regimes to be taken into account, resulting in improved segmentation quality for images containing both extensive homogeneous areas and areas with fine details. A second observation is that it is also possible to obtain a realization of the auxiliary process with the MPM estimator. This type of representation can be very useful, especially for the segmentation of textures, which can be precisely modeled by auxiliary processes.

[C1] W. Pieczynski, D. Benboudjema and P. Lanchantin, Statistical image segmentation using Triplet Markov Fields, SPIE's International Symposium on Remote Sensing, September 22-27, Crete, Greece, 2002.

[C2] P. Lanchantin and W. Pieczynski, Unsupervised non stationary image segmentation using triplet Markov chains, Advanced Concepts for Intelligent Vision Systems (ACIVS 04), Aug. 31-Sept. 3, Brussels, Belgium, 2004.

April 27, 2011, at 08:41 PM by 2001:660:3004:21:2e0:81ff:fe57:ac36 -
Added lines 7-8:

[T] P. Lanchantin, Chaînes de Markov Triplets et Segmentation Non Supervisée de Signaux, thèse de doctorat de l'Institut National des Télécommunications, décembre 2006

Added lines 15-18:

[C3] W. Pieczynski, P. Lanchantin, Restoring hidden non stationary process using triplet partially Markov chain with long memory noise, IEEE Workshop on Statistical Signal Processing (SSP 05), July 17-20, Bordeaux, France, 2005.

[A2] P. Lanchantin, J. Lapuyade-Lahorgue and W. Pieczynski, Unsupervised segmentation of Triplet Markov chains hidden with long memory noise, Signal Processing, No. 88, Vol. 5, pp 1134-1151, May 2008.

Added lines 23-26:

[C1] W. Pieczynski, D. Benboudjema and P. Lanchantin, Statistical image segmentation using Triplet Markov Fields, SPIE's International Symposium on Remote Sensing, September 22-27, Crete, Greece, 2002.

[C2] P. Lanchantin and W. Pieczynski, Unsupervised non stationary image segmentation using triplet Markov chains, Advanced Concepts for Intelligent Vision Systems (ACIVS 04), Aug. 31-Sept. 3, Brussels, Belgium, 2004.

Added lines 33-40:

[A1] P. Lanchantin and W. Pieczynski, Unsupervised restoration of hidden non stationary Markov chains using evidential priors, IEEE Transactions on Signal Processing, Vol. 53, No. 8, pp 3091-3098, 2005.

[AF1] P. Lanchantin et W. Pieczynski, Chaînes et arbres de Markov évidentiels avec applications à la segmentation des processus non stationnaires, Traitement du Signal, Vol. 22, No. 2, 2005.

[CF2] P. Lanchantin et W. Pieczynski, Arbres de Markov Triplet et théorie de l'évidence, Actes du Colloque GRETSI'03, 8-11 septembre, Paris, France, 2003.

[CF3] P. Lanchantin, F. Salzenstein, Segmentation d'Images Multispectrales par Arbre de Markov caché Flou, Actes du Colloque GRETSI'05, 6-9 septembre, Louvain-la-Neuve, Belgique, 2005.

Added lines 53-58:

[C4] P. Lanchantin, A. C. Morris, X. Rodet, C. Veaux, Automatic Phoneme Segmentation with Relaxed Textual Constraints, in E. L. R. A. (ELRA) (ed.), Proceedings of the Sixth International Language Resources and Evaluation (LREC'08), Marrakech, Morocco, 2008.

[C7] C. Veaux, P. Lanchantin and X. Rodet, Joint Prosodic and Segmental Unit Selection for Expressive Speech Synthesis, 7th Speech Synthesis Workshop (SSW7), Kyoto, Japan, 2010

[AF2] G. Beller, C. Veaux, G. Degottex, N. Obin, P. Lanchantin et X. Rodet, IrcamCorpusTools : Plateforme Pour Les Corpus de Parole, Traitement Automatique des Langues, Vol. 49, No. 3, 2008

Added lines 61-69:

The principle of the HMM-based speech synthesis developed by the Nagoya Institute of Technology (Nitech) is the joint modeling, by a hidden Markov chain, of the spectrum (vocal tract), the fundamental frequency (source) and the durations of each phoneme in context. During synthesis, a macro-model is built by concatenating the HMMs corresponding to the phones in context of the phonetic sequence to be synthesized. The state durations are generated first, and the trajectory of the spectral parameters is then estimated with a specific parameter generation algorithm that takes into account the dependency between static and dynamic parameters. One advantage of this method over unit-selection synthesis is that it only requires the storage of the model parameters. It also allows precise control of the characteristics of the synthesis. The disadvantages of this type of synthesis are artifacts in the synthesized voice due to the glottal source modeling and a lack of naturalness due to the low variability of the prosody. To overcome these shortcomings, we used the vocal tract and glottal source separation method proposed by G. Degottex in [C5]. On the other hand, we have shown with N. Obin the improvement brought by high-level syntactical features [C6] and the possibilities offered for the synthesis of speaking styles for different discourse genres [C11].

[C5] P. Lanchantin, G. Degottex and X. Rodet, A HMM-Based Synthesis System Using a New Glottal Source and Vocal-Tract Separation Method, ICASSP'10, Dallas, USA 2010,

[C6] N. Obin, P. Lanchantin, M. Avanzi, A. Lacheret-Dujour and X. Rodet, Toward Improved HMM-Based Speech Synthesis Using High-Level Syntactical Features, Speech Prosody, Chicago, USA, 2010

[C11] N. Obin, P. Lanchantin, A. Lacheret-Dujour and X. Rodet, Discrete/Continuous Modelling of Speaking Style in HMM-based Speech Synthesis: Design and Evaluation, submitted to Interspeech 2011, Florence, Italy, 2011

Changed lines 72-80 from:

References

Refereed Journal Publications

in French

Conference Proceedings

in French

to:

The principle of voice conversion is to transform the signal of a source speaker's voice so that it seems to have been uttered by a target speaker. The conversion techniques studied at IRCAM by F. Villavicencio and then by myself under the ANR Affective Avatars project are based on Gaussian Mixture Models (GMM). Typically, the joint distribution of the source and target acoustic characteristics, modeled by a GMM, is estimated from a parallel corpus consisting of synchronous recordings of the source and target speakers. The conversion function is then given by the conditional expectation given the acoustic characteristics of the source. My studies have focused both on the definition of the transformation function and on its application to improve the quality of the converted speech. Thus, the all-pole modeling of the spectral envelope has been improved by the True-Envelope technique, which enhances the quality of the synthesis and the characterization of the speaker's residual. On the other hand, the use of the covariance matrix of the conditional distribution given the acoustic characteristics of the source allows a renormalization of the transformed characteristics in order to improve the quality of the converted signal. Finally, during the AngelStudio project, I proposed a Dynamic Model Selection method (DMS [C8, C10]), which consists in using several models of different complexity and selecting the most appropriate one for each analysis frame during the conversion. The voice conversion results obtained are very encouraging: the "personality" of the target speaker is well reproduced after processing and that of the source speaker has largely disappeared. The main remaining difficulty is some degradation of the sound quality of the voice. However, other improvements we are currently investigating [C13] are expected to reach a usable quality, in real time, even for very demanding applications such as artistic ones.

[C8] P. Lanchantin and X. Rodet, Dynamic Model Selection for Spectral Voice Conversion, Interspeech'10, Makuhari, Japan, 2010

[C10] P. Lanchantin and X. Rodet, Objective Evaluation of the Dynamic Model Selection for Spectral Voice Conversion, ICASSP2011, accepted, Prague, Czech Republic, 2011

[C13] P. Lanchantin, N. Obin and X. Rodet, Extended Conditional GMM and Covariance Matrix Correction for Real-Time Spectral Voice Conversion, submitted to Interspeech 2011, Florence, Italy, 2011

April 27, 2011, at 08:31 PM by 2001:660:3004:21:2e0:81ff:fe57:ac36 -
Added lines 15-16:

Pairwise Markov models can be extended to triplet Markov models [C1]. The principle is to add one, or even several, auxiliary process(es) such that the joint distribution of the triplet "hidden process, auxiliary processes, observed process" is that of a Markov chain. These very general models make it possible to overcome another limitation of conventional models, namely the assumption that the joint distribution is stationary. Indeed, by introducing an auxiliary process that controls the changes in the transition matrices of the hidden process, we have shown the effectiveness of such a model in situations where the joint distribution of the hidden process and the observations is nonstationary [C2], and we proposed algorithms for estimating the parameters of the considered Markov chain. This model was applied to the segmentation of synthetic and real images. A first observation is that the model does allow different regimes to be taken into account, resulting in improved segmentation quality for images containing both extensive homogeneous areas and areas with fine details. A second observation is that it is also possible to obtain a realization of the auxiliary process with the MPM estimator. This type of representation can be very useful, especially for the segmentation of textures, which can be precisely modeled by auxiliary processes.

Added lines 19-22:

As part of the study of triplet Markov models, we have also studied with W. Pieczynski the possibility of extending the classical probabilistic model to an "evidential" model, in which the posterior probability of the hidden process is given by the Dempster-Shafer fusion [A1, AF1, CF2]. We then applied this evidential model to the segmentation of nonstationary processes. The main interest of our approach was to show that although the Dempster-Shafer fusion destroys Markovianity in the context of the hidden evidential chain, Bayesian segmentation is still possible via the triplet Markov chain approach.

One of my last contributions during my PhD thesis was to extend the fuzzy Markov chains previously studied by F. Salzenstein to fuzzy Markov trees. Fuzzy segmentation was initially proposed to take into account the imprecision of a site's membership in a thematic class. Thus, in a fuzzy signal, homogeneous areas ("hard" clusters) coexist with fuzzy areas representing intermediate sites that may belong to several hard clusters. The originality of these models lies in the fact that their distribution has both a discrete and a continuous component, the discrete component being formed by Dirac masses representing the weight assigned to each hard cluster and the continuous component corresponding to the fuzzy classes (Lebesgue measure). We have proposed a multisensor fuzzy hidden Markov tree that we applied to the segmentation of astronomical images [CF3].

Added lines 27-34:

Under the ANR Vivos project, I proposed and developed, in collaboration with A. C. Morris and X. Rodet, the software ircamAlign [C4]. It is a system for the segmentation of speech signals into phones, based largely on the HTK library. The system relies on the hidden Markov chain modeling used in particular in speech recognition. This modeling, specific to speech processing, can be viewed as a special case of a triplet Markov chain T = (U, X, Y), where U is the language model, X is the process describing the evolution of the spectral features over time (the sub-states of the HMM of each phoneme) and Y is the observation process (cepstral coefficients).

Based on this observation, if a textual transcription exists, the distribution of the process U can be defined as that of a Markov chain whose topology is a graph constructed from the phonetized text, giving the different pronunciations and possible connections. Many options are available when creating this graph. It is thus possible to allow the omission or repetition of words, and the insertion of short pauses or of paraverbal sounds such as breathing or lip noises, for which specific models have been learned. When the text is not available, as in the case of spontaneous speech, the distribution of U is defined as that of a bigram or trigram language model learned on a selected set of French texts.
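
As a toy illustration of one way such a graph could be assembled (with optional word skips and optional short pauses), here is a sketch using a simple arc-list representation; it is not the actual ircamAlign graph builder, and the "sp" label is just an assumed name for a short-pause model:

    def build_alignment_graph(words, allow_skip=True, optional_sp=True):
        # Returns (from_node, to_node, label) arcs for a linear word sequence.
        # A label of None denotes an epsilon arc (word omitted); "sp" a short pause.
        arcs, node = [], 0
        for w in words:
            arcs.append((node, node + 1, w))              # pronounce the word
            if allow_skip:
                arcs.append((node, node + 1, None))       # allow omitting the word
            if optional_sp:
                arcs.append((node + 1, node + 1, "sp"))   # optional pause(s) after the word
            node += 1
        return arcs

    print(build_alignment_graph(["le", "chat", "dort"]))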

A set of multispeaker French models has been learned from the BREF80 corpus. To reduce the computing time needed for training, we took advantage of the fact that the calculations needed to estimate the parameters can be decomposed and performed in parallel on the team's 48 computing cores. In addition, a confidence index based on posterior probabilities is computed for each phone to facilitate a possible manual correction of the segmentation results.

From this segmentation into phones, the structure of the speech (syllables, words, breath groups) is extracted from the transcription and aligned to the speech signal in order to build the unit databases needed for the development of a concatenative text-to-speech (TTS) synthesizer (C7). ircamAlign is used by ircamTTS and by ircamCorpusTools [AF2], which is a database management system for speech units. ircamAlign is also used in the ANR Rhapsody project to develop a reference corpus of spontaneous French speech. Finally, ircamAlign has been used by composers at IRCAM. Note that a real-time version has subsequently been developed by J. Bloit and implemented in MaxMSP.

April 27, 2011, at 08:24 PM by 2001:660:3004:21:2e0:81ff:fe57:ac36 -
Changed lines 3-4 from:

My research area is statistical signal processing. My main topics of research include statistical modeling of signals, speech processing and their applications to music. My research has focused initially on the generalization of statistical models for signals, especially hidden Markov models. I proposed and studied for my PhD (T) models called Triplet Markov models that generalize the classical hidden Markov models (Hidden Markov Models (HMM)), with applications in image segmentation. I then directed my research toward speech processing, working on the segmentation of speech signals into phones, voice conversion, language models and HMM-based speech synthesis. My research studies on speech are interdisciplinary as they combine statistical modeling of signals, natural language processing and their application to music.

to:

My research area is statistical signal processing. My main topics of research include statistical modeling of signals, speech processing and their applications to music. My research has focused initially on the generalization of statistical models for signals, especially hidden Markov models. I proposed and studied for my PhD [T] models called Triplet Markov models that generalize the classical hidden Markov models (Hidden Markov Models (HMM)), with applications in image segmentation. I then directed my research toward speech processing, working on the segmentation of speech signals into phones, voice conversion, language models and HMM-based speech synthesis. My research studies on speech are interdisciplinary as they combine statistical modeling of signals, natural language processing and their application to music.

Added lines 11-12:

The principle of a pairwise Markov chain is to assume that the joint distribution of the observed and hidden processes is that of a Markov chain. In the case of a pairwise partially Markov chain, one directly assumes the Markovianity of the distribution of the hidden process conditionally on the observations. In this context, one of my contributions was to derive the expectation-maximization (EM) algorithm for pairwise Markov chains. I also studied with W. Pieczynski a special case of pairwise partially Markov chain applied to the segmentation of Gaussian processes with long-range correlation [C3]. Experiments on synthetic data showed significant improvements over conventional models when the noise is long-range correlated, while giving similar performance when the noise is independent. Nevertheless, the proposed parameter estimation method was only valid in the zero-mean case, which prevented us from testing the model on real images. We therefore continued this work with J. Lapuyade to refine our method and make possible the unsupervised segmentation of Gaussian processes that are not necessarily zero-mean [A2].
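
To make the difference with a classical HMM concrete, here is a minimal, generic sketch of the scaled forward recursion for a pairwise Markov chain. Instead of separate transition and emission terms, the recursion uses a single kernel p(x_t, y_t | x_{t-1}, y_{t-1}); the caller is assumed to supply this kernel already evaluated at the observed y values, one matrix per time step. These are hypothetical inputs, not code from the cited papers.

    # Minimal sketch (hypothetical inputs): scaled forward pass for a pairwise
    # Markov chain. kernels[t][i, j] = p(x_t = j, y_t | x_{t-1} = i, y_{t-1}),
    # already evaluated at the observed sequence; init[i] = p(x_0 = i, y_0).
    import numpy as np

    def pairwise_forward(init, kernels):
        alpha = init / init.sum()
        loglik = np.log(init.sum())
        filtering = [alpha]
        for ker in kernels:
            a = alpha @ ker            # sum_i alpha[i] * p(x_t=j, y_t | x_{t-1}=i, y_{t-1})
            c = a.sum()
            alpha = a / c
            loglik += np.log(c)
            filtering.append(alpha)
        return np.array(filtering), loglik   # filtering distributions and log p(y)

    # toy example: 2 hidden classes, 3 time steps
    init = np.array([0.3, 0.1])
    kernels = [np.array([[0.5, 0.1], [0.2, 0.4]]) * 0.8 for _ in range(2)]
    filt, ll = pairwise_forward(init, kernels)
    print(filt, ll)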

April 27, 2011, at 08:23 PM by 2001:660:3004:21:2e0:81ff:fe57:ac36 -
Changed lines 5-31 from:

First, I present my PhD studies on unsupervised segmentation of signals based on triplet Markov models. Then, I present my work on speech processing, performed in the analysis and synthesis team at IRCAM.

to:

First, I present my PhD studies on unsupervised segmentation of signals based on triplet Markov models. Then, I present my work on speech processing, performed in the analysis and synthesis team at IRCAM.

Triplet Markov Chains and unsupervised signal segmentation

Pairwise Markov models and pairwise partially Markov models

Triplet Markov models

Data fusion

Speech processing

Automatic speech segmentation into phones

HMM-based speech synthesis

Voice conversion

References

Refereed Journal Publications

in French

Conference Proceedings

in French

April 27, 2011, at 08:19 PM by 2001:660:3004:21:2e0:81ff:fe57:ac36 -
Changed lines 3-73 from:

My research area is statistical signal processing. My main research topics are the statistical modeling of signals, speech processing and their applications to music. My research first focused on the generalization of statistical models for signals, in particular hidden Markov models. During my PhD I proposed and studied models called triplet Markov models, which generalize classical hidden Markov models (HMM), in the applicative context of image segmentation. I then directed my research toward speech processing, working on the segmentation of speech signals into phones, voice identity conversion, language models and HMM-based synthesis. My research on speech is interdisciplinary, as it combines statistical modeling of signals, natural language processing and their application to music.

I first present my PhD work on unsupervised signal segmentation based on triplet Markov chain modeling. I then present my work on speech processing, carried out as a researcher and developer in the Analysis-Synthesis team at IRCAM.

Triplet Markov chains and unsupervised signal segmentation

During my PhD [Lan06], my research focused on Bayesian statistical segmentation methods. These methods rely on a probabilistic model of the partially observed phenomenon, given by the definition of the joint distribution of the observed and hidden processes. They are well suited to a large number of situations encountered in signal and image processing because they provide powerful and general tools, of interest both for modeling and for processing in high-dimensional spaces. They are also very flexible, since the optimality of the solutions can be adapted to specific concerns by choosing an appropriate loss function. Finally, when the model parameters are unknown, estimation methods allow the processing to be automated, which is of great practical interest. However, the joint distribution of the hidden and observed processes must be defined with care. Indeed, the Bayesian estimators used to estimate a realization of the hidden process from the observations require the computation of posterior probabilities. Given the size of the configuration spaces considered, it is in general impossible to compute these probabilities directly without making simplifying assumptions about the joint distribution. It is therefore necessary to define joint distributions such that the computation of the posterior probabilities remains possible, while keeping models rich enough to describe a large number of situations and behaviors. Hidden Markov models meet these requirements most of the time, thanks to their ability to model contextual information parsimoniously. However, they can prove inadequate for some applications. In particular, their simplest and most commonly used variants cannot take into account correlations between observations conditionally on the hidden states. It is in this context that I studied models of the joint distribution of increasing generality, called pairwise and triplet Markov models, initially proposed by W. Pieczynski [Pie02, Pie03].
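
As a small illustration of how such posterior probabilities are used once a tractable model makes them computable (hypothetical values, not code from the thesis), the marginal posterior mode (MPM) estimator simply picks, at every site, the class with the largest posterior marginal, which minimizes the expected number of misclassified sites under a site-wise 0/1 loss:

    # Minimal sketch: Bayesian segmentation decision from posterior marginals,
    # assumed to have been computed beforehand (e.g. by a forward-backward pass).
    import numpy as np

    def mpm_segmentation(posterior_marginals):
        """posterior_marginals: array of shape (T, K), each row p(x_t = . | y)."""
        return np.argmax(posterior_marginals, axis=1)

    gamma = np.array([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]])
    print(mpm_segmentation(gamma))   # -> [0 1 1]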

Pairwise Markov models and pairwise partially Markov models

The principle of a pairwise Markov chain is to assume that the joint distribution of the "hidden" and "observed" processes is that of a Markov chain. In the case of a pairwise partially Markov chain, only the Markovianity of the distribution of the hidden process conditionally on the observations is assumed. In this framework, one of my contributions was to derive the expectation-maximization (EM) estimation algorithm for pairwise Markov chains. I also studied with W. Pieczynski a particular case of pairwise partially Markov chain allowing the segmentation of Gaussian processes with long-range correlation [Pie05]. Experiments on synthetic data were very encouraging and showed clear improvements over classical models when the noise actually has long-range correlation. Nevertheless, the proposed parameter estimation method was only valid in the zero-mean case, which prevented us from testing the model on real images. We therefore continued this work in collaboration with J. Lapuyade in order to refine our method and make possible the unsupervised segmentation of Gaussian processes whose means are not necessarily zero [Lan08].

Triplet Markov models

Pairwise Markov models can be extended to triplet Markov models [Pie02]. The principle is to add one or several auxiliary processes such that the joint distribution of the triplet "hidden process, auxiliary process, observed process" is that of a Markov chain. These very general models make it possible, among other things, to overcome another limitation of classical models, namely the assumption that the joint distribution is stationary. Indeed, by introducing an auxiliary process controlling the switches between transition matrices of the process, we were able to show the effectiveness of such a model in situations where the joint distribution of the hidden process and the observations is not stationary [Lan04], and we proposed algorithms for estimating the parameters of a triplet Markov chain. This model was applied to the segmentation of synthetic and real images. A first observation is that the model effectively takes different regimes into account, which translates into an improvement in segmentation quality for images containing both large homogeneous areas and areas with fine details. A second observation is that a realization of the auxiliary process can also be obtained with the maximum posterior marginal estimator. This kind of representation can be very useful, notably for the segmentation of textures, which can precisely be modeled by the auxiliary process.
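
The following toy sketch (all parameter values invented) shows the generative structure being described: an auxiliary Markov process U selects, at each time step, which transition matrix drives the hidden process X, and Y observes X in Gaussian noise.

    # Minimal sketch (hypothetical parameters): sampling a non-stationary triplet
    # chain T = (U, X, Y) where U switches the transition matrix of X.
    import numpy as np

    rng = np.random.default_rng(0)
    T = 200
    B = np.array([[0.95, 0.05], [0.05, 0.95]])   # transitions of U (regimes)
    A = [np.array([[0.9, 0.1], [0.1, 0.9]]),     # X transitions under regime 0
         np.array([[0.6, 0.4], [0.4, 0.6]])]     # X transitions under regime 1

    u = np.zeros(T, dtype=int)
    x = np.zeros(T, dtype=int)
    for t in range(1, T):
        u[t] = rng.choice(2, p=B[u[t - 1]])
        x[t] = rng.choice(2, p=A[u[t]][x[t - 1]])
    y = x + rng.normal(0.0, 0.7, size=T)         # noisy observations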

Data fusion

As part of the study of triplet models, we also investigated with W. Pieczynski the possibility of extending classical probabilistic models to an "evidential" model, in which the posterior distribution of the hidden process is given by the Dempster-Shafer fusion [Lan05, Lan05b]. We then applied this evidential model to the segmentation of non-stationary processes. The main interest of our approach was to show that, although the Dempster-Shafer fusion destroys Markovianity in the context of the hidden evidential chain, Bayesian segmentation remains possible through triplet Markov chains.

A last contribution, in collaboration with F. Salzenstein, was to extend the fuzzy chain and fuzzy field models proposed earlier to fuzzy tree models. Fuzzy segmentation was proposed in order to account for the imprecision affecting the membership of a site in a thematic region. In a "fuzzy" signal, homogeneous areas thus coexist with fuzzy areas representing intermediate sites that may belong to several hard classes. The originality of these models lies in the fact that their distribution has both a discrete and a continuous component, the discrete component consisting of Dirac masses representing the weight assigned to each hard class, and the continuous component corresponding to the fuzzy sites (Lebesgue measure). We thus proposed a multisensor hidden fuzzy Markov tree model that we applied to the segmentation of astronomical images [Lan05c].

Speech processing

Segmentation of speech signals into phones

Within the ANR VIVOS project, I proposed and developed the ircamAlign software [Lan08b]. It is a system for segmenting speech signals into phones that uses the HTK library. The system is based on the hidden Markov chain modeling used in particular in speech recognition. This modeling, specific to speech processing, is in fact a special case of triplet Markov chain T = (U, X, Y) in which U corresponds to the language model, X is the process describing the evolution of the spectral features over time (the sub-states of the HMM of each phoneme) and Y is the observation process (the cepstral coefficients). A set of French multispeaker models was trained on the BREF80 corpus. The segmentation can be performed with or without text. When the text is available, the distribution of the process U is that of a Markov chain whose topology is a multiple-pronunciation graph built from the phonetization of the text. Many options are available for building this graph. It is thus possible to allow the omission or repetition of words, and the insertion of short pauses or paraverbal sounds such as breathing or lip noises, for which specific models have been trained. When the text is not available, as for example for a spontaneous speech signal, U is then a bigram or trigram model trained on a selected set of French texts. In addition, a confidence index based on the posterior probabilities is computed for each phoneme in order to facilitate a possible manual correction. From this phoneme segmentation, the structure of the speech (syllables, words, breath groups) can be extracted from the speech signal in order to build unit databases enabling concatenative Text-To-Speech (TTS) synthesis. ircamAlign is thus used by ircamTTS and also by ircamCorpusTools [Bel09], which is a database management system for speech units. ircamAlign is also used in the ANR Rhapsodie project for corpus construction. Finally, ircamAlign has been used by composers, notably in com que voz by Stephano Gervasoni. Note that a real-time version was subsequently developed by J. Bloit and included in MaxMSP.

Voice conversion

Voice conversion, or voice identity transformation, consists in transforming the speech signal of a reference speaker, called the source speaker, so that it sounds as if it had been uttered by another, previously identified speaker, called the target speaker. The conversion techniques explored at IRCAM by F. Villavicencio and then by myself are based on Gaussian mixture models. The work has concerned both the transformation function and its application, in order to improve the quality of the converted speech. Thus, the all-pole modeling of the spectral envelope was improved by the "True-Envelope" technique, which favors the quality of the synthesis and helps characterize the residual with respect to the speaker [Vil07a], [Vil08a]. The identity transformation results obtained are very encouraging: the "personality" of the target speaker is well reproduced after transformation, and that of the source speaker has largely disappeared. The main remaining difficulty is a certain degradation of the acoustic quality of the voice; a certain "grain" or noise remains. Recent improvements that I introduced, such as taking into account the dynamic characteristics of the timbre [Tod05], have significantly reduced this degradation. Nevertheless, other directions of improvement that we are currently studying should lead to a quality usable even in very demanding applications, such as artistic applications.
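
For concreteness, here is a minimal sketch of the classical GMM-based conversion function that this family of techniques builds on (generic textbook form with invented toy parameters, not the actual IRCAM implementation): each mixture component contributes a linear regression from source to target features, weighted by its posterior probability given the source frame.

    # Minimal sketch (hypothetical parameters): GMM-based conversion of a source
    # feature vector x using a joint GMM over z = [x; y]. Per component m:
    #   E[y | x, m] = mu_y_m + Syx_m Sxx_m^{-1} (x - mu_x_m),
    # weighted by P(m | x).
    import numpy as np

    def gaussian_pdf(x, mean, cov):
        d = x - mean
        k = len(x)
        return np.exp(-0.5 * d @ np.linalg.solve(cov, d)) / \
               np.sqrt((2 * np.pi) ** k * np.linalg.det(cov))

    def convert(x, weights, means, covs, dx):
        """weights[m], means[m] (length dx+dy), covs[m] ((dx+dy)x(dx+dy)): joint GMM;
        dx is the source feature dimension."""
        post = np.array([w * gaussian_pdf(x, mu[:dx], S[:dx, :dx])
                         for w, mu, S in zip(weights, means, covs)])
        post /= post.sum()                                    # P(m | x)
        y_hat = np.zeros(len(means[0]) - dx)
        for p, mu, S in zip(post, means, covs):
            reg = S[dx:, :dx] @ np.linalg.solve(S[:dx, :dx], x - mu[:dx])
            y_hat += p * (mu[dx:] + reg)
        return y_hat

    # toy joint GMM with one component, dx = dy = 1
    w = [1.0]
    mu = [np.array([0.0, 1.0])]
    S = [np.array([[1.0, 0.8], [0.8, 1.0]])]
    print(convert(np.array([0.5]), w, mu, S, dx=1))   # -> about [1.4]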

Speech synthesis using a parametric model

The principle of HMM-based synthesis, developed by the HTS team (hts.sp.nitech.ac.jp) and which I adapted for French voices, is the joint modeling of the spectrum (vocal tract), the fundamental frequency (source) and the durations of each phoneme in context by an HMM. At synthesis time, a macro-model is built by concatenating the HMMs corresponding to the phonemes in context of the phonetic sequence to be synthesized. The state durations are first generated, then the trajectory of the spectral parameters is estimated with a specific spectral parameter generation algorithm [Tok00] that takes into account the dependency between static and dynamic parameters. One advantage of this method over unit-concatenation speech synthesis is that it only requires storing the model parameters. It also allows precise control of the characteristics of the synthesis. The drawbacks of this type of synthesis are the artifacts of the synthesized voice related to the modeling of the glottal source, and the lack of clarity related to the modeling of the spectral envelope.
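
A much simplified sketch of this kind of parameter generation, for a one-dimensional static coefficient and its delta only (assumed delta window of [-0.5, 0, 0.5] and simple edge handling, not the full algorithm of [Tok00]), solves the weighted least-squares system (W' S^-1 W) c = W' S^-1 mu:

    # Minimal sketch: maximum-likelihood trajectory generation from per-frame
    # Gaussian means/variances of the static coefficient c_t and of its delta
    # (0.5 * (c_{t+1} - c_{t-1})).
    import numpy as np

    def mlpg(mu_static, var_static, mu_delta, var_delta):
        T = len(mu_static)
        W = np.zeros((2 * T, T))
        mu = np.zeros(2 * T)
        prec = np.zeros(2 * T)                      # inverse variances
        for t in range(T):
            W[2 * t, t] = 1.0                       # static row
            if 0 < t < T - 1:                       # delta row (edges left empty)
                W[2 * t + 1, t - 1] = -0.5
                W[2 * t + 1, t + 1] = 0.5
            mu[2 * t], mu[2 * t + 1] = mu_static[t], mu_delta[t]
            prec[2 * t], prec[2 * t + 1] = 1.0 / var_static[t], 1.0 / var_delta[t]
        A = W.T @ (prec[:, None] * W)               # W' S^-1 W
        b = W.T @ (prec * mu)                       # W' S^-1 mu
        return np.linalg.solve(A, b)                # generated trajectory c_1..c_T

    T = 5
    print(mlpg(np.linspace(0.0, 1.0, T), np.full(T, 0.1),
               np.zeros(T), np.full(T, 0.1)))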

Language models for text and speech signal generation

We observed that using very simple N-gram language models yields interesting results from a compositional point of view. Thus, a two-level N-gram model (words and part-of-speech tags), in which the order of the N-gram can differ for each level, makes it possible to generate new sentences with characteristics similar to those of the training corpus. Moreover, an N-gram trained on the syllabified phonetic sequences of the training corpus allows the generation of sentences for which it is possible to choose the final rhymes and the number of syllables. It is thus possible to generate textual or sound material satisfying certain criteria (syntax, vocabulary, sonority, ...) from which the composer can make a selection.
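
A toy word-level bigram generator, the simplest instance of the N-gram generation described above (the corpus and names are invented for illustration):

    # Minimal sketch: word-level bigram generation from a toy corpus.
    import random
    from collections import defaultdict

    corpus = "the night falls and the night sings and the city sleeps".split()
    bigrams = defaultdict(list)
    for w1, w2 in zip(corpus, corpus[1:]):
        bigrams[w1].append(w2)                 # successors observed after w1

    def generate(start, length, rng=random.Random(0)):
        words = [start]
        for _ in range(length - 1):
            nxt = bigrams.get(words[-1])
            if not nxt:
                break
            words.append(rng.choice(nxt))
        return " ".join(words)

    print(generate("the", 8))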

References

  • [Lan06] P. Lanchantin, Chaînes de Markov Triplets et Segmentation Non supervisée de Signaux, PhD thesis, Institut National des Télécommunications, defended December 5, 2006.
  • [Pie02] W. Pieczynski, Chaînes de Markov Triplet, Triplet Markov Chains, Comptes Rendus de l'Académie des Sciences - Mathématique, Série I, Vol. 335, No. 3, pp. 275-278, 2002.
  • [Pie03] W. Pieczynski, Pairwise Markov chains, IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 25, No. 5, pp. 634-639, 2003.
  • [Pie05] W. Pieczynski and P. Lanchantin, Restoring hidden non stationary process using triplet partially Markov chain with long memory noise, Statistical Signal Processing (SSP 2005), Bordeaux, France, July 17-20, 2005.
  • [Lan08] P. Lanchantin, J. Lapuyade-Lahorgue and W. Pieczynski, Unsupervised segmentation of pairwise Markov chains hidden with long memory noise, Signal Processing, Vol. 88, No. 5, pp. 1134-1151, May 2008.
  • [Lan04] P. Lanchantin and W. Pieczynski, Unsupervised non stationary image segmentation using triplet Markov chains, Advanced Concepts for Intelligent Vision Systems (ACIVS 04), Aug. 31-Sept. 3, Brussels, Belgium, 2004.
  • [Lan05] P. Lanchantin and W. Pieczynski, Chaînes et arbres de Markov évidentiels avec applications à la segmentation des processus non stationnaires, Traitement du Signal, Vol. 22, No. 1, pp. 15-26, 2005.
  • [Lan05b] P. Lanchantin and W. Pieczynski, Unsupervised restoration of hidden non stationary Markov chain using evidential priors, IEEE Trans. on Signal Processing, Vol. 53, No. 8, pp. 3091-3098, 2005.
  • [Lan05c] P. Lanchantin and F. Salzenstein, Segmentation d'Images Multispectrales par Arbre de Markov caché Flou, Proceedings of GRETSI'05, September 6-9, Louvain-la-Neuve, Belgium, 2005.
  • [Lan08b] P. Lanchantin, A. C. Morris, X. Rodet and C. Veaux, Automatic Phoneme Segmentation with Relaxed Textual Constraints, in E. L. R. A. (ELRA) (ed.), Proceedings of the Sixth International Language Resources and Evaluation (LREC'08), Marrakech, Morocco, 2008.
  • [Bel09] G. Beller, C. Veaux, G. Degottex, N. Obin, P. Lanchantin and X. Rodet, IrcamCorpusTools: Plateforme Pour Les Corpus de Parole, Traitement Automatique des Langues, to appear.
  • [Vil07a] F. Villavicencio, A. Röbel and X. Rodet, All-Pole Spectral Envelope Modeling with Order Selection for Harmonic Signals, in Proc. ICASSP'07, Honolulu, 2007.
  • [Vil08a] F. Villavicencio, A. Röbel and X. Rodet, Extending efficient spectral envelope modeling to mel-frequency based representation, in Proc. ICASSP'08, Las Vegas, 2008.
  • [Tod05] T. Toda, A. Black and K. Tokuda, Spectral conversion based on maximum likelihood estimation considering global variance of converted parameter, in Proc. ICASSP'05, Vol. 1, pp. 9-12, Philadelphia, USA, 2005.
  • [Tok00] K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi and T. Kitamura, Speech parameter generation algorithms for HMM-based speech synthesis, in Proc. ICASSP 2000, Istanbul, Turkey, 2000.
to:

My research area is statistical signal processing. My main topics of research include statistical modeling of signals, speech processing and their applications to music. My research has focused initially on the generalization of statistical models for signals, especially hidden Markov models. I proposed and studied for my PhD (T) models called Triplet Markov models that generalize the classical hidden Markov models (Hidden Markov Models (HMM)), with applications in image segmentation. I then directed my research toward speech processing, working on the segmentation of speech signals into phones, voice conversion, language models and HMM-based speech synthesis. My research studies on speech are interdisciplinary as they combine statistical modeling of signals, natural language processing and their application to music.

First, I present my PhD studies on unsupervised segmentation of signals based on triplet Markov models. Then, I present my work on speech processing, performed in the analysis and synthesis team at IRCAM.
