Dynamic Model Selection for Spectral Voice Conversion

  • Statistical methods for voice conversion are usually based on a single model, selected to represent a tradeoff between goodness of fit and complexity.
  • In this work, we assume that the best model may change over time, depending on the source acoustic features.
  • We present a new method for spectral voice conversion called Dynamic Model Selection (DMS), in which a set of candidate best models of increasing complexity, including mixtures of Gaussians and probabilistic principal component analyzers, is considered during the conversion of a source speech signal into a target speech signal.
  • This set is built during the learning phase, according to the Bayesian information criterion (BIC). During conversion, the best model is dynamically selected from the set, according to the acoustic features of each frame.
  • Subjective tests show that the method improves the conversion in terms of both proximity to the target and quality.
  • Dynamic Model Selection for Spectral Voice Conversion,
    P. Lanchantin and X. Rodet,
    Interspeech 2010 Proceedings, Makuhari, Japan, Sept. 2010.
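
The idea summarized above, scoring candidate models with the BIC and then choosing a model frame by frame, can be illustrated with a toy sketch. This is not the implementation from the paper. It assumes scalar "frames" instead of spectral envelope vectors, two hand-written 1-D Gaussian mixtures instead of trained models, and it assumes that the per-frame choice is the model with the highest log-likelihood for that frame, which is one plausible reading of "according to the acoustic features of each frame". All names (bic, mixture_loglik, select_model_per_frame, MODELS) are hypothetical.

```python
import math


def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion; lower values are preferred."""
    return -2.0 * log_likelihood + n_params * math.log(n_samples)


def gaussian_logpdf(x, mean, var):
    """Log-density of a 1-D Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def mixture_loglik(x, components):
    """Log-likelihood of scalar x under a 1-D Gaussian mixture.

    components is a list of (weight, mean, variance) tuples.
    A log-sum-exp is used for numerical stability.
    """
    terms = [math.log(w) + gaussian_logpdf(x, m, v) for w, m, v in components]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def select_model_per_frame(frames, models):
    """Return, for each frame, the index of the highest-likelihood model.

    Assumption: maximum per-frame log-likelihood stands in for the
    selection rule, which the text above does not spell out.
    """
    return [
        max(range(len(models)), key=lambda i: mixture_loglik(x, models[i]))
        for x in frames
    ]


# Hypothetical candidate set, ordered by increasing complexity.
MODELS = [
    [(1.0, 0.0, 4.0)],                      # one broad Gaussian
    [(0.5, -2.0, 0.25), (0.5, 2.0, 0.25)],  # two narrow Gaussians
]

if __name__ == "__main__":
    frames = [0.0, 2.0, -2.1]

    # Learning-phase view: score each candidate on all frames with the BIC.
    # A 1-D mixture with K components has 3K - 1 free parameters.
    for k, model in enumerate(MODELS, start=1):
        total = sum(mixture_loglik(x, model) for x in frames)
        print("components:", k, "BIC:", bic(total, 3 * k - 1, len(frames)))

    # Conversion-phase view: pick one model per frame.
    print(select_model_per_frame(frames, MODELS))
```

In this toy setting, a frame lying between the two narrow components is better explained by the simpler model, while frames close to a component centre favour the more complex one, so the selected model changes from frame to frame.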

VC from real source voice

  • Target Reference Samples
# Fernando Tremblay Gilles Thomas Cocteau
004
  • Converted source envelope
# Xavier Fernando Tremblay Gilles Thomas Cocteau
002
003
004
005
006
007
008
009
070
082
088

VC from a commercial TTS source voice

  • Target Reference Samples
# BO
012
013

Warning: many of the artefacts in the converted samples are already present in the speech generated by the TTS system.

  • Converted source envelope
# Ryan BO
001
003
011
049
071
078
082
171