HMM-based Speech Segmentation

ircamAlign

  • ircamAlign is a speech segmentation tool, useful for creating databases for speech synthesis.
  • It is based on the HTK toolkit and the LIAPHON French phonetizer.
  • It is available for French and English.
  • It takes an audio speech file and its textual transcription as input.
  • The linguistic structure is extracted from the text and aligned to the audio file, using a multi-pronunciation graph to model the dependencies between phonemes.
  • If the text transcription is not available, a bigram language model is used.
  • Phonemes are modeled by left-to-right HMMs with 7 states.
  • Confidence measures are computed at different linguistic levels to make manual correction easier.
  • Files in the HTS lab feature format are created directly, so that new voices can be built quickly.
  • Automatic Phoneme Segmentation With Relaxed Textual Constraints,
    P. Lanchantin, A. C. Morris, X. Rodet and C. Veaux,
    LREC'08 Proceedings, Marrakech, Morocco, 2008.
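
To make the left-to-right HMM alignment mentioned above more concrete, here is a minimal sketch in plain Python. It is not ircamAlign's or HTK's code: the function names (left_right_transitions, viterbi_align, path_to_segments), the self-loop probability, and the toy log-likelihoods are all hypothetical. The source does not say whether the 7 states include HTK-style non-emitting entry and exit states, so the sketch simply treats every state as emitting. It runs on a 3-state toy model to keep the example readable.

```python
import math

NEG_INF = float("-inf")

def left_right_transitions(n_states=7, p_stay=0.6):
    """Transition matrix where each state can only stay or move to the next one.
    p_stay is an assumed value, not one taken from ircamAlign."""
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states - 1):
        A[i][i] = p_stay
        A[i][i + 1] = 1.0 - p_stay
    A[-1][-1] = 1.0  # assumption: the last state only loops on itself
    return A

def viterbi_align(log_lik, A):
    """log_lik[t][s] is the log-likelihood of frame t under state s.
    Returns the best state per frame, starting in state 0 and ending in the last state."""
    T, S = len(log_lik), len(A)
    if T < S:
        raise ValueError("need at least one frame per state")
    safe_log = lambda p: math.log(p) if p > 0 else NEG_INF
    delta = [[NEG_INF] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    delta[0][0] = log_lik[0][0]
    for t in range(1, T):
        for s in range(S):
            for prev in (s - 1, s):  # left-to-right: come from s-1 or stay in s
                if prev < 0:
                    continue
                score = delta[t - 1][prev] + safe_log(A[prev][s])
                if score > delta[t][s]:
                    delta[t][s] = score
                    back[t][s] = prev
            delta[t][s] += log_lik[t][s]
    path = [S - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

def path_to_segments(path):
    """Collapse a per-frame state path into (state, start_frame, end_frame) with end exclusive."""
    segments, start = [], 0
    for t in range(1, len(path) + 1):
        if t == len(path) or path[t] != path[start]:
            segments.append((path[start], start, t))
            start = t
    return segments

# Toy data: 6 frames, 3 states; each pair of frames clearly favors one state.
HI, LO = 0.0, -5.0
toy_log_lik = [
    [HI, LO, LO], [HI, LO, LO],
    [LO, HI, LO], [LO, HI, LO],
    [LO, LO, HI], [LO, LO, HI],
]
toy_path = viterbi_align(toy_log_lik, left_right_transitions(3, p_stay=0.5))
print(toy_path)                    # [0, 0, 1, 1, 2, 2]
print(path_to_segments(toy_path))  # [(0, 0, 2), (1, 2, 4), (2, 4, 6)]
```

In a real system the per-frame log-likelihoods would come from trained acoustic models, and the state sequence would be built by chaining the phoneme models along the pronunciation graph; both are outside the scope of this sketch.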

Musical productions using ircamAlign

  • ircamAlign is used by composers and has been used in several musical creations at IRCAM, such as:
    • Com que voz, Stefano Gervasoni, Thomas Goepfer
    • HyperMusic: Prologue, Hector Parra, Thomas Goepfer
    • Häxan, la sorcellerie à travers les âges, Mauro Lanza, Olivier Pasquet
    • Cantate égale pays, Gérard Pesson, Sébastien Roux
    • Le père, Michael Jarrell, Serge Lemouton