HMM-based Speech Segmentation
ircamAlign
- ircamAlign is a speech segmentation tool, useful for creating databases for speech synthesis.
- It is based on the HTK toolbox and the LIAPHON French phonetizer.
- It is available for French and English.
- It takes an audio speech file and its textual transcription as input.
- The linguistic structure is extracted from the text and aligned to the audio file, using a multi-pronunciation graph to model the dependencies between phonemes.
- If the text transcription is not available, a bigram language model is used instead.
- Phonemes are modeled by left-to-right HMMs with 7 states.
- Confidence measures are computed at different linguistic levels to make manual correction easier.
- Files in the HTS lab feature format are created directly, so that new voices can be built quickly.
- Automatic Phoneme Segmentation With Relaxed Textual Constraints,
P. Lanchantin, A. C. Morris, X. Rodet and C. Veaux,
LREC'08 Proceedings, Marrakech, Morocco, 2008.
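As an illustration of the left-to-right topology mentioned above, here is a minimal Python sketch of a transition matrix for a 7-state model. It is not taken from ircamAlign or HTK. The function name `left_to_right_transitions`, the self-loop probability of 0.6, and the absence of skip transitions are all assumptions made for the example. The actual topology and trained probabilities may differ, for instance if some of the 7 states are non-emitting entry and exit states.

```python
def left_to_right_transitions(n_states=7, self_loop=0.6):
    """Build a row-stochastic transition matrix for a strict
    left-to-right HMM: each state may only stay where it is or
    move to the next state (no skips, no backward moves).

    n_states and self_loop are illustrative values (assumptions),
    not parameters documented for ircamAlign.
    """
    if n_states < 1:
        raise ValueError("n_states must be at least 1")
    if not 0.0 <= self_loop < 1.0:
        raise ValueError("self_loop must be in [0, 1)")

    matrix = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        if i == n_states - 1:
            # The last state has no successor in this sketch, so it absorbs.
            matrix[i][i] = 1.0
        else:
            matrix[i][i] = self_loop            # stay in the same state
            matrix[i][i + 1] = 1.0 - self_loop  # advance to the next state
    return matrix


A = left_to_right_transitions()
for row in A:
    print(" ".join(f"{p:.1f}" for p in row))
```

Every entry below the diagonal is zero, which is what forces the state sequence to move forward in time through the phoneme.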
Musical productions using ircamAlign
- ircamAlign is used by composers and has featured in several musical creations at IRCAM, such as:
- Com que voz, Stefano Gervasoni, Thomas Goepfer
- HyperMusic: Prologue, Hector Parra, Thomas Goepfer
- Häxan, la sorcellerie à travers les âges, Mauro Lanza, Olivier Pasquet
- Cantate égale pays, Gérard Pesson, Sébastien Roux
- Le père, Michael Jarrell, Serge Lemouton