HMM-based Speech Segmentation

ircamAlign is a speech segmentation tool, useful for creating databases for speech synthesis.
It is based on the HTK toolbox and the LIAPHON French phonetizer.
It is available for French and English.
It takes a speech audio file and its textual transcription as input.
The linguistic structure is extracted from the text and aligned with the audio file, using a multi-pronunciation graph to model the dependencies between phonemes.
If the text transcription is not available, a bigram language model is used instead.
Phonemes are modeled by left-to-right HMMs with 7 states.
Confidence measures are computed at different linguistic levels to make manual correction easier.
Feature files in the HTS lab format are created directly, which allows new voices to be built quickly.
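
As an illustration of the left-to-right phoneme models mentioned above, here is a minimal Python sketch of a 7-state left-to-right transition matrix and a Viterbi alignment of frames to states. It is not ircamAlign's code and does not use HTK. The function names `left_right_transitions` and `viterbi_align`, the self-loop probability of 0.6, and the choice to treat all 7 states as emitting are assumptions made for the example (HTK models conventionally include non-emitting entry and exit states, and the description does not say whether the 7 states count them).

```python
import math

NEG_INF = float("-inf")


def left_right_transitions(n_states=7, self_loop=0.6):
    """Build a row-stochastic transition matrix for a left-to-right HMM.

    Each state may only stay where it is or move to the next state
    (no skips). The last state loops on itself in this sketch.
    The self_loop value is an arbitrary illustrative choice.
    """
    if n_states < 1:
        raise ValueError("n_states must be at least 1")
    if not 0.0 <= self_loop < 1.0:
        raise ValueError("self_loop must be in [0, 1)")
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        if i == n_states - 1:
            A[i][i] = 1.0
        else:
            A[i][i] = self_loop
            A[i][i + 1] = 1.0 - self_loop
    return A


def viterbi_align(log_emis, A):
    """Return the most likely state index per frame.

    log_emis[t][s] is the log-likelihood of frame t under state s.
    The path is forced to start in state 0 and end in the last state.
    """
    n, T = len(A), len(log_emis)
    if T == 0:
        raise ValueError("no frames to align")
    logA = [[math.log(p) if p > 0 else NEG_INF for p in row] for row in A]
    delta = [NEG_INF] * n
    delta[0] = log_emis[0][0]
    back = []
    for t in range(1, T):
        new, ptr = [NEG_INF] * n, [0] * n
        for s in range(n):
            # Candidate 1: stay in the same state.
            best_prev, best = s, delta[s] + logA[s][s]
            # Candidate 2: advance from the previous state.
            if s > 0:
                cand = delta[s - 1] + logA[s - 1][s]
                if cand > best:
                    best_prev, best = s - 1, cand
            new[s] = best + log_emis[t][s]
            ptr[s] = best_prev
        delta = new
        back.append(ptr)
    if delta[n - 1] == NEG_INF:
        raise ValueError("too few frames to reach the last state")
    # Trace back from the last state.
    path = [n - 1]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path
```

The frame indices at which the returned state index changes give the state boundaries; in an aligner of this kind, phoneme boundaries would come from chaining such models along the pronunciation graph.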
Automatic Phoneme Segmentation With Relaxed Textual Constraints,
P. Lanchantin, A. C. Morris, X. Rodet and C. Veaux,
LREC'08 Proceedings, Marrakech, Morocco, 2008.

Hypermusic