Next: Time/frequency resolution in feature
Up: Hearing Science and Speech
Previous: Hearing Science and Speech
It is often said that the key to success in speech recognition (SR) is to apply
knowledge about how humans process and understand speech. However,
past attempts to apply "auditory models" to SR have met with mixed success.
Apart from the mel scale implicit in features such as the mel
cepstrum, mainstream recognition systems owe little to auditory models.
As usual, the "hype" that surrounds certain well-publicized attempts
seems to have generated a mixture of short-term interest and long-term mistrust.
<p>
Here I present a few ideas about how to go beyond
the hype and bring some real benefit to speech recognition. The ideas are
few and modest, and the benefit is likely to be either small or else
restricted to subproblems,
but the ideas should work without too much hassle.
<p>
Rather than trying to incorporate a detailed auditory model in a
recognition system, for example
as a "front-end" or feature
extractor, the idea is to take inspiration from our understanding of
how the auditory system solves certain problems, and why
it chooses to solve them in that particular way. This may lead to
engineering solutions that don't seem "auditory" at all!
<p>
Of course, this does not mean that sophisticated and
realistic auditory models cannot be of use to SR systems. Progress with
auditory models should lead to more effective processing, and progress in
computer power may make schemes that were ineffective yesterday effective
tomorrow.
Alain de Cheveigne
1998-02-16