Next: Time/frequency resolution in feature Up: Hearing Science and Speech Previous: Hearing Science and Speech

Introduction

It is often said that the key to success in speech recognition (SR) is to apply knowledge about how humans process and understand speech. However, past attempts to apply "auditory models" to SR have met with mixed success. Apart from the mel scale implicit in features such as the mel cepstrum, mainstream recognition systems owe little to auditory models. As usual, the "hype" that surrounds certain well-publicized attempts seems to have generated a mixture of short-term interest and long-term mistrust.

Here I present a few ideas about how to go beyond the hype and bring some real benefit to speech recognition. The ideas are few and modest, and the benefit is likely to be either small or restricted to subproblems, but the ideas should work without too much hassle.

Rather than trying to incorporate a detailed auditory model in a recognition system, for example as a "front-end" or feature extractor, the idea is to take inspiration from our understanding of how the auditory system solves certain problems, and why it chooses to solve them in that particular way. This may lead to engineering solutions that don't seem "auditory" at all!

Of course, this does not mean that sophisticated and realistic auditory models cannot be of use to SR systems. Progress with auditory models should lead to more effective processing, and progress in computer power may make schemes that were ineffective yesterday effective tomorrow.


Alain de Cheveigne
1998-02-16