de Cheveigné, A., and Kawahara, H. (1999). "Missing-data model of vowel identification," JASA (accepted for publication)

Vowel identity correlates well with the shape of the transfer function of the vocal tract, in particular the positions of the first two or three formant peaks. However, in voiced speech the transfer function is {\em sampled}\ at multiples of the fundamental frequency (\fo), and the short-term spectrum contains peaks at those frequencies rather than at the formants. It is not clear how the auditory system estimates the original spectral envelope from the vowel waveform. Cochlear excitation patterns, for example, resolve harmonics in the low-frequency region, and their shape varies strongly with \fo. The problem cannot be cured by smoothing: lag-domain components of the spectral envelope are aliased and cause \fo-dependent distortion. The problem is most acute at high \fo s, where the spectral envelope is severely undersampled. This paper treats vowel identification as a process of pattern recognition with {\em missing data}. Matching is restricted to available data, and missing data are ignored using an \fo-dependent weighting function that emphasizes regions near harmonics. The model is presented in two versions: a frequency-domain version based on short-term spectra or tonotopic excitation patterns, and a time-domain version based on autocorrelation functions. It accounts for the relative \fo-independence observed in vowel identification.
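The frequency-domain idea can be illustrated with a minimal Python sketch. It is not the model as specified in the paper: the Gaussian shape of the weighting function, the squared-difference distance, the toy Gaussian-bump envelopes, the approximate formant values, and all function and parameter names (`harmonic_weights`, `weighted_distance`, `bandwidth`, and so on) are assumptions made for illustration. The sketch only shows the principle that a template is compared with the observed spectrum where data are available, near multiples of \fo, while the regions between harmonics are given negligible weight.

```python
import math

def harmonic_weights(freqs, f0, bandwidth=10.0):
    """F0-dependent weights: 1 at multiples of f0, falling off between them.
    The Gaussian shape and the bandwidth (Hz) are illustrative assumptions."""
    weights = []
    for f in freqs:
        k = max(1, round(f / f0))   # nearest harmonic number, at least 1
        d = f - k * f0              # distance to that harmonic in Hz
        weights.append(math.exp(-0.5 * (d / bandwidth) ** 2))
    return weights

def envelope(freqs, formants, width=150.0):
    """Toy spectral envelope: one Gaussian bump per formant (assumption)."""
    return [sum(math.exp(-0.5 * ((f - fm) / width) ** 2) for fm in formants)
            for f in freqs]

def sample_at_harmonics(freqs, env, f0):
    """Toy voiced spectrum: envelope values survive only at multiples of f0."""
    return [e if f > 0 and f % f0 == 0 else 0.0 for f, e in zip(freqs, env)]

def weighted_distance(observed, template, weights):
    """Weighted mean squared difference; low-weight regions are ignored."""
    num = sum(w * (o - t) ** 2 for o, t, w in zip(observed, template, weights))
    return num / sum(weights)

def identify(observed, templates, weights):
    """Return the name of the template closest to the observed spectrum."""
    return min(templates,
               key=lambda name: weighted_distance(observed, templates[name], weights))

# Hypothetical vowel templates with rough formant frequencies in Hz.
freqs = list(range(0, 4001, 50))
templates = {
    "a": envelope(freqs, (700, 1200, 2500)),
    "i": envelope(freqs, (300, 2300, 3000)),
    "u": envelope(freqs, (300, 800, 2300)),
}

for f0 in (200, 300):
    observed = sample_at_harmonics(freqs, templates["a"], f0)
    weights = harmonic_weights(freqs, f0)
    print(f0, identify(observed, templates, weights))
```

In this toy setting the weighted match treats the gaps between harmonics as missing rather than as evidence of low spectral energy, which is why the same template can be retrieved at different values of \fo.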