Abstract:
Subjects identified concurrent synthetic vowel pairs that differed in relative level and fundamental frequency difference (DF0). Subjects were allowed to report one or two vowels for each stimulus, instead of being forced to report two vowels. At all levels, identification was better at a DF0 of 6% than at unison, but the effect was larger if the target vowel level was below that of the competing vowel. The existence of a DF0 effect when the target was at -10 or -20 dB relative to the competing vowel was interpreted as evidence that segregation occurs according to harmonic cancellation rather than harmonic enhancement. The pattern of identification as a function of level and vowel pair was found to be incompatible with several models of vowel segregation.
Abstract:
Subjects identified concurrent synthetic vowel pairs in four experiments. The first experiment found that improvements in vowel identification with a difference in fundamental frequency (DF0) do not depend on component phase. The second investigated more precisely whether the phase patterns that occurred with the use of inharmonic stimuli in a previously reported experiment [de Cheveigné et al., J. Acoust. Soc. Am. 97, 3736-3748 (1995)] can by themselves produce effects similar to those attributed to harmonicity. No such effects were found. The third experiment replicated several conditions of that harmonicity experiment and found, as previously, that identification was better for harmonic than for inharmonic backgrounds. However, target harmonicity had no effect on identification, contrary to previous results. The first three experiments employed a new task in which subjects were free to report one or two vowels for each stimulus. The fourth experiment reproduced several conditions with a more classic task in which subjects had to report two vowels. Compared to the classic task, the new task gave larger effects and provided an additional measure of segregation: the number of vowels reported. Overall, results were consistent with the hypothesis that the auditory system segregates targets by a mechanism of harmonic cancellation of competing vowels. They did not support the hypothesis of harmonic enhancement of targets. The lack of a phase effect puts strong constraints on models that exploit pitch period asynchrony (PPA) or beats.
This paper presents a "neural cancellation filter" capable of segregating weak targets from competing harmonic backgrounds, and a model of concurrent vowel segregation based on this filter. The elementary cancellation filter comprises a delay line and an inhibitory synapse. Every peripheral channel is processed by a similar filter tuned to the period of the competing sound, to suppress its correlates within the neural discharge pattern. Combined with a pattern matching model based on autocorrelation functions summed over all channels, the filter is used to form a model of concurrent vowel identification. The model predicts both the number of vowels reported for each stimulus (when subjects are allowed to report one or two), and the identification rate. It belongs to the class of "harmonic cancellation" models that are supported by experimental evidence that vowels mixed with competing sounds are better identified when the competing sounds are harmonic. It successfully explains the improvement of identification with DF0 observed in conditions where the target vowel level was low (-20 dB) relative to the competing vowel. Two alternative schemes using the same filter are also considered. One derives a "place" representation from the magnitude of the filter output. The other uses the ratio of filter input to filter output to select channels.
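The delay-line-plus-inhibition idea described above can be illustrated with a toy sketch. Everything here is an assumption made for illustration, not the model of the paper: spike trains are idealized as 0/1 values per time bin, inhibition is an exact same-bin coincidence, periods are whole numbers of bins, and the function names (`cancellation_filter`, `autocorrelation`, `summary_autocorrelation`) are hypothetical.

```python
def cancellation_filter(spikes, delay):
    """Pass an input spike unless the delay line delivers a spike in the same bin.

    spikes: list of 0/1 values, one per time bin (an assumed discretization).
    delay:  delay-line length in bins, tuned to the period of the competing sound.
    The first `delay` bins are dropped because the delay line is still empty there.
    """
    return [
        1 if spikes[n] == 1 and spikes[n - delay] == 0 else 0
        for n in range(delay, len(spikes))
    ]


def autocorrelation(spikes, max_lag):
    """Count spike pairs separated by each lag from 0 to max_lag."""
    return [
        sum(spikes[n] * spikes[n - lag] for n in range(lag, len(spikes)))
        for lag in range(max_lag + 1)
    ]


def summary_autocorrelation(channels, max_lag):
    """Sum the per-channel autocorrelation functions over all channels."""
    per_channel = [autocorrelation(ch, max_lag) for ch in channels]
    return [sum(values) for values in zip(*per_channel)]


N_BINS = 700
BACKGROUND_PERIOD = 10  # bins; period of the competing sound (assumed value)
TARGET_PERIOD = 7       # bins; period of the weaker target (assumed value)

background = [1 if n % BACKGROUND_PERIOD == 0 else 0 for n in range(N_BINS)]
target = [1 if n % TARGET_PERIOD == 3 else 0 for n in range(N_BINS)]

# Two toy channels: one driven by both sounds, one by the background only.
channels = [
    [max(b, t) for b, t in zip(background, target)],
    background,
]

# Every channel gets the same filter, tuned to the background period.
filtered = [cancellation_filter(ch, BACKGROUND_PERIOD) for ch in channels]

acf_before = summary_autocorrelation(channels, 20)
acf_after = summary_autocorrelation(filtered, 20)

print("lag 10 (background period):", acf_before[10], "->", acf_after[10])
print("lag 7 (target period):", acf_before[7], "->", acf_after[7])
```

In this toy setting the summed autocorrelation loses its peak at the background period while keeping the one at the target period. Target spikes that happen to coincide with background spikes are lost as well, which is a property of this simplified sketch.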
The improvement of identification accuracy of concurrent vowels with differences in fundamental frequency (F0) is usually attributed to mechanisms that exploit harmonic structure. To decide whether identification is aided primarily by selecting the target vowel on the basis of its harmonic structure ("harmonic enhancement") or removing the interfering vowel on the basis of its harmonic structure ("harmonic cancellation"), pairs of synthetic vowels, each of which was either harmonic or inharmonic, were presented to listeners for identification. Responses for each vowel were scored according to the vowel's harmonicity and that of the vowel that accompanied it. For a given target, identification was better by about 3% for a harmonic ground unless the target was also harmonic with the same F0. This supports the cancellation hypothesis. Identification was worse for harmonic than for inharmonic targets by 3-8%. This does not support the enhancement hypothesis. When both vowels were harmonic, identification was better by about 6% when the F0s differed by 1/2 semitone, consistent with previous experiments. Results are interpreted in terms of harmonic enhancement and harmonic cancellation, and alternative explanations such as waveform interaction are considered.
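As a point of reference for the DF0 values quoted in these abstracts, the relation between an interval in semitones and a relative F0 difference can be written out. The symbol $s$ (interval in equal-tempered semitones) is introduced here for illustration only.

```latex
\[
\frac{\Delta F_0}{F_0} = 2^{s/12} - 1, \qquad
s = \tfrac{1}{2}:\; 2^{1/24} - 1 \approx 0.029 \;(2.9\%), \qquad
s = 1:\; 2^{1/12} - 1 \approx 0.059 \;(5.9\%)
\]
```

Under this convention a DF0 of 1/2 semitone is roughly 3%, and a DF0 of 6% is close to one semitone.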
Signal-processing methods and auditory models for separation of concurrent harmonic sounds are reviewed, and a processing principle is proposed that cancels harmonic interference in the time domain. The principle is first formulated in signal processing terms as a time-domain comb filter. The critical issue of fundamental frequency estimation is investigated and an algorithm is proposed. Tested on a restricted database of natural voiced speech, the algorithm successfully found estimates correct within 3% of an octave for 90% of all frames. Next, the principle is formulated in physiological terms. A hypothetical "neural comb filter" is described, based on neural delay lines and inhibitory synapses, and tested using auditory nerve fiber discharge data obtained in response to concurrent vowels [Palmer, A. R. (1990). "The representation of the spectra and fundamental frequencies of steady-state single- and double-vowel sounds in the temporal discharge patterns of guinea pig cochlear-nerve fibers," J. Acoust. Soc. Am. 88, pp. 1412-1426]. Processing successfully suppresses the correlates of either vowel in the response of fibers that respond to both, allowing the other vowel to be better represented. The filter belongs to the class of "cancellation models" for which predictions can be made concerning the outcome of certain psychoacoustic experiments. These predictions are discussed in relation to recent experimental results obtained elsewhere.
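A minimal sketch of a time-domain comb filter of the kind described above, assuming the form y[n] = x[n] - x[n - T] with T the interferer period in samples. The stimuli are equal-amplitude sine harmonics rather than vowels, the sample rate and F0 values are chosen so that the interferer period is a whole number of samples, and all function names are hypothetical. The period estimate (the lag that minimizes the power of the filter output) is one simple possibility given for illustration; it is not claimed to be the algorithm proposed in the paper.

```python
import math


def harmonic_complex(f0, n_harmonics, sample_rate, n_samples, amplitude=1.0):
    """Sum of equal-amplitude sine harmonics of f0 (a stand-in for a voiced sound)."""
    return [
        amplitude * sum(
            math.sin(2 * math.pi * k * f0 * n / sample_rate)
            for k in range(1, n_harmonics + 1)
        )
        for n in range(n_samples)
    ]


def comb_cancel(signal, period):
    """Time-domain comb filter y[n] = x[n] - x[n - period], for n >= period."""
    return [signal[n] - signal[n - period] for n in range(period, len(signal))]


def residual_power(signal, lag, start):
    """Mean power of x[n] - x[n - lag] over the common window n >= start."""
    diffs = [signal[n] - signal[n - lag] for n in range(start, len(signal))]
    return sum(d * d for d in diffs) / len(diffs)


def estimate_period(signal, min_lag, max_lag):
    """Lag in [min_lag, max_lag] that minimizes the comb-filter output power."""
    return min(
        range(min_lag, max_lag + 1),
        key=lambda lag: residual_power(signal, lag, max_lag),
    )


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


SAMPLE_RATE = 8000   # Hz (assumed)
N_SAMPLES = 800

# Interferer at 100 Hz: exactly 80 samples per period at 8 kHz.
interferer = harmonic_complex(100.0, 10, SAMPLE_RATE, N_SAMPLES)
# Target 6% higher in F0 and 20 dB weaker (amplitude ratio 0.1).
target = harmonic_complex(106.0, 10, SAMPLE_RATE, N_SAMPLES, amplitude=0.1)
mixture = [a + b for a, b in zip(interferer, target)]

period = estimate_period(mixture, 40, 120)
residual_interferer = comb_cancel(interferer, period)
residual_target = comb_cancel(target, period)
residual_mixture = comb_cancel(mixture, period)

print("estimated interferer period (samples):", period)
print("interferer rms after filtering:", rms(residual_interferer))
print("target rms after filtering:", rms(residual_target))
```

Because the filter is linear, filtering the mixture gives the same output as filtering the target alone once the interferer is cancelled. The surviving target is spectrally distorted by the comb, since each of its harmonics is scaled by the filter gain at that frequency.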