We present psychoacoustical data that are not explained by current models of F0-guided segregation, together with a model that can explain them. Listeners were presented with pairs of concurrent vowels and asked to report either one or two vowels. Vowels were synthesized either at the same fundamental frequency (F0) or with a deltaF0 of 6%. RMS levels before mixing were either equal or differed by 10 or 20 dB. Responses were scored separately for each vowel within a stimulus and classified according to the vowel's level relative to the competing vowel (-20, -10, 0, 10, or 20 dB) and the deltaF0 (0 or 6%). The difference in target vowel identification rate between deltaF0 conditions was greatest at -10 and -20 dB, that is, when the target was the weaker vowel. This outcome can be accounted for by the following model, which operates within peripherally filtered channels. A neuron is driven through two pathways: one direct, via an excitatory synapse, and the other delayed, via an inhibitory synapse. The neuron fires each time a spike arrives along the direct pathway, unless a spike arrives simultaneously along the delayed pathway. The delay is tuned to the period that dominates the overall response to the double-vowel stimulus (derived from the largest peak in a summary autocorrelation function (ACF) pattern). The stronger vowel is identified by matching the unfiltered summary ACF pattern to stored templates. The weaker vowel is identified by matching the summary ACF pattern derived from the cancellation residual. The model is consistent with experimental results indicating that the auditory system segregates harmonic sounds by cancelling harmonic backgrounds. It accounts for the strong deltaF0 effect at low target levels observed in our vowel identification experiment, and predicts quite well the number of vowels (1 or 2) reported by the subjects for each stimulus.
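The cancellation stage described above can be illustrated with a minimal sketch. It rests on several assumptions that are not part of the model description: time is discretized into bins, a single channel stands in for the bank of peripherally filtered channels, "simultaneous" means "in the same bin", and the two vowels are replaced by idealized periodic spike trains. All function and variable names are hypothetical, and template matching is not shown.

```python
def autocorrelation(spikes, max_lag):
    """Unnormalized ACF of a binary spike train for lags 0..max_lag."""
    n = len(spikes)
    return [sum(spikes[t] * spikes[t - lag] for t in range(lag, n))
            for lag in range(max_lag + 1)]


def dominant_period(acf, min_lag=1):
    """Lag (in bins) of the largest ACF peak, ignoring lags below min_lag."""
    return max(range(min_lag, len(acf)), key=lambda lag: acf[lag])


def cancellation_filter(spikes, delay):
    """Fire on a direct spike unless a delayed (inhibitory) spike coincides.

    The delayed pathway carries the same train shifted by `delay` bins.
    """
    out = []
    for t, s in enumerate(spikes):
        delayed = spikes[t - delay] if t >= delay else 0
        out.append(1 if s and not delayed else 0)
    return out


# Illustrative input (assumed, not data from the study): a "strong" source
# with a period of 5 bins mixed with a "weak" source with a period of 7 bins.
n_bins = 70
strong = [1 if t % 5 == 0 else 0 for t in range(n_bins)]
weak = [1 if t % 7 == 3 else 0 for t in range(n_bins)]
mixed = [a | b for a, b in zip(strong, weak)]

# Tune the inhibitory delay to the period that dominates the mixed response.
delay = dominant_period(autocorrelation(mixed, max_lag=20))

# The residual keeps mostly weak-source spikes; strong-source spikes are
# cancelled once the delayed pathway has filled (t >= delay).
residual = cancellation_filter(mixed, delay)
residual_times = [t for t, s in enumerate(residual) if s]
print(delay, residual_times)
```

In this toy input the residual loses the spikes of the dominant periodic source after the first period, along with the few weak-source spikes that happen to coincide with it. This is the sense in which the weaker vowel would be identified from the cancellation residual rather than from the unfiltered pattern.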