
Harmonicity and F0

Much of my work has been to explain how the harmonicity (or F0) cue is used in auditory scene analysis (ASA). Harmonicity is the most powerful of the ASA cues. It is also the cue most often exploited in computational ASA systems and voice-separation systems.

Results have not been spectacular. I attribute this to the following factors: (a) Much effort has been invested in so-called "harmonic enhancement" (using the target's F0), which is intuitively appealing but fundamentally not very powerful, and not used by the auditory system. (b) The difficulty of F0 estimation of mixed speech, and of dealing with F0 errors. (c) Spectral distortion caused by the segregation process. (d) Lack of effectiveness at high SNRs, at which systems are usually tested (speech separation is most effective at low SNR). (e) The tendency to work in the spectral domain, less flexible than the time domain with respect to non-stationarity.

In my opinion, people have not yet discovered how to milk this cow. Harmonicity should give spectacular benefits if the following conditions are respected:

1. Work at low SNR. The overhead of distortion caused by the segregation process is too large at high SNR. More importantly, F0 estimation of the interference is much more reliable at low SNR. A nice point, from the researcher's point of view, is that recognition rates at low SNR are small, so improvements can be spectacular!
2. Use harmonic cancellation. It is more effective in principle than harmonic enhancement, and it is certainly used by human listeners.
3. Use missing-feature or damaged-feature techniques (Sect. [*]). Cancellation causes spectral distortion that interferes with the recognition stage. This distortion can be modeled and compensated.

It's probably not a bad idea to work in the time domain, which is most flexible with respect to non-stationarity (no need to decide on a window size).
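
To make the first two conditions concrete, here is a minimal sketch of time-domain harmonic cancellation. It assumes the simplest possible form of the idea, a comb filter y[t] = x[t] - x[t - T] with T the period of the interference in samples, and it estimates T as the lag that leaves the least residual power. The function names, the toy signals, and the lag search range are all hypothetical choices made for illustration; they are not taken from any particular system.

```python
import math

def cancel_harmonic(x, period):
    """Comb filter y[t] = x[t] - x[t - period] (assumed form of cancellation).

    Any component that repeats exactly every `period` samples is removed.
    """
    return [x[t] - x[t - period] for t in range(period, len(x))]

def power(x):
    """Mean squared value of a sequence."""
    return sum(v * v for v in x) / len(x)

def estimate_period(x, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] whose cancellation residual
    has the least power. At low SNR this tracks the interference."""
    return min(range(min_lag, max_lag + 1),
               key=lambda lag: power(cancel_harmonic(x, lag)))

# Toy mixture at low SNR (illustrative values only):
# a strong harmonic interferer with a period of 100 samples,
# and a weak harmonic target with a period of 73 samples.
n = 2000
interferer = [sum(math.sin(2 * math.pi * k * t / 100) / k for k in (1, 2, 3))
              for t in range(n)]
target = [0.1 * sum(math.sin(2 * math.pi * k * t / 73) for k in (1, 2))
          for t in range(n)]
mixture = [a + b for a, b in zip(interferer, target)]

# Estimate the interferer's period from the mixture, then cancel it.
lag = estimate_period(mixture, 50, 150)
residual = cancel_harmonic(mixture, lag)

print("estimated interferer period:", lag)
print("mixture power:", power(mixture))
print("residual power:", power(residual))
```

Note that the filter works sample by sample, with no analysis window to choose, and that what survives in `residual` is the target after comb filtering, that is, a spectrally distorted target. This is the distortion that the third condition proposes to model and compensate.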


Alain de Cheveigne
1998-02-16