The time-frequency resolution tradeoff is well known:

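$$\Delta f \, \Delta t = K \qquad (1)$$

where $\Delta f$ is the frequency resolution, $\Delta t$ the time resolution (the duration of the analysis window), and $K$ a constant that depends on the shape of the window.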
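
As a concrete reading of (1), here is a small Python sketch. It assumes K = 1, i.e. that the frequency resolution equals the DFT bin spacing 1/Δt; tapered windows have wider main lobes, which amounts to K > 1. The 16 kHz sample rate is likewise an assumption chosen only for illustration.

```python
# Sketch of the tradeoff in (1), ASSUMING K = 1: the frequency resolution is
# taken to be the DFT bin spacing 1/T for a window of duration T seconds.
# Tapered windows (Hann, Hamming, ...) have wider main lobes, i.e. K > 1.

def frequency_resolution(window_s: float, k: float = 1.0) -> float:
    """Frequency resolution (Hz) of a window lasting window_s seconds."""
    return k / window_s


def bin_spacing(n_samples: int, sample_rate: float) -> float:
    """Spacing (Hz) between bins of an n_samples-point DFT."""
    return sample_rate / n_samples


fs = 16000.0  # assumed sample rate (Hz), for illustration only
for ms in (5, 10, 25, 50):
    n = round(fs * ms / 1000)  # window length in samples
    print(f"{ms:2d} ms window: {n:4d} samples, "
          f"resolution {frequency_resolution(ms / 1000):5.1f} Hz, "
          f"bin spacing {bin_spacing(n, fs):5.1f} Hz")
```

Under these assumptions a 25 ms window spans 400 samples and gives 40 Hz, and each halving of the window doubles Δf.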

In practice, the barrier should be seen from the other side: an FFT
gives *too much* resolution rather than too little. Make the window
too short and the spectrum fluctuates too much; make it too long and
the spectrum has too much detail. How to break out of this dilemma?
Easy. Simply realize that (a) there's plenty of
resolution: the lower limit imposed by the time-frequency tradeoff is
not a problem in practice, (b) the FFT window size is the wrong place
to try to determine resolution, and (c) the right place is in a
smoothing stage, in the frequency and/or time domain, after the FFT.
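
Points (b) and (c) can be illustrated with a small numerical experiment, sketched here in Python with NumPy. The white-noise input, the Hann window and the 16-bin moving average are arbitrary choices for illustration. The raw FFT power spectrum of noise fluctuates by roughly 100% (std/mean close to 1) whatever the window length, so a longer window only adds detail; it is the smoothing across bins after the FFT that reduces the fluctuation.

```python
import numpy as np


def periodogram(x: np.ndarray) -> np.ndarray:
    """One-sided power spectrum |X|^2 of a single Hann-windowed frame."""
    return np.abs(np.fft.rfft(x * np.hanning(len(x)))) ** 2


def smooth_bins(p: np.ndarray, width: int) -> np.ndarray:
    """Post-FFT smoothing: moving average over `width` neighbouring bins."""
    return np.convolve(p, np.ones(width) / width, mode="same")


def relative_spread(p: np.ndarray) -> float:
    """Std/mean over interior bins (edges dropped to avoid DC and padding effects)."""
    k = len(p) // 8
    core = p[k:-k]
    return float(np.std(core) / np.mean(core))


rng = np.random.default_rng(0)
results = {}
for n in (256, 4096):  # short and long FFT windows (samples)
    p = periodogram(rng.standard_normal(n))  # white noise: true spectrum is flat
    results[n] = (relative_spread(p), relative_spread(smooth_bins(p, 16)))
    print(f"N={n:5d}: raw spread {results[n][0]:.2f}, "
          f"smoothed spread {results[n][1]:.2f}")
```

Going from 256 to 4096 samples leaves the raw spread near 1, while the 16-bin average brings it down for either window length.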

Smoothing is of course a common practice. Feature extraction
typically starts with an FFT with a shaped window of 20-25 ms. This
size (if I remember correctly) is the result of trial and error, and a
compromise between adequate *time* resolution of useful
transients, and adequate *time*-domain smoothing of pitch-related
fluctuations (Nadeu et al., 1997). Frequency resolution considerations
don't enter the picture. Indeed, with the resulting resolution of about
40 Hz (1/25 ms) the spectrum is much too detailed. The data rate is
too large, and the pitch-related details that we got rid of in the time
domain reappear in the frequency domain as resolved harmonics. This
excess resolution is eliminated by smoothing in the frequency domain,
for example by averaging over neighboring bins, or indirectly by
keeping only the low-order coefficients of the cepstrum.
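
Both kinds of frequency-domain smoothing can be sketched on a single 25 ms frame. This is a minimal Python/NumPy illustration under my own assumptions: a 16 kHz sample rate, a Hamming window, a synthetic "voiced" frame made of 200 Hz harmonics, a 5-bin average (one harmonic spacing at 40 Hz per bin), and a lifter that keeps 20 cepstral coefficients. None of these values come from a particular feature-extraction front end.

```python
import numpy as np

FS = 16000                 # assumed sample rate (Hz)
FRAME = round(0.025 * FS)  # 25 ms frame -> 400 samples, 40 Hz bin spacing


def log_spectrum(frame: np.ndarray) -> np.ndarray:
    """Log magnitude spectrum of a Hamming-windowed frame."""
    mag = np.abs(np.fft.rfft(frame * np.hamming(len(frame))))
    return np.log(mag + 1e-12)  # small floor avoids log(0)


def smooth_by_bin_averaging(logspec: np.ndarray, width: int = 5) -> np.ndarray:
    """Average each bin with its neighbours (odd width, edges replicated)."""
    assert width % 2 == 1, "width must be odd"
    padded = np.pad(logspec, width // 2, mode="edge")
    return np.convolve(padded, np.ones(width) / width, mode="valid")


def smooth_by_liftering(logspec: np.ndarray, n_keep: int = 20) -> np.ndarray:
    """Keep only the n_keep lowest-quefrency cepstral coefficients."""
    n_fft = 2 * (len(logspec) - 1)
    ceps = np.fft.irfft(logspec, n=n_fft)  # real cepstrum (even-symmetric)
    lifter = np.zeros(n_fft)
    lifter[:n_keep] = 1.0
    if n_keep > 1:
        lifter[-(n_keep - 1):] = 1.0  # mirror image of quefrencies 1..n_keep-1
    return np.fft.rfft(ceps * lifter).real


# Synthetic "voiced" frame: 200 Hz harmonics with 1/h amplitudes (illustrative only).
t = np.arange(FRAME) / FS
frame = sum(np.cos(2 * np.pi * h * 200.0 * t) / h for h in range(1, 40))

raw = log_spectrum(frame)               # harmonic ripple every 5 bins
by_bins = smooth_by_bin_averaging(raw)  # ripple averaged out across bins
by_lifter = smooth_by_liftering(raw)    # ripple (quefrency 5 ms) removed
```

The pitch period here is 5 ms (80 samples of quefrency), well above the 20 retained coefficients, which is why liftering removes the harmonic ripple while keeping the overall spectral slope.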

Another way of doing it is to start with a short FFT window, short
enough to avoid excess resolution in the frequency domain. This is
followed by smoothing in the time domain, for example by averaging
consecutive spectra. The FFT itself involves a time average, so there
are two consecutive time averages. Is this not equivalent to a
*single* average with a larger window (in the FFT)? Not if the first
average is followed by a non-linear operation such as taking the
magnitude or computing the cepstrum. Temporal smoothing of magnitude
spectra is closely related to the Welch method of spectral
estimation, which averages the power (squared-magnitude) spectra of
successive windowed segments. Smoothing of cepstra calculated with
short windows has recently been proposed by Aikawa (reference?).
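
The role of the non-linearity can be checked numerically. In the Python/NumPy sketch below (8 ms Hann frames with a 4 ms hop at an assumed 16 kHz, and blocks of 8 frames, all arbitrary choices), averaging the complex spectra of consecutive frames remains a linear operation on the signal, so the two averages can collapse into a single, differently weighted linear analysis. Taking magnitudes first gives a different result. The last function is a Welch-style estimate that averages power spectra over all frames.

```python
import numpy as np


def frames(x: np.ndarray, size: int, hop: int) -> np.ndarray:
    """Split x into overlapping frames, one per row."""
    n = 1 + (len(x) - size) // hop
    return np.stack([x[i * hop : i * hop + size] for i in range(n)])


def short_spectra(x: np.ndarray, size: int = 128, hop: int = 64) -> np.ndarray:
    """Complex spectra of short Hann-windowed frames (rows = frames)."""
    return np.fft.rfft(frames(x, size, hop) * np.hanning(size), axis=1)


def _blocks(a: np.ndarray, n_avg: int) -> np.ndarray:
    """Group consecutive rows into blocks of n_avg (trailing rows dropped)."""
    n_blocks = len(a) // n_avg
    return a[: n_blocks * n_avg].reshape(n_blocks, n_avg, a.shape[1])


def magnitude_then_average(spectra: np.ndarray, n_avg: int) -> np.ndarray:
    """Non-linear step first: average |X| over n_avg consecutive frames."""
    return _blocks(np.abs(spectra), n_avg).mean(axis=1)


def average_then_magnitude(spectra: np.ndarray, n_avg: int) -> np.ndarray:
    """Linear step first: average complex X, then take |.|."""
    return np.abs(_blocks(spectra, n_avg).mean(axis=1))


def welch_psd(x: np.ndarray, size: int = 128, hop: int = 64) -> np.ndarray:
    """Welch-style estimate: mean power spectrum |X|^2 over all frames."""
    return (np.abs(short_spectra(x, size, hop)) ** 2).mean(axis=0)


rng = np.random.default_rng(1)
x = rng.standard_normal(16000)  # 1 s of white noise at an assumed 16 kHz
S = short_spectra(x)            # 8 ms frames, 4 ms hop
mag_first = magnitude_then_average(S, 8)
lin_first = average_then_magnitude(S, 8)
psd = welch_psd(x)
print("mean of averaged |X|:", round(float(mag_first.mean()), 2))
print("mean of |averaged X|:", round(float(lin_first.mean()), 2))
```

For noise, the complex average largely cancels (random phases), whereas the magnitude average converges to a stable spectral estimate.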

If time and frequency resolution are controlled by the post-FFT smoothing process, we are free to choose the resolution (window size) of the Fourier transform within a wide range. Small or large windows are both acceptable, as long as they are followed by smoothing to eliminate excess resolution and lower the data rate. Nothing prevents us, for example, from choosing different resolutions at different frequencies, as in wavelet analysis or an "auditory" filter bank. Different choices are not equivalent, and there is room for experimentation. For such experimentation to be meaningful, spectral analysis should always be followed by the appropriate amount of smoothing in the frequency and/or time domain.
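
One way to obtain frequency-dependent resolution purely by post-FFT smoothing is sketched below in Python/NumPy, under my own assumptions. A single long FFT (1024 samples, i.e. 64 ms at an assumed 16 kHz) is followed by averaging each bin over a band whose width follows the commonly used ERB formula of Glasberg and Moore, ERB(f) = 24.7 (4.37 f/1000 + 1) Hz. This is a crude stand-in for an "auditory" filter bank, not a model of one.

```python
import numpy as np


def erb_hz(f):
    """Equivalent rectangular bandwidth (Glasberg & Moore), in Hz."""
    return 24.7 * (4.37 * f / 1000.0 + 1.0)


def variable_smoothing(power: np.ndarray, freqs: np.ndarray, width_fn=erb_hz) -> np.ndarray:
    """Average each bin over all bins within +/- width_fn(f)/2 Hz of it."""
    out = np.empty_like(power)
    for i, f in enumerate(freqs):
        mask = np.abs(freqs - f) <= width_fn(f) / 2.0  # always includes bin i
        out[i] = power[mask].mean()
    return out


FS = 16000.0  # assumed sample rate (Hz)
N = 1024      # one long FFT window: 64 ms, 15.6 Hz bin spacing
rng = np.random.default_rng(2)
power = np.abs(np.fft.rfft(rng.standard_normal(N) * np.hanning(N))) ** 2
freqs = np.fft.rfftfreq(N, d=1.0 / FS)
smoothed = variable_smoothing(power, freqs)


def spread(v: np.ndarray, lo: float, hi: float) -> float:
    """Std/mean of v over bins with lo <= f < hi."""
    band = v[(freqs >= lo) & (freqs < hi)]
    return float(np.std(band) / np.mean(band))


print("100-400 Hz spread:", round(spread(smoothed, 100, 400), 2))
print("4-7 kHz spread:   ", round(spread(smoothed, 4000, 7000), 2))
```

Only a few bins are averaged at low frequencies (fine frequency resolution, more fluctuation), while several dozen are averaged at high frequencies (coarse resolution, less fluctuation), from the same FFT.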