The short-window magnitude spectrum (Welch method) or cepstrum (Aikawa) can benefit from smoothing with a square window whose length equals one period or a multiple of the period (or with any convolution of such windows). In fact any estimate (LPC, etc.) can be smoothed in this way, but the benefit is greatest for estimates that use short windows.
Smoothing with a square window whose length equals one period is like applying a low-pass filter with zeros at the fundamental and all its multiples. All F0-related fluctuations are smoothed out, leaving only the DC component. [Note: spectral analysis is assumed to be running, that is, repeated at one-sample intervals. Each component of a spectral vector may be seen as a time-varying signal. It is these signals that are low-pass filtered by convolution with a square window.]
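As an illustration of the paragraph above, here is a minimal sketch in Python with NumPy. It checks that a moving average of one period has unit gain at DC and essentially zero gain at F0 and its multiples, and that it flattens a synthetic feature track. The sampling rate, the period, the helper name `boxcar_gain` and the synthetic track are all assumptions chosen for the example; they do not come from the text.

```python
import numpy as np

def boxcar_gain(freq_hz, window_len, fs):
    """Magnitude response of a moving average of `window_len` samples at freq_hz."""
    n = np.arange(window_len)
    w = 2 * np.pi * freq_hz / fs
    return abs(np.sum(np.exp(-1j * w * n))) / window_len

fs = 16000        # assumed sampling rate (Hz)
period = 160      # assumed period in samples, i.e. F0 = 100 Hz
f0 = fs / period

# Gain at DC, F0, 2*F0, 3*F0, 4*F0.
gains = [boxcar_gain(k * f0, period, fs) for k in range(5)]

# Hypothetical feature track (one spectral coefficient sampled at every
# sample): a constant value plus fluctuations at F0 and 2*F0.
t = np.arange(4 * period)
track = 3.0 + 0.5 * np.cos(2 * np.pi * t / period) + 0.2 * np.sin(4 * np.pi * t / period)

# Period-length square window, normalized to unit DC gain.
smoothed = np.convolve(track, np.ones(period) / period, mode="valid")
```

Under these assumptions `gains[0]` is 1, the remaining gains are zero to within rounding error, and `smoothed` is constant at the DC value of the track.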
Again, the reliability of F0 estimation is not a big issue. If the F0 estimator chooses a period multiple instead of the period, the zeros will be more closely spaced, but they will still cancel the fundamental and its multiples. If the F0 estimate is completely wrong, the signal was probably not very periodic, and little is lost by using the "wrong" window size. One might worry about small estimation errors, for example those due to the finite sampling rate. These shift the zeros of the transfer function further and further from the harmonics of F0 as frequency rises, so that the higher harmonics are less well suppressed. This should not be a major problem, as the energy of the fluctuation is likely to be small in the high-frequency region, and it is attenuated anyhow by the overall decay of the sin(x)/x transfer function (this decay can be made faster, at the expense of time resolution, by using a triangular window of length 2/F0, or some higher-order convolution of a square window). Note that we are talking about the frequency content of spectral coefficients as a function of time: the high-frequency region of the spectrum of the signal itself is represented as adequately as the low-frequency region.
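The robustness argument above can be checked numerically. The sketch below (Python with NumPy) compares the gain at the first few harmonics for three windows: a square window of two periods, a square window that is one sample too long, and the triangular window obtained by convolving the mistuned square window with itself. The sampling rate, the period, the one-sample error and the helper name `gain` are assumptions made for the example.

```python
import numpy as np

def gain(window, freq_hz, fs):
    """Magnitude response of `window` at freq_hz, normalized to unit DC gain."""
    n = np.arange(len(window))
    return abs(np.sum(window * np.exp(-2j * np.pi * freq_hz * n / fs))) / np.sum(window)

fs = 16000        # assumed sampling rate (Hz)
period = 160      # assumed true period in samples (F0 = 100 Hz)
f0 = fs / period

box_double = np.ones(2 * period)      # estimator picked twice the period
box_off = np.ones(period + 1)         # estimate off by one sample
tri_off = np.convolve(box_off, box_off)  # triangular window, about two periods long

harmonics = f0 * np.arange(1, 6)      # F0 ... 5*F0
g_double = np.array([gain(box_double, f, fs) for f in harmonics])
g_off = np.array([gain(box_off, f, fs) for f in harmonics])
g_tri = np.array([gain(tri_off, f, fs) for f in harmonics])
```

With these values the doubled window still nulls every harmonic, the one-sample error leaves a residual gain below 1% at each of the harmonics tested, and the triangular window squares that residual, at the cost of a window roughly twice as long.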
The advantage of PP-smoothing is that it allows perfect time-domain smoothing of features from voiced speech, and does so with the smallest possible penalty on temporal resolution. Temporal resolution is a combination of two factors: the window size of the spectral analysis and the window size of the smoothing. The latter, equal to one period, is as short as can be imagined. The former is generally even shorter, and its contribution is negligible. Without PP-smoothing, to get stable estimates we would have to choose a smoothing window at least as long as the longest expected period (that is, the period of the lowest expected F0), and preferably longer. The improvement in temporal resolution can be considerable (of course one is not obliged to use it: there might be reasons to throw away some of this resolution).
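To give a rough sense of scale, the following sketch compares the smoothing window needed with and without PP-smoothing. Both F0 values are illustrative assumptions, not figures from the text, and `smoothing_window_ms` is a hypothetical helper.

```python
def smoothing_window_ms(f0_hz):
    """Length in milliseconds of a smoothing window spanning one period of f0_hz."""
    return 1000.0 / f0_hz

lowest_expected_f0 = 60.0   # assumed lower bound of the F0 range (Hz)
speaker_f0 = 240.0          # assumed F0 of the voice being analyzed (Hz)

# A fixed window must span one period of the lowest F0 that may occur.
fixed_window = smoothing_window_ms(lowest_expected_f0)

# A PP-smoothing window spans one period of the actual F0.
pp_window = smoothing_window_ms(speaker_f0)

ratio = fixed_window / pp_window
```

For this assumed pair of values the fixed window is four times longer than the PP-smoothing window; the gain shrinks as the speaker's F0 approaches the lower bound.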