The experiment's stimulus set contained synthetic steady-state single and double Japanese vowels. For double vowels, the F0 difference was 0% or 6%, and the level difference between the RMS signal levels of the two vowels before mixing was -20, -10, 0, 10 or 20 dB. All stimuli were set to the same RMS value before presentation via headphones at an SPL of 63-70 dB. Stimulus duration was 200 ms, with 20 ms onset and offset ramps.
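The level and RMS arithmetic described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code used to make the stimuli: the function names, the target RMS of 0.1, the raised-cosine ramp shape and the 44.1 kHz sampling rate are all hypothetical, and the sine waves at 100 and 106 Hz are placeholders standing in for real vowel waveforms.

```python
import numpy as np

def rms(x):
    """Root-mean-square value of a signal."""
    return float(np.sqrt(np.mean(np.square(x))))

def mix_double_vowel(a, b, level_db, target_rms=0.1):
    """Mix a and b with a at level_db dB relative to b (RMS), then
    rescale the sum to target_rms. target_rms is an arbitrary choice."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a / rms(a)                     # equalize both vowels to unit RMS
    b = b / rms(b)
    gain = 10.0 ** (level_db / 20.0)   # dB to amplitude ratio
    mix = gain * a + b
    return mix * (target_rms / rms(mix))

def apply_ramps(x, fs, ramp_ms=20.0):
    """Apply onset and offset ramps. The raised-cosine shape is an
    assumption; the ramp shape is not specified in the description."""
    n = int(round(fs * ramp_ms / 1000.0))
    ramp = 0.5 * (1.0 - np.cos(np.pi * np.arange(n) / n))
    y = np.array(x, dtype=float)
    y[:n] *= ramp
    y[-n:] *= ramp[::-1]
    return y

# Placeholder signals: two sine waves 6% apart in frequency, 200 ms long.
fs = 44100                             # hypothetical sampling rate
t = np.arange(int(0.2 * fs)) / fs
a = np.sin(2 * np.pi * 100.0 * t)
b = np.sin(2 * np.pi * 106.0 * t)

mix = mix_double_vowel(a, b, level_db=-10.0)
stimulus = apply_ramps(mix, fs)
```

Here the ramps are applied after mixing; if the single vowels already carry their own ramps, that step would be skipped.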
These are the single vowels in AIFF format, from which all the double vowels can be constructed. They sound terrible via the loudspeaker of my Mac, but all right when I download them back to the NeXT, so I think they are OK (let me know if not!).
Here are some examples of double vowel stimuli for the pair /o/+/u/. The level given is that of /o/ relative to /u/: