Hearnet in Paris

Info for speakers

Dear Speakers,

Thanks to you all for agreeing to contribute to the Hearnet meeting at Ircam in Paris on Sept 10/11 (details and practical information here).

The exact schedule is still open, but we're aiming to fit 11 speakers into about 9 hours of talking time. The standard slot is half an hour (talk + questions), but you can ask for more (or less) if you wish. Please do so soon, and send me your title. I'll be away at Eurospeech at the beginning of September, so it's best to settle things before August 31.

We have an overhead projector (OHP) and a video projector. Local machines are Macs, but they should accept PowerPoint files from PCs. It might be worth sending us a CD or diskette in advance so we can check.

There's a range of topics to cater to every taste, from models to penguins. Part of it, about half, is focused on the equalization-cancellation (EC) model of binaural processing. We're fortunate to have Jeroen Breebaart and/or Steven van de Par, who have worked a lot recently to develop and test the EC model, John Culling, who proposed a version of EC (mEC) that works independently in each peripheral frequency band, Michael Akeroyd, who has also worked with this model, David McAlpine, who will explain to us why it is (or hopefully isn't) all rubbish from the physiologist's point of view, and Shiro Ikeda, who will tell us about recent developments in Independent Component Analysis (ICA), a hot new technique that seems related to EC.

Why the EC model? Because of my own interest in monaural cancellation models. Because it gives a good match to human performance, as John and Jeroen and Steven and Michael and others have shown. Because, among auditory models, it seems most likely to provide the powerful signal-processing functions needed by applications such as speech recognition. Because issues such as "why channel-wise EC is better than uniform EC", or "why frequency-domain ICA works well", may shed light on the wider issue of the role of peripheral frequency analysis in time-domain models of processing, and on why it's often important that components are "resolved" in monaural as well as binaural tasks.

At the risk of making a fool of myself, here's a story about how things might fit. There is a binaural processing model called Equalization-Cancellation (EC), originally due to Durlach (1963). In that model, signals from the two ears are "equalized" (delay and scale) and then subtracted. A strong interfering source can be canceled, and in this way one can explain binaural release from masking and binaural masking level differences (BMLD). This model has been recently gaining favor, and we have experts on it.
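For anyone who wants to play with the idea, here is a minimal sketch of equalize-then-subtract in Python. It is only an illustration under my own assumptions, not Durlach's formulation: delays are whole samples applied as circular shifts, the scale factor is a least-squares fit, the best setting is the one that minimizes output power, and the names (ec_cancel, masker, target) are made up for the example.

```python
import numpy as np

def ec_cancel(left, right, max_delay):
    """Try each integer delay, fit a gain, and keep the lowest-power residual."""
    best = None
    for delay in range(-max_delay, max_delay + 1):
        shifted = np.roll(right, delay)        # circular shift: a simplification
        gain = np.dot(left, shifted) / np.dot(shifted, shifted)  # least-squares scale
        residual = left - gain * shifted       # "equalize", then subtract
        power = np.mean(residual ** 2)
        if best is None or power < best[0]:
            best = (power, delay, gain, residual)
    return best

rng = np.random.default_rng(0)
n = 4000
masker = rng.standard_normal(n)                         # strong interfering noise
target = 0.1 * np.sin(2 * np.pi * 0.01 * np.arange(n))  # weak tone

# Assumed geometry: the masker reaches the right ear 3 samples later and at
# half the amplitude; the target is in antiphase at the two ears.
left = masker + target
right = 0.5 * np.roll(masker, 3) - target

power, delay, gain, residual = ec_cancel(left, right, max_delay=10)
print("delay:", delay, "gain:", round(gain, 2))
print("residual power / input power:", power / np.mean(left ** 2))
```

With these settings the search should pick a shift of -3 samples (undoing the masker's delay) and a gain near 2 (undoing its attenuation). What remains after subtraction is mostly the tone, which survives because its interaural configuration differs from that of the masker.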

The EC model attempts to find internal delay and scaling factors that cancel an interfering source. If there are two sources, it can cancel one or the other, producing two outputs. Each output depends (via some filtering) on the non-canceled source only. If the sources are statistically independent, so should the outputs be. In this sense, EC has a similar goal to Independent Component Analysis (ICA). ICA takes M mixtures of N sources (such as from M microphones) and tries to find an "unmixing matrix" that best retrieves the sources, without knowledge of either the sources or the mixing matrix. It does so (assuming that the sources are statistically independent) by applying some measure of statistical independence to the output, and twiddling the unmixing matrix until that measure is maximized. Obviously EC and ICA have similar goals.

However, they differ in criteria. The criterion of EC is minimization of output power as a function of the parameters. This rather crude criterion is OK if the masker is strong, which is of course the situation where unmasking is most needed. ICA, as I understand it, can use a variety of criteria. Early work used higher-order statistics of instantaneous values, on the assumption that distributions of values have different shapes for different sources (e.g. a square-wave generator has a different distribution of values (0s and 1s) than a sine-wave generator). Later work (such as Shiro's) has found more interesting criteria, such as independence of temporal modulation of the unmixed sources. This seems reasonable to any auditory person interested in temporal structure. I'm sure they could have found it earlier if they'd asked...

Early ICA work was applied to the situation where the mixing matrix was scalar, that is, the M mixtures were weighted sums of signals with no delay or filtering. Obviously that can't handle propagation delays from sources to microphones (or ears), reverberation, etc. More recently, "frequency domain ICA" methods have been developed in which the ICA problem is solved independently in each frequency channel (the mixing and unmixing matrices for each band are then scalar with complex coefficients). This is effective provided one solves the "permutation problem": sorting the N outputs of each channel so as to group across channels the outputs that belong to the same source. Recent work (in particular by Shiro) has shown that this can be done on the basis of temporal modulation (see for example http://medi.uni-oldenburg.de/members/ane/, or http://citeseer.nj.nec.com/ikeda99method.html). In any event, it appears that frequency analysis is a useful step in ICA.
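To make the "scalar with complex coefficients" remark concrete, here is a small Python sketch. It is not ICA: the mixing is assumed known, whereas ICA would have to estimate the unmixing matrices blindly and then sort out the permutation problem. It only shows that delay-and-scale mixing becomes a 2x2 complex matrix in each frequency bin, which can then be inverted bin by bin. The gains, delays, circular shifts and function names are my own assumptions for the example.

```python
import numpy as np

def path_response(gain, delay, n):
    """DFT-domain response of 'scale by gain, circularly delay by delay samples'."""
    k = np.arange(n)
    return gain * np.exp(-2j * np.pi * k * delay / n)

rng = np.random.default_rng(1)
n = 256
sources = rng.standard_normal((2, n))

# Assumed mixing: each sensor gets its "own" source directly, plus an
# attenuated and delayed copy of the other source.
gains = [[1.0, 0.6], [0.5, 1.0]]
delays = [[0, 4], [3, 0]]

# Time-domain mixtures (delays as circular shifts, a simplification).
mixtures = np.zeros((2, n))
for i in range(2):
    for j in range(2):
        mixtures[i] += gains[i][j] * np.roll(sources[j], delays[i][j])

# The same mixing, expressed as one 2x2 complex matrix per frequency bin.
A = np.empty((n, 2, 2), dtype=complex)
for i in range(2):
    for j in range(2):
        A[:, i, j] = path_response(gains[i][j], delays[i][j], n)

# Unmix bin by bin with the inverse matrices (known here, estimated in ICA).
X = np.fft.fft(mixtures, axis=1).T[:, :, None]        # shape (n, 2, 1)
S_hat = np.linalg.inv(A) @ X                          # shape (n, 2, 1)
recovered = np.fft.ifft(S_hat[:, :, 0].T, axis=1).real

max_error = np.max(np.abs(recovered - sources))
print("max reconstruction error:", max_error)
```

The point of the exercise is that no scalar (real-valued) unmixing matrix could undo the delays, while a different complex matrix in each bin can.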

Recently, John Culling and Quentin Summerfield have developed a "modified EC" (mEC) model that works independently within each peripheral frequency channel. That is, equalization parameters are allowed to differ from channel to channel, and they are determined locally within each channel by a minimization criterion. John and Quentin have shown that many phenomena can be explained based on this assumption, so it seems that's how our ears do it. Now, why do we use a strategy that is complex (standard EC is simpler), presents the auditory system with a "permutation problem", and seems to throw away useful information about between-channel patterns? The interesting thing of course is that this resembles frequency-domain ICA.
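One situation where channel-wise equalization pays off can be sketched in a few lines of Python. This is a toy of my own and not John and Quentin's model: the "channels" are two ideal brick-wall bands cut out with an FFT, each band contains one noise with its own interaural delay, and cancellation uses integer circular shifts with a least-squares gain and a minimum-power criterion. All names and numbers are assumptions for the example.

```python
import numpy as np

def band_limit(x, lo, hi):
    """Keep only rfft bins lo..hi-1: a crude, ideal band-pass filter."""
    spec = np.fft.rfft(x)
    mask = np.zeros(spec.shape, dtype=bool)
    mask[lo:hi] = True
    return np.fft.irfft(np.where(mask, spec, 0), n=len(x))

def ec_best(left, right, max_delay):
    """Return (residual power, delay) for the best delay-and-scale cancellation."""
    best_power, best_delay = np.inf, 0
    for delay in range(-max_delay, max_delay + 1):
        shifted = np.roll(right, delay)
        gain = np.dot(left, shifted) / np.dot(shifted, shifted)
        power = np.mean((left - gain * shifted) ** 2)
        if power < best_power:
            best_power, best_delay = power, delay
    return best_power, best_delay

rng = np.random.default_rng(2)
n = 4096
bands = {"low": (50, 250), "high": (300, 500)}   # rfft bin ranges (assumed)
low = band_limit(rng.standard_normal(n), *bands["low"])
high = band_limit(rng.standard_normal(n), *bands["high"])

# Two interferers in different bands, with different interaural delays.
left = low + high
right = np.roll(low, 2) + np.roll(high, -3)

# Uniform EC: one delay and one gain for the whole signal.
uniform_power, uniform_delay = ec_best(left, right, max_delay=8)

# Channel-wise EC: separate delay and gain within each band.
channel_results = {
    name: ec_best(band_limit(left, lo, hi), band_limit(right, lo, hi), max_delay=8)
    for name, (lo, hi) in bands.items()
}

print("uniform residual / input:", uniform_power / np.mean(left ** 2))
for name, (power, delay) in channel_results.items():
    print(name, "band: delay", delay, "residual power", power)
```

In this toy, a single delay cannot line up both noises at once, so uniform EC leaves a large residual, whereas each band on its own is cancelled almost perfectly. Whether this is the reason our ears do it channel by channel is of course exactly the kind of question I'd like to discuss.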

Questions I'd like to ask are:

What makes cancellation a good ingredient? What about other models?
Is the ICA / EC parallel that I sketched reasonable?
What exactly is gained by frequency analysis in mEC and ICA?
How does frequency-domain ICA depend on bandwidth? Are auditory bandwidths reasonable from a functional point of view?
Can mEC and/or frequency-domain ICA handle more than two sources with only two sensors (ears)? What conditions must be met? Can they handle reverberation?
How about criteria? ICA has come up with new criteria recently; are there more to come? How do they fit with the stuff we use in auditory models?
Is there something auditory models are missing and ICA has, or vice-versa?
Can we extrapolate from binaural to monaural processing (for example based on periodicity)?
Is the advantage procured by filtering for binaural processing somehow related to the issue of resolvability?
Attention and top-down processes? Missing feature theory?
etc.

In addition to the ICA papers mentioned above, you might want to look at Jeroen's three papers in this month's JASA (there's also one by John and one by Christian). For mEC you might want to look up one of John's recent papers on pitch. For convenience I copied the .pdfs here. Those of you who are more at ease with binaural correlation à la Jeffress than with binaural equalization-cancellation à la Durlach might find comfort in my paper for the upcoming CRAC workshop, similar to what I presented at the last Hearnet, which argues that they're much the same.

This "story" is from my own point of view. Feel free to interact with it or ignore it!

In addition to this monomaniac EC stuff, we have lots more. Hideki Kawahara (also an ICA expert) will give a demo of STRAIGHT, a high-quality analysis-synthesis system that you must have if you're doing experiments with speech. Olivier Warusfel will present Spat, a comprehensive software package that simulates room acoustics, propagation, HRTFs, etc. in real time. Christian Lorenzi will talk about second-order modulation transfer functions (see his paper!). Guy Brown is presenting work on speech recognition, binaural modelling, missing feature theory, etc. David McAlpine is going to tell us how our models fit with the real stuff one finds in the brain. Bernhard Gaese is (I think) presenting work on hearing in the owl and the rat, and the role of attention. Thierry Aubin will tell us about cocktail parties at the antipodes, where king and emperor penguins find their spouses by recognizing their call among hundreds of simultaneous calls of other individuals. Moving from birds and bats to beasts, Bob Carlyon will talk about "elephant noise" in the PECA talk after Hearnet. Don't worry, it's not all EC.

Alain