Multichannel source separation with deep neural networks

by A. Nugraha, A. Liutkus, and E. Vincent

Abstract

This article addresses the problem of multichannel music separation. We propose a framework where the source spectra are estimated using deep neural networks (DNNs) and combined with spatial covariance matrices that encode the spatial characteristics of the sources. The parameters are estimated iteratively in an expectation-maximization (EM) fashion and used to derive a multichannel Wiener filter. We evaluate the proposed framework on the task of music separation over a large dataset. Experimental results show that the described method performs consistently well in separating singing voice and other instruments from realistic musical mixtures.
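As a rough illustration of the final separation step mentioned above, a multichannel Wiener filter combines each source's estimated power spectrum v_j(f,n) (e.g., from a DNN) with its spatial covariance matrix R_j(f) to filter the mixture. The sketch below is a minimal NumPy implementation under assumed array shapes and a small regularization term of our own choosing; it is not the authors' code:

```python
import numpy as np

def multichannel_wiener_filter(x, v, R):
    """Estimate multichannel source images from a mixture.

    x : mixture STFT, shape (F, N, I), complex (I channels)
    v : source power spectra, shape (J, F, N), nonnegative (J sources)
    R : spatial covariance matrices, shape (J, F, I, I), complex Hermitian
    Returns source image STFTs, shape (J, F, N, I).
    """
    J, F, N = v.shape
    I = x.shape[-1]
    y = np.zeros((J, F, N, I), dtype=complex)
    for f in range(F):
        for n in range(N):
            # Mixture covariance: sum over sources of v_j(f,n) * R_j(f).
            Cx = sum(v[j, f, n] * R[j, f] for j in range(J))
            # Small diagonal loading for numerical stability (an assumption).
            Cx_inv = np.linalg.inv(Cx + 1e-10 * np.eye(I))
            for j in range(J):
                # Wiener gain for source j, applied to the mixture frame.
                W = v[j, f, n] * R[j, f] @ Cx_inv
                y[j, f, n] = W @ x[f, n]
    return y
```

Because the per-source gains sum (up to the regularization) to the identity matrix, the recovered source images add back up to the mixture, which is a useful sanity check in practice.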

Full text

The full text will soon be available here

Voice/music separation

Download the DSD100 database here (16 GB).

Some examples of separated results

Multitrack HTML player by binarymind; it is freely available for download and use here.

Contact

aditya (dot) nugraha (at) inria (dot) fr

antoine (dot) liutkus (at) inria (dot) fr