Blind Speech Separation (BSS) aims to separate multiple speech sources from audio mixtures recorded by a microphone array. The problem is challenging because it is a blind inverse problem, i.e., the microphone array geometry, the room impulse response (RIR), and the speech sources are all unknown. We propose ArrayDPS to solve the BSS problem in an unsupervised, array-agnostic, and generative manner. The core idea builds on diffusion posterior sampling (DPS), but unlike DPS, where the likelihood is tractable, ArrayDPS must approximate the likelihood by formulating a separate optimization problem. The solution to this optimization approximates the room acoustics and the relative transfer functions between microphones. These approximations, along with the diffusion priors, iterate through the ArrayDPS sampling process and ultimately yield separated voice sources. We only need a simple single-speaker speech diffusion model as a prior, along with the mixtures recorded at the microphones; no microphone array information is necessary. Evaluation results show that ArrayDPS outperforms all baseline unsupervised methods while being comparable to supervised methods in terms of signal-to-distortion ratio (SDR). Audio demos and code are provided at: https://arraydps.github.io/ArrayDPSDemo/ and https://github.com/ArrayDPS/ArrayDPS.
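
To make the "separate optimization problem" more concrete, the following toy sketch fits filters that map current source estimates to one microphone signal. It is only an illustration under assumptions the abstract does not state: the filters are short time-domain FIR filters, the fit is ordinary least squares, and the names `convolution_matrix`, `estimate_relative_filters`, and `simulate_channel` are hypothetical. It is not the ArrayDPS implementation and it omits the diffusion prior entirely.

```python
import numpy as np

def convolution_matrix(signal, filter_len):
    """Build the (T x filter_len) matrix A with A @ h == np.convolve(signal, h)[:T]."""
    T = len(signal)
    A = np.zeros((T, filter_len))
    for tap in range(filter_len):
        # Column `tap` is the signal delayed by `tap` samples.
        A[tap:, tap] = signal[:T - tap]
    return A

def estimate_relative_filters(sources, mixture_channel, filter_len):
    """Least-squares FIR filters mapping each source estimate to one microphone.

    sources: (K, T) source estimates (assumed given, e.g. at a reference mic).
    mixture_channel: (T,) signal recorded at another microphone.
    Returns a (K, filter_len) array of filters. Assumption: FIR + least squares.
    """
    K = sources.shape[0]
    # Stack one convolution matrix per source so all filters are solved jointly.
    A = np.hstack([convolution_matrix(s, filter_len) for s in sources])
    h, *_ = np.linalg.lstsq(A, mixture_channel, rcond=None)
    return h.reshape(K, filter_len)

def simulate_channel(sources, filters):
    """Sum of each source filtered by its own FIR filter, truncated to T samples."""
    T = sources.shape[1]
    return sum(np.convolve(s, h)[:T] for s, h in zip(sources, filters))

# Toy data: random signals stand in for speech, random filters for room acoustics.
rng = np.random.default_rng(0)
K, T, L = 2, 400, 8
sources = rng.standard_normal((K, T))
true_filters = 0.5 * rng.standard_normal((K, L))
mixture = simulate_channel(sources, true_filters)

est_filters = estimate_relative_filters(sources, mixture, L)
residual = mixture - simulate_channel(sources, est_filters)
```

In this noiseless toy setting the fitted filters reproduce the microphone signal, so `residual` is close to zero. In a sampler of the kind described above, the source estimates would be imperfect intermediate samples, and a residual of this type is one plausible way to score how well they explain the recorded mixture.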