yin18@interspeech_2018@ISCA

Total: 1

#1 Multi-talker Speech Separation Based on Permutation Invariant Training and Beamforming

Authors: Lu Yin ; Ziteng Wang ; Risheng Xia ; Junfeng Li ; Yonghong Yan

The recently proposed Permutation Invariant Training (PIT) technique addresses the label permutation problem in multi-talker speech separation and has been shown to be effective in the single-channel case. In this paper, we propose to extend the PIT-based technique to the multichannel multi-talker speech separation scenario. PIT is used to train a neural network that outputs a mask for each speaker, followed by a Minimum Variance Distortionless Response (MVDR) beamformer. The beamformer exploits the spatial information of the different speakers and alleviates the performance degradation caused by misaligned labels. Experimental results show that the proposed PIT-MVDR technique achieves higher Signal-to-Distortion Ratios (SDRs) than the single-channel speech separation method on two-speaker and three-speaker mixtures.
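The two building blocks named in the abstract can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the loss, the network, or the exact MVDR formulation. The sketch assumes a mean-squared-error PIT criterion, mask-weighted spatial covariance estimates, and the common reference-channel MVDR solution w = (Φ_n⁻¹ Φ_s) u / tr(Φ_n⁻¹ Φ_s). All function names, array layouts, and the `diag_load` parameter are hypothetical.

```python
import itertools
import numpy as np

def pit_mse(estimates, targets):
    """Utterance-level PIT with an MSE criterion (assumed loss).

    estimates, targets: real arrays of shape (num_speakers, T, F).
    Returns the smallest loss over all speaker orderings and the
    permutation `perm` such that estimates[perm[i]] matches targets[i].
    """
    num_spk = estimates.shape[0]
    best_loss, best_perm = np.inf, None
    for perm in itertools.permutations(range(num_spk)):
        loss = np.mean((estimates[list(perm)] - targets) ** 2)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm

def masked_covariance(Y, mask):
    """Mask-weighted spatial covariance per frequency bin.

    Y: complex multichannel STFT, shape (C, T, F).
    mask: real time-frequency mask in [0, 1], shape (T, F).
    Returns an array of shape (F, C, C).
    """
    weighted = Y * mask[None]
    cov = np.einsum('ctf,dtf->fcd', weighted, Y.conj())
    norm = np.maximum(mask.sum(axis=0), 1e-8)
    return cov / norm[:, None, None]

def mvdr_weights(cov_target, cov_interf, ref_channel=0, diag_load=1e-6):
    """Reference-channel MVDR weights (assumed formulation), shape (F, C)."""
    num_ch = cov_target.shape[-1]
    # Diagonal loading keeps the interference covariance invertible.
    cov_interf = cov_interf + diag_load * np.eye(num_ch)
    numerator = np.linalg.solve(cov_interf, cov_target)   # (F, C, C)
    trace = np.trace(numerator, axis1=-2, axis2=-1)       # (F,)
    return numerator[..., ref_channel] / (trace[:, None] + 1e-8)

def apply_beamformer(weights, Y):
    """Compute w^H y for every time-frequency bin; returns shape (T, F)."""
    return np.einsum('fc,ctf->tf', weights.conj(), Y)
```

In such a pipeline, each speaker's estimated mask would give the target covariance and the complementary mask (for example `1 - mask`) the interference covariance, with one beamformer computed per speaker. How the paper actually derives the interference statistics is not stated in the abstract.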