borsdorf22@interspeech_2022@ISCA

Total: 1

#1 Blind Language Separation: Disentangling Multilingual Cocktail Party Voices by Language [PDF] [Copy] [Kimi1]

Authors: Marvin Borsdorf ; Kevin Scheck ; Haizhou Li ; Tanja Schultz

We introduce blind language separation (BLS) as novel research task, in which we seek to disentangle overlapping voices of multiple languages by language. BLS is expected to separate seen as well as unseen languages, which is different from the target language extraction task that works for one seen target language at a time. To develop a BLS model, we simulate a multilingual cocktail party database, of which each scene consists of two randomly selected languages, each represented by two randomly selected speakers. The database follows the recently proposed GlobalPhoneMCP database design concept that uses the audio data of the GlobalPhone 2000 Speaker Package. We show that a BLS model is able to learn the language characteristics so as to disentangle overlapping voices by language. We achieve a mean SI-SDR improvement of 12.63 dB over 231 test sets. The performance on the individual test sets varies depending on the language combination. Finally, we show that BLS can generalize well to unseen speakers and languages in the mixture.