Conversion of Airborne to Bone-Conducted Speech with Deep Neural Networks


Authors: Michael Pucher, Thomas Woltron

Most speakers find that a playback of their own voice sounds strange. This is mainly because the playback signal lacks the bone-conducted component of one's speech. It has also been shown that some phonemes have strong bone-conducted transmission relative to air-conducted transmission, which means that the bone-conduction filter is phone-dependent. To achieve such phone-dependent modeling, we train several speaker-dependent and speaker-adaptive speech conversion systems on airborne and bone-conducted speech data from 8 speakers (5 male, 3 female). These systems convert airborne speech to bone-conducted speech. They are based on Long Short-Term Memory (LSTM) deep neural networks, and the speaker-adaptive versions with speaker embeddings can be used without bone-conduction signals from the target speaker. Additionally, we use models that apply a single global filter. The models are evaluated with an objective error metric and a subjective listening experiment, both of which show that the LSTM-based models outperform the global filters.
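The abstract's central argument is that one global filter cannot capture a phone-dependent bone-conduction transfer. The sketch below illustrates only that argument. It is not the authors' LSTM method, and it is not necessarily their global-filter baseline. It assumes that each frame is a list of per-band log-magnitudes in dB with a phone label, and that each filter is a per-band gain averaged over paired air/bone frames. The function names, the toy data, and the mean-absolute-dB error (a stand-in for the unspecified "objective error metric") are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical feature layout: one frame = per-band log-magnitudes in dB.
Frame = list[float]


def estimate_global_filter(air: list[Frame], bone: list[Frame]) -> list[float]:
    """One gain per band (dB), averaged over all frames regardless of phone."""
    n_bands = len(air[0])
    return [mean(b[k] - a[k] for a, b in zip(air, bone)) for k in range(n_bands)]


def estimate_phone_filters(air: list[Frame], bone: list[Frame],
                           phones: list[str]) -> dict[str, list[float]]:
    """One gain vector per phone label, averaged over that phone's frames only."""
    groups = defaultdict(lambda: ([], []))
    for a, b, p in zip(air, bone, phones):
        groups[p][0].append(a)
        groups[p][1].append(b)
    return {p: estimate_global_filter(a, b) for p, (a, b) in groups.items()}


def apply_filter(frame: Frame, gain: list[float]) -> Frame:
    """In the log domain, filtering is adding the per-band gain."""
    return [x + g for x, g in zip(frame, gain)]


def mean_abs_db_error(pred: list[Frame], target: list[Frame]) -> float:
    """Assumed stand-in metric: mean absolute difference over all frames and bands."""
    return mean(abs(p - t) for pf, tf in zip(pred, target) for p, t in zip(pf, tf))


# Toy data (hypothetical): two phones with different air-to-bone transfer.
phones = ["m", "m", "s", "s"]
air = [[0.0, 0.0] for _ in range(4)]
true_offsets = {"m": [6.0, 3.0], "s": [-2.0, -4.0]}
bone = [apply_filter(a, true_offsets[p]) for a, p in zip(air, phones)]

global_gain = estimate_global_filter(air, bone)
phone_gains = estimate_phone_filters(air, bone, phones)

global_pred = [apply_filter(a, global_gain) for a in air]
phone_pred = [apply_filter(a, phone_gains[p]) for a, p in zip(air, phones)]

global_error = mean_abs_db_error(global_pred, bone)
phone_error = mean_abs_db_error(phone_pred, bone)
```

On this toy data, the global filter averages the two phones' transfer into one compromise gain of `[2.0, -0.5]` dB and leaves a 3.75 dB error. The phone-dependent gains recover each phone's offset exactly. An LSTM conditioned on context can learn such phone-dependent behavior without explicit phone labels, which is the motivation the abstract gives for using one.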

Subject: INTERSPEECH.2021 - Speech Synthesis

pucher21@interspeech_2021@ISCA
