araki24@interspeech_2024@ISCA


#1 Frontier of Frontend for Conversational Speech Processing

Author: Shoko Araki

To deepen and enrich our daily communications, researchers have worked for several decades to develop technologies that can recognize and understand natural human conversations. Despite significant progress in both speech/language processing and speech enhancement technology, conversational speech processing remains challenging. Recordings of conversations made with distant microphones contain ambient noise, reverberation, and speaker overlap that changes as the conversation progresses. Consequently, recognizing conversational speech is much more challenging than single-talker speech recognition, and frontend technologies such as speech enhancement and speaker diarization are essential to achieving highly accurate conversational speech processing. For more than two decades, the presenter's research group has explored frontend techniques (source separation, dereverberation, noise reduction, and diarization) for handling realistic natural conversations recorded with distant microphones. In this talk, I will discuss the evolution and frontier of frontend technologies for conversational signal processing. Specifically, we will trace the evolution of multichannel signal processing and neural network techniques, including beamforming and target speaker tracking and extraction, which have always played an important role in successive cutting-edge frontends, along with the latest achievements.