Direct Speech Synthesis from Non-Invasive, Neuromagnetic Signals


Authors: Jinuk Kwon, David Harwath, Debadatta Dash, Paul Ferrari, Jun Wang

Direct speech synthesis from neural activity can enable individuals to communicate without articulatory movement or vocalization. A number of recent speech brain-computer interface (BCI) studies have been conducted using invasive neuroimaging techniques, which require neurosurgery to implant electrodes in the brain. In this study, we investigated the feasibility of direct speech synthesis from non-invasive magnetoencephalography (MEG) signals acquired while participants performed overt speech production tasks. We used a transformer-based framework (Squeezeformer) to convert neural signals into Mel spectrograms, followed by a neural vocoder to generate speech. Our approach achieved an average correlation coefficient of 0.95 between the target and the generated Mel spectrograms, indicating high fidelity. To the best of our knowledge, this is the first demonstration of synthesizing intelligible speech directly from non-invasive brain signals.
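The abstract does not say how the correlation between target and generated Mel spectrograms was computed. As a minimal sketch, one plausible reading is a Pearson correlation taken per Mel bin over time and then averaged across bins. The function name `mel_correlation`, the `(n_mels, n_frames)` array layout, and the per-bin averaging are all assumptions made for illustration, not the authors' evaluation code.

```python
import numpy as np


def mel_correlation(target, generated):
    """Mean Pearson correlation across Mel bins.

    Both inputs are assumed to be arrays of shape (n_mels, n_frames).
    This is an illustrative metric, not the paper's exact procedure.
    """
    target = np.asarray(target, dtype=float)
    generated = np.asarray(generated, dtype=float)
    if target.shape != generated.shape:
        raise ValueError("spectrograms must have the same shape")

    # Center each Mel bin over time.
    t = target - target.mean(axis=1, keepdims=True)
    g = generated - generated.mean(axis=1, keepdims=True)

    # Pearson r per bin: covariance over the product of standard deviations.
    num = (t * g).sum(axis=1)
    den = np.sqrt((t ** 2).sum(axis=1) * (g ** 2).sum(axis=1))

    # Skip bins that are constant over time, where r is undefined.
    valid = den > 0
    if not valid.any():
        raise ValueError("all Mel bins are constant; correlation is undefined")
    return float((num[valid] / den[valid]).mean())


# Usage with synthetic data standing in for real spectrograms.
rng = np.random.default_rng(0)
target_mel = rng.normal(size=(80, 200))
generated_mel = target_mel + 0.3 * rng.normal(size=(80, 200))
score = mel_correlation(target_mel, generated_mel)
```

A flattened, single correlation over all time-frequency cells would be an equally reasonable reading and can give a different number, so a reported value is only comparable when the averaging scheme is known.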

Subject: INTERSPEECH.2024 - Others

kwon24@interspeech_2024@ISCA
