Auditory attention decoding (AAD) aims to identify the attended speaker in multi-talker environments from electroencephalography (EEG) signals. Most AAD methods focus on either the temporal or the frequency domain alone and neglect the relationship between the two. As a result, they cannot consider time-varying and spectral-spatial information at the same time. To address this issue, this paper proposes DBPNet, a dual-branch parallel network with temporal-frequency fusion for AAD. DBPNet consists of a temporal attentive branch and a frequency residual branch. The temporal attentive branch captures the time-varying features of the EEG time series. The frequency residual branch uses residual convolution to extract spectral-spatial features from multi-band EEG signals. Finally, the two branches are fused so that both the time-varying and the spectral-spatial features of the EEG signals contribute to the classification result. Experimental results on the MM-AAD dataset with a 0.1-second decision window show that DBPNet achieves a 20.4% relative improvement over the best baseline while using about 91 times fewer trainable parameters.
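The abstract describes the architecture only at a high level. The following minimal NumPy sketch illustrates the general idea of running a temporal branch and a spectral-spatial branch in parallel on the same EEG window and fusing their outputs for a two-speaker decision. It is not DBPNet's actual implementation. Every specific choice below is a hypothetical assumption: the single-head self-attention, the 3×3 residual block, the five frequency bands, concatenation as the fusion step, all function names (`dbpnet_like_forward`, `band_power_map`, and so on), and all sizes. The weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def temporal_attentive_branch(eeg, wq, wk, wv):
    """Hypothetical temporal branch: single-head self-attention over time steps.
    eeg: (T, C) window -> (D,) feature vector (mean-pooled over time)."""
    q, k, v = eeg @ wq, eeg @ wk, eeg @ wv           # each (T, D)
    scores = q @ k.T / np.sqrt(q.shape[-1])           # (T, T) scaled dot products
    return (softmax(scores, axis=-1) @ v).mean(axis=0)

def band_power_map(eeg, fs, bands):
    """Log mean FFT power per (band, channel): a (len(bands), C) spectral-spatial map."""
    freqs = np.fft.rfftfreq(eeg.shape[0], d=1.0 / fs)
    power = np.abs(np.fft.rfft(eeg, axis=0)) ** 2     # (F, C)
    rows = [power[(freqs >= lo) & (freqs < hi)].mean(axis=0) for lo, hi in bands]
    return np.log1p(np.stack(rows))

def conv2d_same(x, kernel):
    """2-D cross-correlation (deep-learning 'convolution') with zero padding;
    assumes an odd-sized kernel so the output has the same shape as x."""
    kh, kw = kernel.shape
    padded = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def frequency_residual_branch(band_map, k1, k2):
    """Hypothetical frequency branch: one residual block, relu(x + conv(relu(conv(x))))."""
    h = np.maximum(conv2d_same(band_map, k1), 0.0)
    h = conv2d_same(h, k2)
    return np.maximum(band_map + h, 0.0).reshape(-1)  # skip connection, then flatten

def dbpnet_like_forward(eeg, fs, bands, params):
    t_feat = temporal_attentive_branch(eeg, params["wq"], params["wk"], params["wv"])
    f_feat = frequency_residual_branch(band_power_map(eeg, fs, bands),
                                       params["k1"], params["k2"])
    fused = np.concatenate([t_feat, f_feat])          # fusion by concatenation (assumed)
    return softmax(fused @ params["w_out"] + params["b_out"])  # P(speaker 1), P(speaker 2)

# Hypothetical sizes. A 1 s window at 128 Hz gives 1 Hz FFT bins; a 0.1 s window
# would give about 10 Hz bins, leaving the low-frequency bands empty in this sketch.
T, C, FS, D = 128, 16, 128, 8
BANDS = [(1, 4), (4, 8), (8, 13), (13, 30), (30, 50)]
params = {
    "wq": rng.normal(scale=0.1, size=(C, D)),
    "wk": rng.normal(scale=0.1, size=(C, D)),
    "wv": rng.normal(scale=0.1, size=(C, D)),
    "k1": rng.normal(scale=0.1, size=(3, 3)),
    "k2": rng.normal(scale=0.1, size=(3, 3)),
    "w_out": rng.normal(scale=0.1, size=(D + len(BANDS) * C, 2)),
    "b_out": np.zeros(2),
}
eeg = rng.normal(size=(T, C))                         # synthetic EEG, not real data
probs = dbpnet_like_forward(eeg, FS, BANDS, params)
print(probs)
```

With trained weights, the larger of the two probabilities would indicate the attended speaker. Here the output only shows that the two branches produce feature vectors that can be concatenated and passed to a shared classifier.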