Recent work has shown that the locus of selective auditory attention in multi-speaker settings can be decoded from single-trial electroencephalography (EEG). This study is the first to investigate the decoding of selective auditory attention with an ensemble model. Specifically, we combine the predictions of two stacked deep learning models, the SpatioTemporal Attention Network (STAnet) and the SpatioTemporal Graph Convolutional Network (ST-GCN), through an average soft-voting layer, using brain data alone. The ensemble generalizes better within short 1-second decision windows by incorporating the subtle differences in the spatial features that the two networks extract from the EEG. The result is an effective trial-independent prediction of spatial auditory attention that outperforms baseline models by a substantial margin of 10% on two publicly available auditory attention datasets.
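To make the average soft-voting step concrete, the following is a minimal sketch rather than the implementation used in this study. It assumes each network emits one logit vector per 1-second decision window over two attention classes (for example, left and right); the function names, the logit values, and the two-class setup are all hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def average_soft_vote(logits_a, logits_b):
    """Average the class probabilities of two models, then pick the argmax.

    Both inputs have shape (n_windows, n_classes). Returns the averaged
    probabilities and the predicted class index for each window.
    """
    probs = (softmax(logits_a) + softmax(logits_b)) / 2.0
    return probs, probs.argmax(axis=-1)


# Hypothetical logits for three 1-second windows and two classes.
# These numbers are made up; they are not outputs of STAnet or ST-GCN.
stanet_logits = np.array([[2.0, 0.0], [0.2, 0.0], [-1.0, 1.0]])
stgcn_logits = np.array([[1.0, 0.0], [-1.5, 0.0], [0.0, 0.5]])

probs, decisions = average_soft_vote(stanet_logits, stgcn_logits)
print(probs)
print(decisions)
```

In the second window the two hypothetical models disagree: the first weakly favors class 0 while the second strongly favors class 1. Averaging probabilities instead of counting hard votes lets the more confident model decide, which is the property that distinguishes soft voting from majority voting.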