rizos21@interspeech_2021@ISCA


Multi-Attentive Detection of the Spider Monkey Whinny in the (Actual) Wild

Authors: Georgios Rizos, Jenna Lawson, Zhuoda Han, Duncan Butler, James Rosindell, Krystian Mikolajczyk, Cristina Banks-Leite, Björn W. Schuller

We study deep bioacoustic event detection through multi-head attention-based pooling, using wildlife monitoring as the application. In the multiple instance learning framework, a core deep neural network learns a projection of the input acoustic signal into a sequence of embeddings, each representing a segment of the input. Sequence pooling is then required to aggregate the information in this sequence into a single clip-wise representation. We propose a Squeeze-and-Excitation-based improvement to a recently proposed audio tagging ResNet, and show that it performs significantly better than both the baseline and a collection of other recent audio models. We then further enhance our model through an extensive comparative study of recent sequence pooling mechanisms, and achieve our best result using multi-head self-attention followed by concatenation of the head-specific pooled embeddings; this outperforms prediction pooling methods as well as other recent sequence pooling techniques. We perform these experiments on a novel dataset of spider monkey whinny calls, introduced here, recorded in a rainforest on the South Pacific coast of Costa Rica, with a promising outlook for minimally invasive wildlife monitoring.
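The pipeline described above (a sequence of segment embeddings, channel recalibration, then multi-head pooling into one clip-wise vector) can be sketched in plain Python. This is a minimal, hypothetical illustration, not the authors' architecture: all function names, shapes and random weights are assumptions. Two simplifications are deliberate. First, the Squeeze-and-Excitation gate is applied directly to the T x D embedding sequence (squeezing over time), whereas in an SE-ResNet it sits inside each residual block and squeezes over the time-frequency plane. Second, each pooling head scores the segments with its own learned query vector (attention pooling) instead of running full self-attention among the segments before the head-specific pooled embeddings are concatenated.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    # W is a list of rows (out_dim x in_dim); x has length in_dim.
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def squeeze_excite(seq, W1, W2):
    """Hypothetical 1-D SE gate on a T x D sequence.
    Squeeze: average each channel over time.
    Excite: bottleneck MLP (ReLU, then sigmoid) yields one gate per channel.
    Scale: multiply channel c of every time step by gate c."""
    T, D = len(seq), len(seq[0])
    squeezed = [sum(seq[t][c] for t in range(T)) / T for c in range(D)]
    hidden = [max(0.0, h) for h in matvec(W1, squeezed)]
    gates = [sigmoid(g) for g in matvec(W2, hidden)]
    return [[x * g for x, g in zip(frame, gates)] for frame in seq]


def multi_head_attention_pool(seq, heads):
    """Hypothetical attention pooling with one learned query per head.
    heads: list of (query, Wv); query has length D, Wv is d_v x D.
    Each head softmax-normalises its scores over time, takes a weighted
    average of its value projections, and the head outputs are concatenated."""
    pooled = []
    for query, Wv in heads:
        scale = math.sqrt(len(query))
        scores = [sum(q * x for q, x in zip(query, frame)) / scale for frame in seq]
        weights = softmax(scores)
        values = [matvec(Wv, frame) for frame in seq]
        head_out = [sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(Wv))]
        pooled.append(head_out)
    # Concatenate the head-specific pooled embeddings into one clip vector.
    return [x for head_out in pooled for x in head_out]


def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


# Demo with hypothetical sizes: T segments, D-dim embeddings, H heads.
random.seed(0)
T, D, D_hidden, H, d_v = 10, 8, 2, 4, 3
seq = rand_matrix(T, D)  # stands in for the core network's segment embeddings
se_seq = squeeze_excite(seq, rand_matrix(D_hidden, D), rand_matrix(D, D_hidden))
heads = [(rand_matrix(1, D)[0], rand_matrix(d_v, D)) for _ in range(H)]
clip = multi_head_attention_pool(se_seq, heads)
print(len(clip))  # 12, i.e. H * d_v
```

Concatenating rather than averaging the head outputs keeps each head's pooled view separate, so the clip-wise vector grows to H * d_v and a downstream classifier can weight the heads differently.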