cauzinille24@interspeech_2024@ISCA


Investigating self-supervised speech models' ability to classify animal vocalizations: The case of gibbon's vocal signatures

Authors: Jules Cauzinille, Benoît Favre, Ricard Marxer, Dena Clink, Abdul Hamid Ahmad, Arnaud Rey

With the advent of pre-trained self-supervised learning (SSL) models, speech processing research is showing increasing interest in disentanglement and explainability. Among other methods, probing speech classifiers has emerged as a promising approach to gain new insights into SSL models' out-of-domain performance. We explore the knowledge transfer capabilities of pre-trained speech models with vocalizations from the closest living relatives of humans: non-human primates. We focus on classifying the identity of northern grey gibbons (Hylobates funereus) from their calls, using probing and layer-wise analysis of state-of-the-art SSL speech models compared to pre-trained bird species classifiers and audio taggers. By testing the reliance of these models on background noise and timewise information, as well as performance variations across layers, we propose a new understanding of the mechanisms underlying speech models' efficacy as bioacoustic tools.