This paper proposes an ensemble model as a countermeasure against spoofed speech, with a particular focus on synthetic voice. Despite recent advances in speaker verification based on deep neural networks, this technology remains susceptible to various malicious attacks, so countermeasures are needed. While an increasing number of anti-spoofing techniques can be found in the literature, combining multiple models into an ensemble remains one of the best approaches. However, current ensembles often rely on fixed weight assignments, which may neglect the unique strengths of each individual model. In response, we propose a novel ensembling model: an adaptive neural-network-based approach that dynamically adjusts the weight of each model according to the input utterance. Our experimental findings show that this approach outperforms traditional weighted score averaging, demonstrating its ability to adapt effectively to diverse audio characteristics.
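To make the contrast between fixed and input-dependent weighting concrete, the following is a minimal sketch, not the architecture evaluated in this paper. It assumes each countermeasure model emits one scalar score per utterance and that a small gating layer (here a single linear map followed by a softmax, which is an illustrative assumption) turns an utterance-level feature vector into one weight per model. All names (`fixed_weight_fusion`, `adaptive_fusion`, `gate_matrix`, `gate_bias`) and all numeric values are hypothetical.

```python
import math


def fixed_weight_fusion(scores, weights):
    """Weighted score averaging: the same weights are used for every utterance."""
    total = sum(weights)
    return sum(w / total * s for w, s in zip(weights, scores))


def softmax(logits):
    """Numerically stable softmax, so the weights are positive and sum to 1."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_weights(features, gate_matrix, gate_bias):
    """Assumed gating layer: one linear unit per model, normalised by softmax."""
    logits = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(gate_matrix, gate_bias)
    ]
    return softmax(logits)


def adaptive_fusion(scores, features, gate_matrix, gate_bias):
    """Fuse the model scores with weights computed from this utterance's features."""
    weights = adaptive_weights(features, gate_matrix, gate_bias)
    fused = sum(w * s for w, s in zip(weights, scores))
    return fused, weights


# Hypothetical scores from three countermeasure models for one utterance.
scores = [2.0, -1.0, 0.5]
# Hypothetical two-dimensional utterance embedding.
features = [0.3, -1.2]
# Hypothetical gate parameters; in practice these would be learned.
gate_matrix = [[0.8, -0.1], [-0.4, 0.6], [0.1, 0.2]]
gate_bias = [0.0, 0.0, 0.0]

baseline = fixed_weight_fusion(scores, [1.0, 1.0, 1.0])
fused, weights = adaptive_fusion(scores, features, gate_matrix, gate_bias)
```

With all gate parameters set to zero, the softmax yields uniform weights and the adaptive fusion reduces to plain score averaging, so under these assumptions the fixed-weight scheme is a special case of the adaptive one.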