An RFP dataset for Real, Fake, and Partially fake audio detection


Authors: Abdulazeez AlAli, George Theodorakopoulos

Recent advances in deep learning have enabled the creation of natural-sounding synthesised speech. However, attackers have also used these technologies to conduct attacks such as phishing. Numerous public datasets have been created to facilitate the development of effective detection models. However, the available datasets contain only entirely fake audio; therefore, detection models may miss attacks that replace a short section of real audio with fake audio. To address this problem, this paper presents the RFP dataset, which comprises five distinct audio types: partially fake (PF), audio with noise, voice conversion (VC), text-to-speech (TTS), and real. The data are then used to evaluate several detection models, revealing that the available detection models incur a markedly higher equal error rate (EER) when detecting PF audio than when detecting entirely fake audio. The lowest EER recorded was 25.42%. Therefore, we believe that creators of detection models must seriously consider using datasets like RFP that include PF and other types of fake audio.
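The equal error rate (EER) reported above is the operating point at which the false acceptance rate (fake audio accepted as real) equals the false rejection rate (real audio rejected as fake). The following is a minimal sketch of how an EER can be computed from detector scores, assuming that a higher score means "more likely real". The function name `compute_eer` and the threshold sweep over observed scores are illustrative assumptions, not the evaluation code used in the paper; with a finite set of thresholds it returns the average of the two error rates at the threshold where they are closest, which approximates the exact crossing point.

```python
from typing import Sequence, Tuple


def compute_eer(bonafide_scores: Sequence[float],
                spoof_scores: Sequence[float]) -> Tuple[float, float]:
    """Return (eer, threshold) for a binary real-vs-fake detector.

    Hypothetical helper: assumes higher scores mean "more likely real".
    The EER is approximated at the observed threshold where the false
    acceptance and false rejection rates are closest.
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    # Candidate thresholds: every distinct score that was observed.
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best = None  # (gap between the two rates, eer estimate, threshold)

    for t in thresholds:
        # False rejection: real audio scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: fake audio scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)

    return best[1], best[2]


if __name__ == "__main__":
    # Toy scores: one fake clip (0.65) outscores one real clip (0.6).
    eer, threshold = compute_eer([0.9, 0.8, 0.7, 0.6], [0.1, 0.2, 0.3, 0.65])
    print(f"EER = {eer * 100:.2f}% at threshold {threshold}")
```

On the toy scores the sketch prints an EER of 25.00% at threshold 0.65. The EER is usually quoted as a percentage, hence the multiplication by 100. In a partially fake setting, the scores would typically come from a whole-utterance classifier, which is exactly where a short spliced fake segment can pull the score only slightly below that of real audio.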

Subjects: Sound, Cryptography and Security, Audio and Speech Processing

Published: 2024-04-26 23:00:56 UTC

2404.17721

