STATNet: Spectral and Temporal features based Multi-Task Network for Audio Spoofing Detection
Date Issued
2022-01-01
Author(s)
Ranjan, Rishabh
Vatsa, Mayank
Singh, Richa
DOI
10.1109/IJCB54206.2022.10007949
Abstract
With the rise in mobile phone users and VoIP, voice has emerged as an easy and accessible biometric modality for identification and verification tasks. Given the increasing use of voice biometrics, the security of these systems is of paramount importance. Researchers have demonstrated that Automatic Speaker Verification (ASV) systems are prone to spoofing attacks such as synthetic or fake speech, which can be used maliciously for a variety of purposes such as impersonation, spreading fake news, and opinion formation. This research proposes a deep convolution-based multi-task network that performs both spoof detection and source identification for synthetic speech. The proposed model is evaluated on three datasets: ASVspoof2019 LA, FOR-Norm, and the In-the-Wild Audio Deepfake dataset. It achieves equal error rates (EER) of 2.456%, 0.814%, and 0.199% on the ASVspoof2019 LA, FOR-Norm, and In-the-Wild Audio Deepfake datasets, respectively. In addition, we report results for cross-dataset evaluation and speech source identification.