Abstract:
With the rise of artificial intelligence, there is an increased risk of impersonation and deepfakes. One form of impersonation targets audio: in "spoofing attacks," voices are altered or synthesized to trick a system into accepting them as genuine, deceiving systems and people alike. This project develops a deep learning-based architecture capable of detecting spoofing in audio. Spoofing attacks take many forms, from physical tampering with microphones to digital spoofs injected directly into the audio stream without touching the physical system. Our model targets the latter kind, known as Logical Access (LA) attacks. Examples of LA attacks include: 1) replaying a voice recording to impersonate another person, 2) machine-generated (synthetic) voices, and 3) converted voices (transforming a genuine voice to sound like a different speaker). Our model addresses these challenges by combining traditional feature extraction with deep neural networks, specifically Convolutional Neural Networks (CNNs) and an attention-based architecture. The approach is evaluated on the ASVspoof dataset, a benchmark for testing the ability of systems to detect spoofing. This study enhances the reliability of voice authentication by providing an effective method for detecting sophisticated voice spoofing attacks, helping ensure that only genuine users can access secured services.
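To make the "CNN plus attention" combination concrete, the following is a minimal PyTorch sketch of that style of binary spoof classifier over a spectrogram-like input. All layer sizes, names, and the input shape here are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class SpoofDetector(nn.Module):
    """Illustrative CNN + self-attention classifier for bonafide-vs-spoof
    audio, operating on spectrogram-like input of shape (batch, 1, freq, time).
    Sizes and structure are hypothetical, not the paper's exact model."""
    def __init__(self, d_model=32):
        super().__init__()
        # Convolutional front end: extracts local time-frequency patterns.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, d_model, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # Self-attention over the time axis of the pooled feature map.
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.head = nn.Linear(d_model, 2)  # two logits: bonafide vs. spoof

    def forward(self, x):
        h = self.cnn(x)                    # (B, C, F', T')
        h = h.mean(dim=2).transpose(1, 2)  # average over frequency -> (B, T', C)
        h, _ = self.attn(h, h, h)          # attend across time frames
        return self.head(h.mean(dim=1))    # pool over time -> logits (B, 2)

# Example: a batch of 4 spectrograms with 60 frequency bins and 400 frames.
logits = SpoofDetector()(torch.randn(4, 1, 60, 400))
print(logits.shape)  # torch.Size([4, 2])
```

In practice the two logits would be trained with cross-entropy against bonafide/spoof labels; the attention step lets the model weight the time frames most indicative of synthesis or conversion artifacts.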