Abstract:
This thesis explores advanced techniques for audio spoofing detection. With the emergence of high-quality deepfake generation methods and the resulting vulnerability of automatic speaker verification (ASV) systems, robust countermeasures are essential. We investigate state-of-the-art deep learning models, including ECAPA-TDNN, ResNet, and TitaNet, as well as self-supervised learning (SSL) models such as Wav2Vec2, WavLM, and UniSpeech. Experiments are conducted on datasets from the ASVspoof 2021 and 2024 challenges. Our approach introduces a hybrid integration of handcrafted features with SSL-based embeddings, yielding notable improvements in Equal Error Rate (EER) and minimum Detection Cost Function (minDCF). We also evaluate data augmentation strategies for enhancing robustness. Results indicate that hybrid systems combining engineered and learned features outperform standalone models and offer practical insights for developing next-generation anti-spoofing solutions.