dc.description.abstract
Speaker verification (SV) focuses on confirming or denying the claimed identity of a speaker. It is a one-to-one comparison between a test utterance and the claimant’s stored reference voice sample, and is commonly used in security applications for access control and authentication. A popular approach to building SV systems is to use a speaker identification embedding/encoder model, pre-trained on a large number of known speakers, as a feature extractor. While Deep Neural Networks (DNNs) as encoders have achieved strong benchmark results in SV, they often fail to adapt to new data due to catastrophic forgetting. In such cases, continual learning, in which a model learns incrementally or task by task, enables it to retain previously learned information while handling diverse new information as it becomes available. However, adapting SV models to handle new classes without complete retraining remains challenging. Using VoxCeleb2, one of the largest speaker datasets with over 6,000 speakers of 145 nationalities, this study explores different settings for developing models flexible enough to generalize to distributional shifts in the data without requiring complete retraining. The research investigates how continual learning techniques can help speaker verification systems adapt and scale, maintaining model performance while accommodating a vast and dynamic array of classes. Through empirical evaluations on benchmark datasets and simulated real-world scenarios, the study assesses the efficacy of various continual learning approaches in mitigating forgetting and adapting to new classes in speaker verification tasks.
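As a hedged illustration of the one-to-one comparison the abstract describes, the sketch below embeds the test utterance and the claimant's stored reference with a speaker encoder and accepts the claim when their cosine similarity clears a threshold. The `encoder` here is a mocked stand-in (a fixed random projection) for a real pretrained model such as one trained on VoxCeleb2; the `verify` helper, the 192-dimensional embedding size, and the 0.5 threshold are illustrative assumptions, not values taken from the study.

```python
# Minimal sketch of the one-to-one SV decision rule described above.
# "encoder" is a hypothetical stand-in for a pretrained speaker
# embedding model; it is mocked with a fixed random projection so the
# example runs on its own.
import numpy as np

rng = np.random.default_rng(0)
PROJ = rng.standard_normal((16000, 192))  # mock: 1 s of 16 kHz audio -> 192-d

def encoder(waveform: np.ndarray) -> np.ndarray:
    """Placeholder for a pretrained speaker encoder."""
    return waveform @ PROJ

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(test_wav, reference_wav, threshold=0.5):
    """Accept the identity claim if embedding similarity clears the threshold."""
    score = cosine(encoder(test_wav), encoder(reference_wav))
    return score >= threshold, score

claimant_ref = rng.standard_normal(16000)    # enrolled reference sample
test_utterance = rng.standard_normal(16000)  # utterance to be verified
accepted, score = verify(test_utterance, claimant_ref)
print(f"score={score:.3f} accepted={accepted}")
```

In practice the threshold would not be fixed by hand but calibrated on a development set, typically at the point where false-accept and false-reject rates balance (the equal error rate).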
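Along the same lines, the following is a minimal sketch of one continual learning tactic, experience replay, in which a small buffer of samples from earlier speakers is mixed into each batch of new-speaker data so that gradient updates do not overwrite previously learned classes. All module names, feature dimensions, and the buffer policy below are assumptions for illustration; the abstract does not specify which continual learning methods were evaluated.

```python
# Sketch of experience replay for class-incremental speaker learning:
# a small buffer of old-speaker samples is replayed alongside each
# new-speaker batch. Shapes and names are illustrative only.
import random
import torch
from torch import nn

encoder = nn.Sequential(nn.Linear(40, 128), nn.ReLU(), nn.Linear(128, 192))
head = nn.Linear(192, 20)  # classifier head; grows as new speaker classes arrive
opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

replay_buffer = []  # (features, speaker_id) pairs from earlier tasks
BUFFER_CAP = 256

def train_step(new_x, new_y, replay_k=8):
    """One update on new-speaker data mixed with replayed old-speaker data."""
    if replay_buffer:
        old = random.sample(replay_buffer, min(replay_k, len(replay_buffer)))
        old_x = torch.stack([x for x, _ in old])
        old_y = torch.tensor([y for _, y in old])
        new_x = torch.cat([new_x, old_x])
        new_y = torch.cat([new_y, old_y])
    loss = loss_fn(head(encoder(new_x)), new_y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Toy batch for four "new" speakers (ids 16..19) with 40-d features.
x = torch.randn(16, 40)
y = torch.randint(16, 20, (16,))
print(train_step(x, y))

# Simplified buffer update: append until full (a real system might use
# reservoir sampling to keep the buffer balanced across old speakers).
for xi, yi in zip(x, y):
    if len(replay_buffer) < BUFFER_CAP:
        replay_buffer.append((xi, yi.item()))
```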