Abstract:
This thesis explores the integration of quaternion algebra into neural network architectures to improve their efficiency across diverse audio processing tasks. Quaternion-based transformations provide structural compression that reduces model size and computational demands while preserving high accuracy and reliability and enhancing the models’ learning capabilities. The thesis presents three main studies: the first applies quaternion models to on-device keyword spotting, demonstrating that they match the performance of state-of-the-art models with a fraction of the computational footprint; the second investigates the combined use of quaternion transformations and pruning techniques in convolutional neural networks for audio tagging, achieving substantial reductions in computational demands and memory usage; the third explores quaternion algebra in speech synthesis through vocoder models, enabling high-quality speech generation with significantly fewer parameters and lower computational overhead. Across these applications, the proposed quaternion models substantially reduce parameter count and computational load, making them well suited for deployment on resource-limited devices. Experimental validation on standard datasets confirms the effectiveness and versatility of these models, with each study matching or setting the state of the art in its respective domain. Together, these results underscore the potential of quaternion-based models for advancing real-world applications on edge devices.