Please use this identifier to cite or link to this item:
http://repository.iiitd.edu.in/xmlui/handle/123456789/1448

| Title: | Learning speaker, emotion, age, and gender information through disentanglement of speech pre-trained representations |
| Authors: | Koshal, Devyani; Buduru, Arun Balaji (Advisor) |
| Keywords: | Speech Forensics; Self-Supervised Learning; Pre-Trained Models; Multi-Task Learning; Convolutional Neural Networks |
| Issue Date: | 29-Nov-2023 |
| Publisher: | IIIT-Delhi |
| Abstract: | Forensic speech science, rooted in acoustics, plays a key role in legal investigations. Among its diverse applications, automatic speaker recognition (ASR) is a primary task in forensic speech analysis, followed by speech emotion recognition (SER), gender recognition (GR), and age estimation (AE). Going beyond conventional identification methods, multi-task learning on representations from speech pre-trained models (PTMs) broadens the scope of analysis and is more resource-friendly. This approach allows simultaneous exploration of multiple facets embedded in speech, including speaker information, emotional cues, gender characterization, and age estimation. It also avoids training a separate model for each task, which saves computational resources and time. This multi-dimensional analysis offers insights beyond identification and enriches the depth of investigations through a comprehensive comparison of representations from various PTMs for the aforementioned tasks. |
| URI: | http://repository.iiitd.edu.in/xmlui/handle/123456789/1448 |
| Appears in Collections: | Year-2023 |
Files in This Item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| BTP_Report_23_Devyani_Koshal_2020055 - Devyani Koshal.pdf (Restricted Access) | | 5.58 MB | Adobe PDF | View/Open; Request a copy |