Understanding ASR using speechbrain

Arora, Satyam; Shah, Rajiv Ratn (Advisor)

Please use this identifier to cite or link to this item: http://repository.iiitd.edu.in/xmlui/handle/123456789/1600

Title:	Understanding ASR using speechbrain
Authors:	Arora, Satyam; Shah, Rajiv Ratn (Advisor)
Keywords:	Automatic Speech Recognition; SpeechBrain; Text Generation; ML; NLP; DL
Issue Date:	28-Nov-2023
Publisher:	IIIT-Delhi
Abstract:	Automatic Speech Recognition (ASR) has been a prominent area of Computer Science research for decades and has generated thousands of research papers in recent years. It is a complex and evolving field that intersects with ML, NLP, DL and other prominent areas of AI. Because it is complex (involving many steps), diverse (each step can be implemented in many ways, which also vary with the final task) and computationally heavy, it has had a relatively small practitioner base. Advances in the chip industry have largely eased the problem of computation. The remaining problem is to reduce the complexity so that even novice computer professionals can start their journey in ASR and deepen their understanding gradually. SpeechBrain, announced in 2019, is aimed at exactly that problem. It is an all-in-one, user-friendly toolkit that can be used to learn and develop state-of-the-art speech systems for different speech-related problems. This report includes chapters that provide a basic understanding of ASR and a basic knowledge of the SpeechBrain repository, and finally describes how I worked in and around this repository, changed architectures, and developed an interactive web application capable of text generation and automatic speech recognition using SpeechBrain.
URI:	http://repository.iiitd.edu.in/xmlui/handle/123456789/1600
Appears in Collections:	Year-2023

Files in This Item:

File	Description	Size	Format
SatyamArora_2020330_BTP - Satyam Arora.pdf (Restricted Access)		1.3 MB	Adobe PDF
