Abstract:
Many people around the world are learning English and take language assessments every year. With the growing number of candidates taking these assessments and a shortage of qualified experts to rate them, there is a need to automate this time-consuming process. We propose a novel deep learning technique for automated scoring of non-native speech, called Recursive Modeling, in which we feed our text-based models with additional speaker-specific context. We compare our technique with strong baselines and find that such modeling significantly improves model performance. We also propose a multi-modal network that takes in text-based as well as audio-based user-specific features that help boost overall performance. We further present a qualitative and quantitative analysis of our model.