Abstract:
This study investigates content and coherence modeling for an Automated Oral Proficiency Scoring and Feedback Generating System, in the context of spontaneous speech from non-native (L2) English learners. We introduce a new dataset of verbal responses from the Simulated Oral Proficiency Interview, annotated with coherence and content scores. We provide annotation guidelines explicitly tailored to our needs and annotate the spoken responses accordingly. The inter-annotator agreement is acceptable, with κ = 0.770 (Cohen's kappa) and α = 0.884 (Krippendorff's alpha). However, the skewness of the data forced us to re-sample the dataset so that it is balanced across multiple dimensions. The time and labour required to manually transcribe and annotate this new data proved a bottleneck for content modeling. We therefore limited ourselves to content-relevance modeling and began analyzing a similar common dataset. We propose various data augmentation techniques to build training samples, along with a deep neural network architecture for this task. The results are promising. A thorough analysis of the model, the data augmentations, and the results gave us insight into their effectiveness and into the problems that remain to be addressed. Finally, we suggest a few techniques and changes that can be investigated in future work to improve the scores.