Abstract:
In this project, we investigate how disfluencies in speech can be used to provide feedback for automated oral proficiency scoring of spontaneous speech produced by L2 learners of English. Our aim is to construct and present a corpus of spontaneous speech from non-native speakers of English, annotated with segment-level disfluency labels. This corpus will then be used to develop disfluency detection models for automated oral proficiency scoring systems. In this report, we discuss the key points from our extensive literature review on disfluency, from both a linguistic and a computational point of view. Building on these discussions, we design a disfluency annotation scheme tailored specifically to our dataset and purpose. We also carry out a qualitative analysis of the dataset to identify further features that could help shape our future work.