Abstract:
Vision-based emotion recognition has relied predominantly on facial expressions [7]. Context plays an important role in emotion recognition, yet it has not been widely incorporated so far [6]. When only facial expressions are considered, the same expression can be interpreted very differently from one setting to another. To address this issue, we create a dataset of video clips and propose an accompanying approach that uses Recurrent Convolutional Networks (RCNs) to take temporal context into account. The approach uses a dual-stream architecture that extracts facial and contextual features separately. A handful of studies focus on video-clip datasets for emotion recognition while also taking context into account [8] [12]. However, no such study exists for the Indian context, and we address this gap by proposing the ICER (Indian Contextual Emotion Recognition) dataset of 12,451 video clips, based on the multi-ethnic Indian context. Of these, 4,533 videos are strongly annotated and the rest are weakly annotated. The strongly annotated videos are labeled with seven emotion classes: Ekman's six basic emotions ('happiness', 'sadness', 'surprise', 'disgust', 'anger', 'fear') plus 'neutral'. Because emotion recognition and its features differ across cultures, this focus adds novelty both to our study and to our analysis of the data.