Abstract:
With the increase in the usage and availability of wearable devices such as the GoPro, Microsoft HoloLens, and Google Glass, egocentric video analysis has become essential.
An interesting application is action recognition in egocentric (first-person) videos, and a body of research has developed around it. First-person action recognition is a hard problem: first-person videos are shaky, contain limited hand-object interaction, and few publicly available datasets exist. Most existing research learns actions from hand-crafted features, which work best within a given domain. First-person videos contain two types of actions: those involving hand-object interactions and those without any such interaction. Current methods can recognize only one of these two types with a single model, not both. This research proposes a novel action recognition method that recognizes both types of actions, those where hand-object interaction is present and those where it is absent. Further, a new dataset named the IIITD Plumbing dataset is introduced, which provides a large number of videos, objects, and actions. The proposed system makes use of spatio-temporal information captured from raw frames. We also introduce a new method for activity recognition that learns a grammar over the learned actions.