Abstract:
Online education platforms host diverse learning content such as videos, audio lectures, and technical articles. A major drawback of video-based learning content is that a learner cannot directly access the part of a video that covers a particular topic. Enabling smart browsing for quick access to the explanation of a topic requires topical segmentation of the video. To obviate the need for manual topical segmentation, this paper presents a system called EduCIndex that automatically generates a table of contents for a given video through representation learning that fuses different modalities: text, audio, and video. EduCIndex segments a video and assigns a relevant topic to each segment. To develop the system, we curate a novel dataset by scraping the web, comprising around 1,500 hours of educational videos and a table of contents for each video. We propose a novel multi-task learning-based approach that jointly learns segment boundaries and segment topics using sequential attention over a sequence of 1-minute video clips. Our proposed model achieves relative improvements over the baselines of 49.82% in topic name extraction (ROUGE-1) and 15.23% in video segmentation (F1 score).
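
As a rough illustration of the output format only, the following sketch shows how per-clip boundary decisions and per-segment topic names could be assembled into a table of contents. It is not the EduCIndex model: the function `build_toc`, the `TocEntry` type, the convention that a flag marks the first clip of a segment, and the example topic names are all assumptions made for illustration. Only the 1-minute clip length comes from the text above.

```python
from dataclasses import dataclass
from typing import List

CLIP_SECONDS = 60  # 1-minute clips, as stated in the abstract


@dataclass
class TocEntry:
    """One table-of-contents row (hypothetical structure)."""
    start_seconds: int
    end_seconds: int
    topic: str


def build_toc(starts_segment: List[bool], topics: List[str],
              clip_seconds: int = CLIP_SECONDS) -> List[TocEntry]:
    """Group consecutive clips into segments and attach one topic per segment.

    starts_segment[i] is True when clip i is assumed to begin a new segment;
    clip 0 is always treated as a segment start.
    """
    if not starts_segment:
        return []
    # Indices of the clips that open each segment.
    starts = [i for i, flag in enumerate(starts_segment) if flag or i == 0]
    if len(starts) != len(topics):
        raise ValueError("expected exactly one topic per segment")
    # Each segment ends where the next one starts; the last ends with the video.
    ends = starts[1:] + [len(starts_segment)]
    return [TocEntry(s * clip_seconds, e * clip_seconds, t)
            for s, e, t in zip(starts, ends, topics)]


# Example with made-up predictions for a 5-minute video.
toc = build_toc([True, False, False, True, False], ["Introduction", "Sorting"])
for entry in toc:
    print(f"{entry.start_seconds:>4}s - {entry.end_seconds:>4}s  {entry.topic}")
```

In a multi-task setting such as the one described, the boundary flags and the topic names would both come from the learned model; here they are supplied by hand.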