Abstract:
YouTube is one of the largest video-sharing websites on the Internet. Several music and record
companies, artists, and bands have official channels on YouTube (part of YouTube's music
ecosystem) to promote and monetize their music videos. Although YouTube has defined several
policies and implemented measures to combat copyright violations, it still hosts a large amount
of copyright-violating content, including music videos (the focus of the work presented in this
paper). We present a method to automatically detect copyright-violating videos by mining video
as well as uploader meta-data. We propose a multi-step approach that predicts the category
(original or copyright-violating) of a video by computing the textual similarity between the
query video title and the video search results, detecting useful linguistic markers (based on a
predefined lexicon) in the title, mining user profile data, and analyzing the popularity of the
uploader and the video. Our approach is based on a rule-based classification framework. We
validate our hypothesis by conducting a series of experiments on an evaluation dataset acquired
from YouTube. The empirical results indicate that the proposed approach is effective.
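The multi-step, rule-based pipeline summarized above can be illustrated with a small sketch. It is a minimal example under stated assumptions, not the authors' implementation. The abstract does not specify the similarity measure, the lexicon, the profile features, or the rules. Token-set Jaccard similarity, the `MARKER_LEXICON` entries, the `uploader_is_verified` profile flag, and the subscriber and view thresholds are therefore all hypothetical stand-ins chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical lexicon of title markers; the paper's actual lexicon is not given in the abstract.
MARKER_LEXICON = {"cover", "lyrics", "remix", "karaoke", "live", "instrumental", "fan"}

# Hypothetical popularity thresholds (assumptions, not values from the paper).
MIN_SUBSCRIBERS = 100_000
MIN_VIEWS = 1_000_000


@dataclass
class Video:
    title: str
    uploader_subscribers: int
    view_count: int
    uploader_is_verified: bool  # hypothetical user-profile signal


def tokens(text: str) -> set[str]:
    """Lowercase word tokens; any non-alphanumeric character acts as a separator."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return set(cleaned.split())


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity (a stand-in for the paper's textual similarity)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def is_relevant(query_title: str, result_title: str, threshold: float = 0.5) -> bool:
    """Keep only search results whose title is similar enough to the query title."""
    return jaccard(query_title, result_title) >= threshold


def classify(video: Video) -> str:
    """Apply hypothetical rules in a fixed order and return the predicted category."""
    # Rule 1: a lexicon marker such as "lyrics" or "cover" suggests a non-official upload.
    if tokens(video.title) & MARKER_LEXICON:
        return "copyright-violating"
    # Rule 2: a verified uploader profile suggests an official channel.
    if video.uploader_is_verified:
        return "original"
    # Rule 3: a popular uploader with a popular video is treated as original.
    if video.uploader_subscribers >= MIN_SUBSCRIBERS and video.view_count >= MIN_VIEWS:
        return "original"
    # Default: unverified, unpopular uploads are flagged.
    return "copyright-violating"


if __name__ == "__main__":
    query = "Artist - Song Title (Official Video)"
    results = [
        Video("Artist - Song Title (Official Video)", 5_000_000, 200_000_000, True),
        Video("Artist - Song Title lyrics", 1_200, 40_000, False),
        Video("Completely different song", 10, 100, False),
    ]
    for video in results:
        # The third result is filtered out as irrelevant by the similarity check.
        if is_relevant(query, video.title):
            print(video.title, "->", classify(video))
```

In a rule-based classifier the rule order is itself a design choice. Here a lexicon marker overrides the uploader signals. This is one plausible ordering and not necessarily the one the paper uses.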