Abstract:
YouTube is one of the largest video-sharing websites (with social networking features) on the
Internet. Its immense popularity, anonymity, and low publication barrier have
resulted in several forms of misuse and video pollution, such as the uploading of malicious,
copyright-violating, and spam videos or content. YouTube has a popular, commonly used feature
called video response, which allows users to post a video in response to an existing video.
Some of the popular videos on YouTube receive thousands of video responses. We have observed
the presence of opportunistic users posting unrelated, promotional, or pornographic videos (spam
videos posted manually or using automated scripts) as video responses to existing videos.
We present a method of mining YouTube to automatically detect video response spam. We
formulate the problem of video response spam detection as a one-class classification problem (a
recognition task) and divide the problem into three sub-problems: promotional video recognition,
pornographic or dirty video recognition and automated script or botnet uploader recognition.
We create a sample dataset of target class videos for each of the three sub-problems and identify
contextual features (meta-data based or non-content based features) characterizing the target
class. Our empirical analysis reveals that certain linguistic features (presence of certain terms
in the title or description), temporal features, popularity-based features, and time-based features
can be used to predict the video type. We identify features with discriminatory power and use
them within a one-class classification framework to recognize video response spam. We conduct a
series of experiments to validate the proposed approach and present evidence demonstrating
the effectiveness of the proposed solution, which achieves more than 80% accuracy.
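
As an illustration of the one-class formulation described above, the following minimal Python sketch trains a recognizer on target-class (spam) examples only and flags a new video response as spam when its metadata features fall close to the training examples. It is not the method evaluated in this work: the term list, the metadata field names (title, description, views, seconds_after_original), the centroid-distance model, and the sample data are all assumptions made for illustration.

```python
import math

# Hypothetical promotional vocabulary (an assumption, not taken from the paper).
PROMO_TERMS = ("free", "click", "visit", "subscribe", "win")


def extract_features(video):
    """Map a metadata dict to a small numeric vector of contextual features."""
    text = (video["title"] + " " + video["description"]).lower()
    # Simple substring counts; a real system would tokenize the text first.
    term_hits = sum(text.count(term) for term in PROMO_TERMS)
    return [
        float(term_hits),                             # linguistic feature
        math.log1p(video["views"]),                   # popularity-based feature
        math.log1p(video["seconds_after_original"]),  # time-based feature
    ]


class CentroidOneClass:
    """One-class recognizer: accept points close to the target-class centroid."""

    def fit(self, vectors, quantile=0.9):
        n = len(vectors)
        dims = range(len(vectors[0]))
        self.mean = [sum(v[d] for v in vectors) / n for d in dims]
        # Per-feature standard deviation; fall back to 1.0 for constant features.
        self.scale = [
            math.sqrt(sum((v[d] - self.mean[d]) ** 2 for v in vectors) / n) or 1.0
            for d in dims
        ]
        # The threshold is a quantile of the training distances.
        distances = sorted(self._distance(v) for v in vectors)
        self.threshold = distances[min(n - 1, int(quantile * n))]
        return self

    def _distance(self, v):
        return math.sqrt(
            sum(((x - m) / s) ** 2 for x, m, s in zip(v, self.mean, self.scale))
        )

    def is_target(self, v):
        return self._distance(v) <= self.threshold


# Made-up target-class samples (promotional video responses).
training_videos = [
    {"title": "Free gift click here", "description": "visit my site",
     "views": 12, "seconds_after_original": 30},
    {"title": "Win free phone", "description": "click and subscribe",
     "views": 8, "seconds_after_original": 45},
    {"title": "Visit now free", "description": "click click",
     "views": 20, "seconds_after_original": 20},
    {"title": "Subscribe to win", "description": "free free",
     "views": 15, "seconds_after_original": 60},
]
model = CentroidOneClass().fit([extract_features(v) for v in training_videos])

# A made-up legitimate response: no promotional terms, many views, posted late.
legitimate = {"title": "My cat video response", "description": "a reply about cats",
              "views": 50000, "seconds_after_original": 864000}
looks_like_spam = model.is_target(extract_features(legitimate))
```

Because only target-class examples are needed for training, a sketch of this kind can be fitted separately for each of the three sub-problems (promotional, pornographic, and automated-uploader recognition), each with its own feature set.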