Abstract:
Online Social Networks have become a cornerstone of the Web 2.0 era. Internet users worldwide rely on Online Social Networks as primary sources of news, updates, and information about events around the world. However, given the enormous volume and velocity of content generated and shared on these networks, it is infeasible to moderate all of it manually. This enables hostile entities to generate and promote various types of poor quality content (including, but not limited to, scams, fake news, false information, rumors, and untrustworthy or unreliable information) and pollute the information stream for monetary gain, to degrade user experience, or to compromise system reputation. We aim to address the challenge of automatically identifying poor quality content on Online Social Networks. We focus our work on Facebook, currently the biggest Online Social Network.
We provide an in-depth analysis of poor quality, context-specific content published on Facebook, concentrating on content generated in the context of news-making events. We propose and evaluate automated techniques to identify such poor quality content and mitigate its spread on Facebook in real time.
The main contributions of this work are: (a) we characterized and analyzed poor quality, context-specific content generated and disseminated on Facebook during news-making events, with the aim of identifying characteristics that differentiate it from benign content; (b) we showed the effectiveness of our automated techniques for identifying poor quality content on Facebook using content-level features combined with metadata and temporal activity; and (c) we developed and deployed a real-world solution for identifying poor quality, context-specific content published on Facebook, and evaluated this real-time system through a live deployment used by real Facebook users.
First, we analyzed Facebook data for 19 global news-making events from 2013 to 2015 for the spread of untrustworthy content, scams, self-promotional posts, fake information, adult content, etc. Prominent events we analyzed include the Paris Attacks (2015), the FIFA World Cup (2014), the Boston Marathon Blasts (2013), the death of Nelson Mandela (2013), and the birth of the first Royal Baby (2013). We identified over 11,000 Facebook posts promoting untrustworthy information, child-unsafe content, scams, hate speech, and spam. We performed an in-depth analysis of how this poor quality content differs from benign content and identified characteristics that differentiate entities posting poor quality content from those posting benign content. Second, we showed how features derived from user-generated content, combined with metadata and temporal behavior, can be used to effectively identify poor quality content during events. Third, we proposed and evaluated automated techniques to identify poor quality content and entities using supervised learning. We developed and deployed Facebook Inspector (FbI), a real-time system that identifies poor quality content on Facebook during events. Facebook Inspector is available as a browser plug-in and has been downloaded more than 5,000 times; the system has a daily audience of over 250 Facebook users. During 20 months of deployment, Facebook Inspector received over 7.4 million requests and evaluated over 2.8 million public Facebook posts, allowing us to evaluate its performance and usability.
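To make the classification approach concrete, the following is a minimal sketch of how content-level, metadata, and temporal features could be combined in a supervised learner. The toy posts, the specific features, and the choice of a random-forest classifier are illustrative assumptions, not the exact pipeline behind Facebook Inspector.

```python
# Illustrative sketch only: the toy data, feature choices, and classifier
# are assumptions for exposition, not the pipeline used in this work.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Hypothetical event-related Facebook posts.
posts = [
    "BREAKING!! Click here for shocking footage http://bit.ly/xyz",
    "Our thoughts are with the victims of the Paris attacks.",
    "WIN a FREE iPhone! Share this post and comment NOW!!!",
    "Live updates from the FIFA World Cup final.",
]
labels = [1, 0, 1, 0]  # 1 = poor quality, 0 = benign (hypothetical labels)

# Hypothetical metadata/temporal features per post, e.g.
# [URL count, exclamation-mark count, posts by the same entity in the last hour].
meta = np.array([
    [1, 2, 40],
    [0, 0, 1],
    [0, 4, 25],
    [0, 0, 2],
], dtype=float)

# Content-level features: TF-IDF over the post text.
text_features = TfidfVectorizer(lowercase=True).fit_transform(posts).toarray()

# Combine content-level features with metadata/temporal features.
X = np.hstack([text_features, meta])

X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.5, stratify=labels, random_state=42)

# Train a supervised classifier on the combined feature matrix.
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test), zero_division=0))
```

In a real-time setting such as Facebook Inspector, a model of this kind would be trained offline on labeled posts and then applied to each incoming public post as it is fetched.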