Abstract:
Online social media is a powerful platform for the dissemination of information during important
real-world events. Beyond the challenges of volume, variety, and velocity of content generated on
online social media, veracity poses a much greater challenge for effective utilization of this
content by citizens, organizations, and authorities. Veracity of information refers to the
trustworthiness, credibility, accuracy, and completeness of the content. Over the last few years,
social media has also been used to disseminate misinformation in the form of rumors, hoaxes, fake
images, and videos.
We aim to address this challenge of veracity or trustworthiness of content posted on social media.
The spread of such untrustworthy content online has caused loss of money and infrastructure, and
threats to human lives, in the offline world. We focus our work on Twitter, which is one of the
most popular microblogging web services today.
We provide an in-depth analysis of misinformation spread on Twitter during real-world events. We
propose and evaluate automated techniques to mitigate misinformation spread in real-time.
The main contributions of this work are: (i) we analyzed how true versus false content is propagated
through the Twitter network, with the purpose of assessing the reliability of Twitter as an
information source during real-world events; (ii) we showed the effectiveness of automated
techniques to detect misinformation on Twitter using a combination of content, meta-data, network,
user profile, and temporal features; (iii) we developed and deployed a novel framework for
indicating the trustworthiness / credibility of tweets posted during events. We evaluated the
effectiveness of this real-time system with a live deployment used by real Twitter users.
First, we analyzed Twitter data for 25+ global events from 2011 to 2014 for the spread of fake
images, rumors, and untrustworthy content. Some of the prominent events we analyzed are: Mumbai
blasts (2011), England Riots (2011), Hurricane Sandy (2012), Boston Marathon Blasts (2013), and
Polar Vortex (2014). For these events, we identified tens of thousands of tweets containing fake
images, rumors, and fake websites, as well as tweets posted by malicious user profiles. We
performed an in-depth characterization study of how false versus true content is introduced and
disseminated in the Twitter network.
Second, we showed how meta-data, network, event, and temporal features of user-generated content
can be used effectively to detect misinformation and predict its propagation during real-world
events. Third, we proposed and evaluated an automated methodology for assessing the credibility
of information in tweets using supervised machine learning and a relevance feedback approach. We
developed and deployed a real-time version in TweetCred, a system that assigns a credibility score
to tweets. TweetCred, available as a browser plug-in, has been installed and used by 1,808 real
Twitter users. During ten months of its deployment, the credibility score for about 12 million tweets
was computed, allowing us to evaluate TweetCred in terms of accuracy, performance, effectiveness
and usability.
The TweetCred system built as part of this thesis work is used effectively by emergency responders,
firefighters, journalists, and general users to obtain credible content from Twitter. This thesis
work has shown that measuring the credibility of Twitter content is possible using semi-automated
techniques, and that the results can be valuable to real-world users. The insights obtained from
this research and deployment provide a basis for building more sophisticated technology to tackle
similar problems on different social media.