IIIT-Delhi Institutional Repository

Open source social media as sensors for enabling government identification, prediction and response applications

dc.contributor.author Agarwal, Swati
dc.contributor.author Sureka, Ashish (Advisor)
dc.contributor.author Goyal, Vikram (Advisor)
dc.date.accessioned 2017-06-27T03:52:34Z
dc.date.available 2017-06-27T03:52:34Z
dc.date.issued 2017-06
dc.identifier.uri http://hdl.handle.net/123456789/507
dc.description.abstract Online social media platforms such as Tumblr, Twitter (a micro-blogging website) and YouTube (a video-sharing website) contain information that is publicly available, or open-source. Open source social media intelligence (OSSMInt) is a field comprising techniques and applications that analyze and mine open-source social media data to extract actionable information and useful insights. The focus of the work presented in this dissertation is on novel applications and techniques of OSSMInt in the government sector. We propose and develop several novel usage scenarios and applications around OSSMInt for government and broadly divide them into three categories: identification, prediction, and response applications. In particular, we present solutions, tools and techniques for analyzing data from a micro-blogging website to examine citizen complaints and grievances in the public sector [response]. The dissertation also describes our work on analyzing data from the Twitter micro-blogging website to forecast civil unrest and protest at an early stage [prediction]. Furthermore, we build various applications around identification and detection that are useful for government and security analysts. We demonstrate the application of OSSMInt for identifying religious conflicts within society by mining public opinions on the Tumblr website, filling the gaps left by offline surveys. The study proposes solutions for enabling law enforcement agencies to detect, prevent and combat online radicalization and extremism (content, users, and communities) by mining data from Tumblr, Twitter and YouTube [identification]. We also propose deep natural language processing techniques for automatically identifying racist and radicalized posts based on the intent of the author.
Furthermore, we propose and build an application for detecting secret messages exchanged in adversarial communication and capturing the obfuscated terms in those messages. It is technically challenging to analyze social media content because the free-form nature of user-generated data raises several issues, such as incorrect grammar, spelling mistakes, multilingual scripts, term obfuscation, and the use of abbreviations and short forms. In this dissertation, we present several techniques for data processing, text classification, word obfuscation detection, and information extraction to overcome the noisy data problem. We also propose computational linguistics-based methods to address the challenges of ambiguity in textual content. The central component of our proposed solution approach is the application of information retrieval and machine learning techniques and algorithms. Our study experiments with a diverse range of machine learning algorithms, covering unsupervised, semi-supervised and supervised learning (k-NN, SVM, Naive Bayes, Random Forest and Decision Tree). We also employ several ensemble learning techniques to improve the accuracy and performance of the baseline statistical models. We make the processed dataset used in our experiments publicly available so that other researchers can replicate our experiments and benchmark against our proposed techniques. Data visualization is one of the major components of data analysis and interpretation. The study employs several basic and advanced data visualization techniques to present information in an intuitive manner to the end user. en_US
dc.language.iso en_US en_US
dc.publisher IIIT-Delhi en_US
dc.subject Social media analytics en_US
dc.subject Open source social media intelligence en_US
dc.subject Text analytics modeling en_US
dc.subject User-generated data en_US
dc.subject Complaints and grievances en_US
dc.subject Hate and extremism promotion en_US
dc.subject Religious beliefs and conflicts en_US
dc.subject Civil unrest and protest en_US
dc.subject Secret message communication en_US
dc.title Open source social media as sensors for enabling government identification, prediction and response applications en_US
dc.type Thesis en_US
