Open source social media as sensors for enabling government identification, prediction and response applications

Agarwal, Swati; Sureka, Ashish (Advisor); Goyal, Vikram (Advisor)


dc.contributor.author	Agarwal, Swati
dc.contributor.author	Sureka, Ashish (Advisor)
dc.contributor.author	Goyal, Vikram (Advisor)
dc.date.accessioned	2017-06-27T03:52:34Z
dc.date.available	2017-06-27T03:52:34Z
dc.date.issued	2017-06
dc.identifier.uri	http://hdl.handle.net/123456789/507
dc.description.abstract	Online social media platforms such as Tumblr, Twitter (a micro-blogging website) and YouTube (a video-sharing website) contain information that is publicly available, or open-source. Open source social media intelligence (OSSMInt) is a field comprising techniques and applications that analyze and mine open-source social media data to extract actionable information and useful insights. The work presented in this dissertation focuses on novel applications and techniques of OSSMInt in the government sector. We propose and develop several novel usage scenarios and applications of OSSMInt for government and broadly divide them into three categories: identification, prediction, and response applications. In particular, we present solutions, tools and techniques for analyzing data from a micro-blogging website to examine citizen complaints and grievances in the public sector [response]. We also describe our work on analyzing data from the Twitter micro-blogging website to forecast civil unrest and protest early [prediction]. Furthermore, we build various identification and detection applications that are useful for government and security analysts. We demonstrate the application of OSSMInt for identifying religious conflicts within society by mining public opinions on Tumblr, filling the gaps left by offline surveys. We propose solutions that enable law enforcement agencies to detect, prevent and combat online radicalization and extremism (content, users, and communities) by mining data from Tumblr, Twitter and YouTube [identification]. We also propose using techniques based on deep natural language processing to automatically identify racist and radicalized posts based on the intent of the author.
Furthermore, we propose and build an application for detecting secret messages exchanged in adversarial communication and for capturing the obfuscated terms in those messages. Analyzing social media content is technically challenging because the free-form nature of user-generated data raises several issues, such as incorrect grammar, spelling mistakes, multilingual scripts, term obfuscation and the use of abbreviations and short forms. In this dissertation, we present several techniques for data processing, text classification, word obfuscation detection and information extraction to overcome the noisy data problem. We also propose computational linguistics-based methods to address the challenges of ambiguity in textual content. The central component of our proposed solution approach is the application of information retrieval and machine learning based techniques and algorithms. Our study experiments with a diverse range of machine learning approaches, including unsupervised, semi-supervised and supervised learning (k-NN, SVM, Naive Bayes, Random Forest and Decision Tree). We also employ several ensemble learning techniques to improve the accuracy and performance of the baseline statistical models. We make the processed dataset used in our experiments publicly available so that other researchers can replicate our experiments and benchmark against our proposed techniques. Data visualization is one of the major components of data analysis and interpretation. The study employs several basic and advanced data visualization techniques to present information in an intuitive manner to the end user.	en_US
dc.language.iso	en_US	en_US
dc.publisher	IIIT-Delhi	en_US
dc.subject	Social media analytics	en_US
dc.subject	Open source social media intelligence	en_US
dc.subject	Text analytics modeling	en_US
dc.subject	User-generated data	en_US
dc.subject	Complaints and grievances	en_US
dc.subject	Hate and extremism promotion	en_US
dc.subject	Religious beliefs and conflicts	en_US
dc.subject	Civil unrest and protest	en_US
dc.subject	Secret message communication	en_US
dc.title	Open source social media as sensors for enabling government identification, prediction and response applications	en_US
dc.type	Thesis	en_US