Abstract:
This year (2014) in the month of May, the tenure of the 15th Lok Sabha was to end and the
elections to the 543 parliamentary seats were to be held. With 813 million registered voters, out
of which 100 million were first-time voters, India is the world's largest democracy. A whopping
$5 billion was spent on these elections, second only to the US Presidential
elections ($7 billion) in terms of money spent. The different phases of the elections were held on
9 days spread across April and May, making it the most elaborate exercise to
choose the Prime Minister of India. The swelling number of Internet and Online Social Media
(OSM) users turned these unconventional media platforms into a key medium in these elections,
one that could affect 3-4% of urban votes according to a report by IAMAI (Internet & Mobile
Association of India). Political parties using Google+ Hangouts to interact with people
and party workers, posting campaign photos on Instagram and videos on YouTube, and debating
on Twitter and Facebook were strong indicators of the impact of OSM on the General
Elections 2014. Almost every political leader and party had an account on the microblogging
site Twitter, and the surge in political conversations there inspired us to
study and analyze this large body of election data. Our count of tweets
related to elections from September 2013 to May 2014, collected with the help of Twitter's
Streaming API was close to 18.21 million.
We analyzed the complete dataset to find interesting patterns and to verify whether the obvious
patterns also showed up in the data collected. We found that activity on Twitter peaked
during important events related to elections. It was evident from our data that the political
behavior of politicians affected their follower counts and thus their popularity on Twitter.
Another aim of our work was to find an efficient way to classify the political orientation of
users on Twitter. To accomplish this task, we used four different techniques: two were based
on the content of the tweets made by the user, one on user-based features, and one
on a community detection algorithm applied to the retweet and user-mention networks. We found
that the community detection algorithm worked best, with an efficiency of more than 80%. The
content-based methods, by contrast, did not fare well in the classification results. With
the aim of monitoring the daily incoming data, we built a portal to show an analysis of the tweets
from the last 24 hours.1 This portal analyzed the tweets to find the top trending topics and hashtags,
the sentiment received by the parties, and the location of the tweets, and also monitored the
popularity of various political leaders and their parties' accounts on Twitter. To the best of our
knowledge, this is the first academic effort to analyze the election data and classify users
in the Indian General Elections 2014.