Abstract:
There has been a proliferation of Web 2.0 sites on the Internet. Contemporary Web 2.0 sites
like Facebook and Twitter are primarily driven by user generated content (UGC), and the
volume of such user contributions has been increasing rapidly. However,
user generated content may not conform to the guidelines and rules of the websites. Sub-par
content can severely affect user engagement and retention, and can also have an adverse impact on
information retrieval systems. Therefore, there is a pressing need to manage and enhance
content quality on Web 2.0 sites. In this thesis, we investigate three broad objectives – (1) Low
quality content, (2) Content Quality Systems and (3) Information Retrieval Enhancement. To
address the first objective, we look at low quality questions on Stackoverflow, a popular
programming-focused community-based question answering (CQA) website.
We analyze user behavior and content patterns, and build supervised machine learning based
predictive systems to detect low quality questions. For the second objective, we look
at enhancing content quality on Issue Tracking Systems, a popular artifact used by developers
during the software maintenance lifecycle. We survey software practitioners to
understand the needs of the community and discover that developers frequently use the Internet
for their daily tasks. To reduce context switching and cognitive load for software maintenance
professionals during their daily tasks, we develop two systems: (i) CQA integration with Issue
Tracking Systems and (ii) a Web Reference Management Browser Plugin. For the third objective,
we look at quality enhancement on social media to help information retrieval systems.
Concretely, we propose a new algorithm to utilize social interactions to discover homogeneous
topic-based communities on a social network. To address the challenge of scalability, our algorithm
only visits required portions of the network based on an expectation-maximization
approach. Further, we also propose an algorithm for tag recommendation on social media.
Specifically, we utilize Twitter to suggest tags for externally linked media such as Flickr, Youtube
and Soundcloud. In conclusion, we look at different perspectives on quality analysis, systems,
detection and enhancement of content on Web 2.0 sites.