Content moderation across multiple platforms with capsule networks and co-training

Agarwal, Vani; Buduru, Arun Balaji (Advisor); Kumaraguru, Ponnurangam (Advisor)

dc.contributor.author	Agarwal, Vani
dc.contributor.author	Buduru, Arun Balaji (Advisor)
dc.contributor.author	Kumaraguru, Ponnurangam (Advisor)
dc.date.accessioned	2020-05-31T15:14:16Z
dc.date.available	2020-05-31T15:14:16Z
dc.date.issued	2019-05
dc.identifier.uri	http://repository.iiitd.edu.in/xmlui/handle/123456789/809
dc.description.abstract	Social media systems provide a platform for users to freely express their thoughts and opinions. Although this openness creates unique communication opportunities, it also brings important challenges. Content that constitutes hate speech, abuse, or harmful intent often proliferates on online platforms. Since problematic content reduces the health of a platform and negatively affects user experience, communities have terms of use or community norms in place; when a user violates them, the platform takes moderation action against that user. Unfortunately, the scale at which these platforms operate makes manual content moderation nearly impossible, creating the need for automated or semi-automated content moderation systems. Multiple methods exist for understanding the prevalence and impact of such content, including supervised machine learning and deep learning models. Despite the vast interest in the theme and the wide popularity of some methods, it is unclear which model is most suitable for a given platform, since there have been few benchmarking efforts for moderated content. To that end, we compare existing approaches used for automatic moderation of multimodal content on five online platforms: Twitter, Reddit, Wikipedia, Quora, and Whisper. In addition to investigating existing approaches, we propose a novel Capsule Network based method that performs better due to its ability to understand hierarchical patterns. In practical scenarios, labeling large-scale data to train new models for a different domain or platform is a cumbersome task. Therefore, we enrich our existing pre-trained model with a minimal number of labeled examples from a different domain to create a co-trained model for the new domain. We perform a cross-platform analysis using different models to identify which model performs better.
Finally, we analyze all methods, both qualitatively and quantitatively, to gain a deeper understanding of model performance, concluding that our method shows an increase of 10% in average precision. We also find that the co-trained models perform well despite having less training data and may be considered a cost-effective solution.	en_US
dc.language.iso	en_US	en_US
dc.publisher	IIIT-Delhi	en_US
dc.subject	Content moderation	en_US
dc.subject	Capsule network	en_US
dc.subject	Co-training	en_US
dc.title	Content moderation across multiple platforms with capsule networks and co-training	en_US
dc.type	Thesis	en_US