Abstract:
Curbing hate speech is a major challenge for online microblogging platforms such as Twitter. While there have been studies on hate speech detection, it is not clear how hate speech finds its way into an online discussion. In this thesis, we define a novel problem: given a source tweet and a few of its initial replies, the task is to forecast the hate intensity of upcoming replies. To this end, we curate a novel dataset comprising the entire reply chains of ∼4.5k root tweets, catalogued into four controversial topics. Our preliminary analysis confirms that the temporal evolution of hate intensity across reply chains is highly diverse, and that there is no significant correlation between the hate intensity of a source tweet and that of its reply chain. We also observe a handful of cases where, despite the root tweet being non-hateful, the succeeding replies inject an enormous amount of toxicity into the discussion. We employ several state-of-the-art dynamic models and show that they fail to forecast hate intensity. We then propose DESSERT, a novel deep state-space model trained in real time. It combines the function-approximation capability of deep neural networks with the uncertainty-quantification capacity of statistical signal-processing models. DESSERT outperforms all baselines across four evaluation metrics (both correlation-based and error-based), achieving a Pearson's r of 0.67 and a Mean Absolute Percentage Error (MAPE) of 31.08, significantly better than the best baseline (r = 0.557, MAPE = 43.47). In addition, we address the problem through DRAGNET, a deep stratified learning framework. It groups the hate-intensity profiles of reply chains into clusters that serve as prior knowledge, which is then used to predict the hate intensity of upcoming replies for a new reply chain. DRAGNET proves highly effective, significantly outperforming six baselines.
It beats the best baseline with a 9.4% increase in the Pearson correlation coefficient and a 19% decrease in Root Mean Square Error. Furthermore, deploying both models in an advanced AI platform designed to monitor real-world problematic hateful content has improved the aggregated insights extracted for countering the spread of online harms.