<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns="http://purl.org/rss/1.0/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel rdf:about="http://repository.iiitd.edu.in/xmlui/handle/123456789/1051">
<title>Year-2023</title>
<link>http://repository.iiitd.edu.in/xmlui/handle/123456789/1051</link>
<description/>
<items>
<rdf:Seq>
<rdf:li rdf:resource="http://repository.iiitd.edu.in/xmlui/handle/123456789/1653"/>
<rdf:li rdf:resource="http://repository.iiitd.edu.in/xmlui/handle/123456789/1377"/>
<rdf:li rdf:resource="http://repository.iiitd.edu.in/xmlui/handle/123456789/1314"/>
<rdf:li rdf:resource="http://repository.iiitd.edu.in/xmlui/handle/123456789/1313"/>
</rdf:Seq>
</items>
<dc:date>2026-04-11T12:50:19Z</dc:date>
</channel>
<item rdf:about="http://repository.iiitd.edu.in/xmlui/handle/123456789/1653">
<title>Identify, inspect and intervene multimodal fake news</title>
<link>http://repository.iiitd.edu.in/xmlui/handle/123456789/1653</link>
<description>Identify, inspect and intervene multimodal fake news
Singhal, Shivangi; Shah, Rajiv Ratn (Advisor); Kumaraguru, Ponnurangam (Advisor)
Fake news refers to intentionally and verifiably false stories created to manipulate people’s perceptions of reality. Fake news is destructive and has been used to influence voting decisions and spread hatred against religions, organizations or individuals, resulting in violence or even death. It has also become a method to stir up and intensify social conflict. Fake news has existed for a long time, but what changed was the rise of Web 2.0 technologies and social media, which broadened communication horizons. Social media emerged as a multidisciplinary tool for exchanging information and ideas. However, there are always two sides to a coin, and social media is no exception. On the positive side, social media aids users in generating content that serves as a backbone for mass interaction. The negative impact, however, is significantly more profound. First, the availability of the Internet and smartphones at nominal prices, in tandem with the low entry barriers of such platforms, has given fake news a vast audience and allowed it to spread rapidly and widely. Second, social media platforms lack centralized gatekeeping to regulate the volume of generated content. As a result, online users fall prey to misleading stories. Individuals tend to accept information supporting their ideologies, which prevents them from making rational decisions. Third, one can gain monetary benefits from such platforms by engaging the audience. Users are always drawn to sensational and controversial content, so manipulators tend to generate fake news that receives a lot of attention and engagement and is more likely to spread on such platforms. Therefore, it is essential to understand the nature of fake news spreading online, devise new technologies to combat it, analyze current detection methods and improve intuitive understanding among online readers. Hence, this PhD thesis addresses three fundamental challenges.
First, we focus on devising different methods to Identify, i.e., detect, fake news online by extracting different feature sets from the given information. By designing foundational detection mechanisms, our work accelerates research innovation. Second, our research closely Inspects fake stories from two perspectives. From the information point of view, one can inspect fabricated content to identify the patterns of false stories disseminating over the web, the modality used to create the fabricated content and the platform used for dissemination. To study these changing dynamics of fake news, we select India as the region and build an extensive dataset to aid researchers in investigating such issues. From the model point of view, we inspect the detection mechanisms used in prior work and their generalizability to other datasets. The thesis also suggests Intervention techniques to help internet users broaden their comprehension of fake news, and we discuss potential practical implications for social media platform owners and policymakers. To answer the first part of the thesis, we design different multimodal fake news detection baselines. Typically, a news article consists of a headline, content, a top image and other corresponding images. We begin by designing SpotFake, a multimodal framework for fake news detection. Our proposed solution identifies fake news without relying on any additional subtasks. It exploits both the textual and visual features of an article: contextual representations of the text are learned with language models such as BERT, and image features are learned with VGG-19 pre-trained on the ImageNet dataset. Our proposed method outperforms the baselines by a margin of 6% accuracy on average. Next, we present SpotFake+: A Multimodal Framework for Fake News Detection via Transfer Learning.
It is a multimodal approach that leverages transfer learning to capture semantic and contextual information from news articles and their associated images, achieving better performance for fake news detection. SpotFake+ is one of the first attempts to apply a multimodal approach to fake news detection on a dataset of full-length articles. We then observed that most research on fake news has focused on leveraging information from both modalities while ignoring the multiple additional visual signals present in a news sample. To address this, we created Inter-modality Discordance for Multimodal Fake News Detection. The proposed method leverages information from multiple images in tandem with the text modality to perform multimodal fake news detection. The number of images varies per sample, and our designed method incorporates such variation efficiently. We adopt a multimodal discordance rationale for multimodal fake news detection, and our proposed model effectively captures the intra- and inter-modality relationships between the different modalities. Lastly, we observed that existing research captures high-level information from different modalities and jointly models them to make a decision. Given multiple input modalities, we hypothesize that not all modalities may be equally responsible for decision-making. Hence, we present Leveraging Intra and Inter Modality Relationship for Multimodal Fake News Detection. Here, we design a novel architecture that effectively identifies and suppresses information from weaker modalities and extracts relevant information from the strong modality on a per-sample basis. We also capture the intra-modality relationship by first generating fragments of a modality and then learning fine-grained salient representations from those fragments. In the first part of the thesis, we thus make numerous attempts to design methods that can effectively identify fake news.
However, in the process, we observed that the results reported by state-of-the-art methods indicate almost perfect performance, yet such methods fail to cope with the changing dynamics of fake news. The reasons could be twofold: the issue may reside in the information itself, or the designed method may be incapable of extracting the informative signals. Hence, in the second part of the thesis, we inspect fake news from two perspectives. From an information viewpoint, we study the changing dynamics of fake news over time. We selected India as the region from which to draw conclusions, as little effort had been made to study the menace of fake news in India. To this end, we built an extensive dataset, FactDrill: A Data Repository of Fact-Checked Social Media Content to Study Fake News Incidents in India. Using the dataset, one can investigate the changing dynamics of fake news in a multilingual setting in India. The resource aids in examining fake news at its core, i.e., investigating the different kinds of stories being disseminated, the modalities or combinations used to create the fabricated content and the platforms used for dissemination. From a model viewpoint, we examine the apparent discrepancy between current research and real applications. We hypothesize that the performance claims of the current state of the art have become significantly overestimated. The overestimation might be due to systematic biases in the datasets, with models leveraging such preferences rather than taking actual cues for detection. We conduct experiments to investigate the prior literature from the input-data perspective, studying statistical bias in the datasets. Our findings state that although the reported performances are impressive, leveraging multiple modalities to detect fake news is far from solved. The final section of the thesis focuses on developing intervention strategies that enable readers to identify fake news.
We design SachBoloPls, a system that validates news on Twitter in real time. It is an effort to curb the proliferation of debunked fake news online, make audiences aware of fact-checking organizations, and educate them about false viral claims. The three components of SachBoloPls are independent and can be extended to other social media and instant messaging platforms such as Instagram, WhatsApp, Facebook and Telegram. The proposed prototype can also incorporate regional languages, making it a viable tool in the fight against fake news across India. Designing effective interventions can encourage social media users to exercise caution while reading or disseminating news online. Lastly, we discuss potential practical implications for social media platform owners and policymakers.
</description>
<dc:date>2023-06-01T00:00:00Z</dc:date>
</item>
<item rdf:about="http://repository.iiitd.edu.in/xmlui/handle/123456789/1377">
<title>Leveraging machine learning in identification of biomarkers for cancer diagnosis and personalized therapy recommendation</title>
<link>http://repository.iiitd.edu.in/xmlui/handle/123456789/1377</link>
<description>Leveraging machine learning in identification of biomarkers for cancer diagnosis and personalized therapy recommendation
Goswami, Chitrita; Sengupta, Debarka (Advisor)
In an era where machine learning (ML) is changing the landscape of financial markets, education, security and privacy, the retail sector, and many other crucial aspects of human life, it is only fitting that we use its potential for personalized medicine. Combining precision medicine with statistical analysis and machine learning techniques may pave the future of disease treatment. Personalized, or precision, medicine consists of using knowledge specific to a patient, such as biomarkers, genomic information, demographics or lifestyle characteristics, to best treat their ailment, rather than applying generic best practices. Given a particular scenario, ML can help predict the best treatment plan for the patient and can supply clinicians with high-confidence hypotheses to support the complex decision-making process on an individual basis. Such systems of assistance are called clinical decision support systems (CDSS). Because cancer is so heterogeneous in nature, it is essential that each patient’s treatment be individually tailored and targeted rather than following a standard protocol. Key aspects of clinical decision-making include improving treatment efficiency, reducing adverse effects, lowering costs for patients and care providers, and diagnosing the disease early. To study, design, analyze and interpret such multidisciplinary aspects of clinical and translational cancer research, we drew on both statistical and machine learning-based methods. Below is an account of our key contributions, which successfully incorporate machine learning, genomics and patient-focused healthcare. We start the journey at the cellular level, where we propose a method that helps reveal the factors contributing to cellular heterogeneity in single-cell datasets. By identifying influential genes that contribute to cellular heterogeneity, our proposed method, InGene, lays the groundwork for personalized medicine.
Single-cell RNA sequencing (scRNA-seq) provides a powerful means of characterizing transcriptional heterogeneity within cells of seemingly identical phenotypes. Due to factors such as high variability, high dimensionality and sparsity in scRNA-seq data, traditional feature selection methods fall short in this task. Recently, non-linear dimensionality reduction techniques have made forays into scRNA-seq analysis, as they help assess local and global cellular arrangement. However, such techniques are primarily used for visualization only, since they shed no light on the identities of the individual genes that influence the non-linear transformation. We developed InGene, a first-of-its-kind non-linear unsupervised method, to overcome this limitation. Our method can also serve as an alternative to state-of-the-art methods for finding differential genes, which can in turn be used as a targeted sequencing panel, thus aiding clinical decision-making. InGene can be used to obtain reliable targeted panels for scRNA-seq, reducing the cost manifold; such a cost-effective scRNA-seq solution can be a headway in personalized therapy recommendation and make the clinical decision-making process more effective. Next, we expand the scope from cellular insights to a broader patient-centric approach. In the realm of oncology, there is a critical need for diagnostic methodologies that are both efficacious and patient-friendly. Here, we contribute to improving cancer diagnosis by proposing an affordable, non-invasive, liquid biopsy-based diagnostic method. Although tissue biopsy is widely used to diagnose cancer, it has drawbacks, particularly when repeated sampling is necessary. Due to their ability to precisely identify the existence and subtype of tumours, tumour-educated platelets (TEPs) have recently attracted interest.
The majority of research involving TEPs has utilized marker panels that include hundreds of genes, which can be expensive and impede the adoption of the diagnostic method. To address this issue, we investigated publicly available TEP expression profiles and discovered a signature of 11 platelet genes that can effectively differentiate between malignant and normal samples. Next, our journey moves from disease detection to disease management. We propose to enhance patient outcomes for Multiple Myeloma (MM) patients and help clinicians optimize a patient’s treatment plan. Patient stratification and prediction of disease recurrence are further important aspects of personalized therapy. To determine the probability of recurrence in MM patients receiving Autologous Stem Cell Transplantation (ASCT), we developed a stratification model to enhance prognosis estimation and treatment efficacy. For many practical reasons, it is crucial to identify whether a patient undergoing ASCT is at high risk of recurrence (likely to relapse within 36 months). Our model, a 3-factor multivariate 2-stage staging system, is highly decisive in predicting the outcome of stem cell rescue, and detecting cancer promptly is essential to managing cancer patients effectively. In conclusion, this thesis harmonizes molecular insights, diagnostic innovations and clinical management in oncology.
</description>
<dc:date>2023-02-01T00:00:00Z</dc:date>
</item>
<item rdf:about="http://repository.iiitd.edu.in/xmlui/handle/123456789/1314">
<title>Algorithms for spatial colocation pattern mining</title>
<link>http://repository.iiitd.edu.in/xmlui/handle/123456789/1314</link>
<description>Algorithms for spatial colocation pattern mining
Baride, Srikanth; Goyal, Vikram (Advisor)
Spatial data mining is a specialized field that focuses on extracting meaningful insights and patterns from geographical or spatial data. One particular area of interest in spatial data mining is colocation pattern mining. Colocation patterns refer to objects or entities that tend to occur frequently in close spatial proximity to each other. These patterns can provide valuable insights into spatial relationships and dependencies. Traditional colocation mining algorithms typically operate on static data and require a predefined single-distance threshold to determine spatial proximity. However, deciding on a suitable threshold can be challenging and may not capture the full range of interesting patterns. Moreover, processing the graph representation of spatial data and handling dynamic or evolving datasets present additional challenges in colocation pattern mining. To address these challenges, our work introduces several novel approaches.
Firstly, we propose a new colocation query called Range colocation mining. This query enables the computation of colocation patterns over a range of distances, rather than relying on a single threshold value. This provides greater flexibility to analysts when the determination of a specific distance threshold is difficult or uncertain. Unlike classical algorithms that compute patterns separately for each distance threshold, our method efficiently computes patterns in a single scan over the spatial data, ensuring scalability.
In addition, we extend the traditional notion of colocation patterns beyond cliques to any subgraph representation. This notion allows for a broader exploration of patterns and considers the edges’ labels and the degree of affinity between objects. We analyze the complexity of mining subgraph colocation patterns and propose a novel query for high-utility subgraph (colocation) pattern mining. The problem turns out to be more complex than classical colocation pattern mining. Leveraging the power of Apache Spark, our solution employs a set of heuristics to traverse the pattern space efficiently, utilizing an anti-monotonic relationship over utility values. Our proposed approach is scalable and aids in discovering interesting subgraph patterns prevalent across a set of disjoint regions.
</description>
<dc:date>2023-12-01T00:00:00Z</dc:date>
</item>
<item rdf:about="http://repository.iiitd.edu.in/xmlui/handle/123456789/1313">
<title>Deep clustering</title>
<link>http://repository.iiitd.edu.in/xmlui/handle/123456789/1313</link>
<description>Deep clustering
Goel, Anurag; Majumdar, Angshul (Advisor)
The traditional way of clustering is to first extract feature vectors according to domain-specific knowledge and then employ a clustering algorithm on the extracted features. Deep learning approaches attempt to combine feature learning and clustering into a unified framework that can directly cluster original images with even higher performance. Deep clustering approaches therefore rely on deep neural networks to learn high-level representations for clustering. Auto-encoders are a special instance of deep neural networks that can learn representations in a fully unsupervised way. The majority of prior works on deep clustering are based on the auto-encoder framework, where the clustering loss is embedded into the deepest layer of an auto-encoder. The problem with auto-encoders is that they require training both an encoder and a decoder network. The clustering loss is incorporated after the encoder network; the decoder network is not relevant for clustering. The need to learn both an encoder and a decoder network leads to learning twice the number of parameters of a standard neural network. This may lead to overfitting, especially when the number of data instances is limited. Moreover, the current state-of-the-art deep clustering approaches are not able to capture discriminative information in the learned representations due to the lack of supervision [1]. To alleviate these problems, we propose deep clustering approaches based on Dictionary Learning, Transform Learning, and Convolutional Transform Learning (CTL) frameworks, into which we embed two popular clustering algorithms: K-means clustering and Sparse Subspace clustering. The limitation of unsupervised learning in existing deep clustering approaches is mitigated by incorporating contrastive learning into the CTL framework.
The proposed deep clustering approaches are evaluated on datasets from multiple domains, including computer vision, hyperspectral imaging, text and multi-view datasets. The results demonstrate the superiority of the proposed approaches over the current state-of-the-art deep clustering approaches.
</description>
<dc:date>2023-11-01T00:00:00Z</dc:date>
</item>
</rdf:RDF>
