Year-2019

http://repository.iiitd.edu.in/xmlui/handle/123456789/801
Fri, 24 Jul 2026 07:29:48 GMT

Real-time congestion detection using public transport data
http://repository.iiitd.edu.in/xmlui/handle/123456789/863
Pandey, Yogesh; Goyal, Vikram (Advisor)
Congestion is one of the biggest problems affecting quality of life, with an impact on social and economic conditions. Tackling traffic congestion has always been a challenge. The situation is worse in developing countries such as India, owing to their large populations and overutilization of basic resources. Existing policy solutions, such as the construction of new flyovers and underpasses and the widening of roads, have failed miserably. Prevalent technical solutions for congestion detection, such as the deployment of cameras, RFID, and other sensors, are popular mainly in developed countries. Developing countries have not readily adopted such strategies because they require economic investment and maintenance. In addition, web mapping service providers such as Google, Bing, and HERE use crowdsourced information from mobile devices through their applications. Such private data is not available to government agencies, which could otherwise use it to improve their transport systems and solve congestion problems. We therefore study the effectiveness of congestion detection when only public transport data is available. We investigate the utility of real-time bus spatio-temporal data, which is sparse and has missing values, for the task of congestion detection. If such a system works with good accuracy, government authorities would not need to depend on private players. We provide a real-time congestion detection mechanism that exploits GPS sensors installed on Delhi’s DIMTS cluster buses to provide fast and reliable congestion status.
We compare multiple strategies and observe, at best, a 70% F1 score and 80% recall with a simple statistical method. We also analyze this data for hotspot detection and for identifying popular bus stops.
Sun, 01 Dec 2019 00:00:00 GMT

Content moderation across multiple platforms with capsule networks and co-training
http://repository.iiitd.edu.in/xmlui/handle/123456789/809
Agarwal, Vani; Buduru, Arun Balaji (Advisor); Kumaraguru, Ponnurangam (Advisor)
Social media systems provide a platform for users to freely express their thoughts and opinions. Although this property offers incredible and unique communication opportunities, it also brings important challenges. Content that constitutes hate speech, abuse, or harmful intent often proliferates on online platforms. Since problematic content reduces the health of a platform and negatively affects user experience, communities have terms of usage or community norms in place which, when violated by a user, lead to moderation action on that user by the platform. Unfortunately, the scale at which these platforms operate makes manual content moderation nearly impossible, leading to the need for automated or semi-automated content moderation systems. There are multiple methods for understanding the prevalence and impact of such content, including supervised machine learning and deep learning models. Despite the vast interest in the theme and the wide popularity of some methods, it is unclear which model is most suitable for a given platform, since there have been few benchmarking efforts for moderated content. To that end, we compare existing approaches used for automatic moderation of multimodal content on five online platforms: Twitter, Reddit, Wikipedia, Quora, and Whisper.
In addition to investigating existing approaches, we propose a novel Capsule Network based method that performs better due to its ability to understand hierarchical patterns. In practical scenarios, labeling large-scale data to train new models for a different domain or platform is a cumbersome task. We therefore enrich our existing pre-trained model with a minimal number of labeled examples from a different domain to create a co-trained model for the new domain. We perform a cross-platform analysis using different models to identify which model is better. Finally, we analyze all methods, both qualitatively and quantitatively, to gain a deeper understanding of model performance, concluding that our method shows an increase of 10% in average precision. We also find that the co-trained models perform well despite having less training data and may be considered a cost-effective solution.
Wed, 01 May 2019 00:00:00 GMT

Detecting fake profiles on online Matrimony
http://repository.iiitd.edu.in/xmlui/handle/123456789/808
Garg, Vaibhav; Kumaraguru, Ponnurangam (Advisor); Buduru, Arun Balaji (Advisor); Asthana, Siddhartha (Advisor)
In a diverse country like India, socio-economic factors such as religion, caste, language, and income, along with other common physical and professional factors, play a vital role in the search for a spouse. With the surge in Internet connectivity, online matrimonial websites have become hugely popular in catering to such needs. Most users registered on these portals genuinely intend to find their desired life partner; however, due to various factors, the portals also attract a few profiles with no such genuine intention. Such profiles are known as fake profiles. They lead to a bad user experience as well as revenue loss for the online matrimony business.
To dig into this problem, we chose India’s leading matrimony site as a use case and studied the behaviour, edit, and profile differences between fake and genuine accounts. In this thesis, we present a machine learning based approach to identify such fake profiles on online matrimony sites. Due to the lack of labelled examples for non-genuine users, we frame the problem as an anomaly detection problem and use an autoencoder, a widely used algorithm for anomaly detection. We capture a user’s behaviour, profile information, and edit history to predict whether the profile is genuine or not. We then treat the problem as a reconstruction task using an autoencoder trained on the features of a set of genuine profiles. At prediction time, the autoencoder shows a small reconstruction error for genuine profiles and a very high reconstruction error for fake profiles, which allows them to be detected. The proposed system achieves 91.76% accuracy with 90.2% recall for the fake class. To the best of our knowledge, this is the first study on detecting fake profiles in the online matrimony domain.
Wed, 01 May 2019 00:00:00 GMT

Random forest of imputation trees (RITS) for sparse single cell genomics data
http://repository.iiitd.edu.in/xmlui/handle/123456789/807
Sharma, Rachesh; Majumdar, Angshul (Advisor); Kumar, Vibhor (Advisor)
The human body has billions of cells, each specialized for its own function, and each cell carries a genome in its nucleus. The activity of the genome is controlled by a multitude of molecular complexes called the epigenome. Previously, scientists held that human diseases are caused only by changes in the DNA sequence or by infectious agents present in the environment. However, recent studies have revealed that changes in the epigenome are also associated with disease.
Our aim is to create an imputation method for noisy, sparse, and highly unbalanced single cell epigenome data. This problem is challenging because there is no existing imputation method for huge, unbalanced single cell epigenome datasets. Moreover, analysis of such data is of significant importance in the biological domain for preventing and curing many critical diseases. Here we propose an imputation method called RITS for imputing single cell epigenome profiles. We evaluated the proposed method using a variety of techniques and compared its results with those of traditional imputation methods, although those methods were designed for imputing gene expression data. The proposed method outperforms them in every test and proves to be a reliable imputation method even on huge, unbalanced data. We tested our method on a scATAC-seq dataset of cells from organs of the adult mouse to check its robustness and efficiency. In all conditions and tests, RITS remained at the top. The generality of RITS and its robustness on very noisy and sparse datasets suggest that it is a next-generation imputation method for single cell profiles.
Mon, 01 Apr 2019 00:00:00 GMT