<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
<title>Year-2020</title>
<link href="http://repository.iiitd.edu.in/xmlui/handle/123456789/826" rel="alternate"/>
<subtitle/>
<id>http://repository.iiitd.edu.in/xmlui/handle/123456789/826</id>
<updated>2026-04-11T00:43:55Z</updated>
<dc:date>2026-04-11T00:43:55Z</dc:date>
<entry>
<title>A quest for cyber high ground : impact of internet structure on (anti-) censorship</title>
<link href="http://repository.iiitd.edu.in/xmlui/handle/123456789/936" rel="alternate"/>
<author>
<name>Gosain, Devashish</name>
</author>
<author>
<name>Chakravarty, Sambuddho (Advisor)</name>
</author>
<id>http://repository.iiitd.edu.in/xmlui/handle/123456789/936</id>
<updated>2021-09-10T04:54:33Z</updated>
<published>2020-07-01T00:00:00Z</published>
<summary type="text">A quest for cyber high ground : impact of internet structure on (anti-) censorship
Gosain, Devashish; Chakravarty, Sambuddho (Advisor)
The original design of the Internet was a resilient, distributed architecture, intended to route around (and therefore recover from) massive disruption, up to and including nuclear war. In practice, however, routing policies and business decisions cause traffic to be routed through a relatively small set of Autonomous Systems (ASes). This has practical implications: some of these frequently appearing (“key”) ASes are hosted in censorious nations. Besides censoring their own citizens’ network access, such ASes may inadvertently filter traffic for foreign customer ASes as well. In this thesis, we therefore asked: how can inferences drawn from Internet maps be used to aid (anti-)censorship? Specifically, we attempted to answer questions such as: Is the Internet structure still hierarchical? What are the key ASes, and in which countries are they located? Can censorious countries that host key ASes filter the Internet traffic of other nations transiting them? To begin with, we constructed a map of the Internet and examined the extent of routing centralization. We identified the major players who control the “Internet backbone” and point out how many of these are, in fact, under the jurisdiction of censorious countries (specifically Russia, China, and India). We found that approximately one-third of the Internet backbone belongs to these known censors, which may potentially monitor a large fraction of global Internet traffic. We then studied whether this hierarchy also exists within individual nations and, if so, whether censorious nations can exploit it to achieve censorship or surveillance within their national boundaries. With censorship mechanisms deployed in a few key ASes, a censor may achieve large-scale censorship within its territory. As a case study,&#13;
we selected India, which has the second-largest Internet user base, and studied the Indian Internet hierarchy from the censor’s point of view. We then considered the feasibility of India following the Chinese model and instituting a single, government-controlled filter. We found that a few “key” ASes (about 1% of Indian ASes) collectively intercept 95% of paths to the censored sites, and also to all publicly visible DNS servers. Five thousand routers spanning these key ASes would suffice to carry out IP or DNS filtering for the&#13;
entire country; 70% of these routers belong to only two private ISPs. This feasibility study, however, does not consider the censorship mechanisms and infrastructure presently employed by Indian ISPs. We therefore developed various techniques and heuristics to assess the pervasiveness of censorship and to study the underlying mechanisms these ISPs use to achieve it. We fortified our findings by adjudging the coverage and consistency of the censorship infrastructure, broadly in terms of the average number of network paths and requested domains the infrastructure censors. Our results indicate an apparent disparity among the ISPs, both in what they filter and in how they install censorship infrastructure. For instance, in Idea Cellular (a popular ISP) we observed censorious middleboxes on over 90% of our tested intra-AS paths, whereas for others, like Vodafone, the figure is as low as 2.5%. We then devised novel anti-censorship strategies that do not depend on third-party tools (such as proxies, Tor, and VPNs) and managed to access all blocked websites on all ISPs under test.&#13;
It must be noted that the proposed anti-censorship solutions are temporary: they rely on obfuscating the pattern matching used by the censorship infrastructure (e.g., changing Host: evil.com to HoSt: evil.com in an HTTP GET request). If censors improve their infrastructure in the future, these solutions will likely fail. We therefore focused on a relatively new anti-censorship scheme, Decoy Routing, which aims to end the arms race between censors and free-speech activists. Decoy Routing, the use of routers (rather than end hosts) as proxies, is a new direction in anti-censorship research. Practical deployment, however, poses a new challenge: where should Decoy Routers (DRs) be placed on the Internet? We proposed an efficient decoy-router placement strategy that requires constructing global (and country-level) Internet maps. We found that a few (≈30) ASes intercept over 90% of paths to the top n sites worldwide, for n = 10, 20, ..., 200, and also to other destinations. Our first contribution is to demonstrate, with real paths, that the number of ASes required for a worldwide DR framework is small (≈30). Our second contribution is to consider the details of DR placement: not just in which ASes DRs should be placed to intercept traffic, but exactly where in each AS. We found that even with 30 ASes, a total of about 11,700 DRs is still needed. Decoy Routing requires accessing web content hosted outside the censor’s boundary. Content Distribution Networks (CDNs), which are designed to bring web content closer to end users, might therefore pose operational challenges to DR: popular web content (e.g., Alexa-popular websites) served from CDNs might be available within the censor’s boundary itself. We thus analyzed how CDN-based web-content localization can hinder such systems, and quantitatively measured the impact of CDN localization on various anti-censorship systems, including DR.
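The AS-placement question above is, at heart, a covering problem: choose the fewest ASes whose members appear on a target fraction of paths. A minimal greedy sketch in Python (the AS names, example paths, and the coverage target below are illustrative assumptions, not data from the thesis):

```python
# Greedy set-cover sketch for choosing decoy-router ASes.
# Each path is modeled as the list of ASes it traverses; an AS
# "intercepts" a path if it appears anywhere on that path.

def greedy_as_placement(paths, target_fraction=0.9):
    """Repeatedly pick the AS that intercepts the most still-uncovered
    paths, until at most (1 - target_fraction) of paths remain."""
    uncovered = set(range(len(paths)))
    chosen = []
    goal = (1.0 - target_fraction) * len(paths)
    while len(uncovered) > goal:
        # count, per AS, how many uncovered paths it appears on
        gain = {}
        for i in uncovered:
            for asn in paths[i]:
                gain[asn] = gain.get(asn, 0) + 1
        best = max(gain, key=gain.get)
        chosen.append(best)
        uncovered = {i for i in uncovered if best not in paths[i]}
    return chosen

# Hypothetical paths: AS7 sits on three of the four.
paths = [
    ["AS1", "AS7", "AS3"],
    ["AS2", "AS7"],
    ["AS4", "AS5"],
    ["AS7", "AS6"],
]
print(greedy_as_placement(paths, 0.75))  # a single AS covers 75% here
```

The greedy heuristic is the standard approximation for set cover; on real AS-level path data it is one plausible way to arrive at a small covering set like the ≈30 ASes reported above.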
Analyzing CDN localization requires geolocating the websites. We therefore adapted a multilateration method, Constraint-Based Geolocation (CBG), with novel heuristics, and termed it Region-Specific CBG (R-CBG). In ≈91% of cases, R-CBG correctly classifies hosts as inside (or outside) a given nation. Using R-CBG, we observed that most popular websites are hosted inside each of the nations studied: our empirical study, involving five countries, shows that popular websites (≈80% of the Alexa top-1k for each nation) are hosted within a client’s domicile. These results reveal that anti-censorship approaches like DR cannot directly use a significant fraction of popular websites. However, a small yet significant set of websites (≈20%) is hosted outside the censors’ boundaries and may be used.
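The constraint-based idea behind CBG can be sketched in a few lines: each landmark’s measured RTT yields an upper bound on the distance to the target, and the target must satisfy every bound simultaneously. A toy planar version (the speed factor, coordinates, and the inside/outside decision rule are simplifying assumptions; R-CBG’s actual heuristics are not reproduced here):

```python
import math

# Toy 2D sketch of constraint-based geolocation (CBG): each landmark
# turns a measured RTT into an upper bound on distance, and the target
# must lie inside every such disk.

SPEED = 100.0  # km per ms of one-way delay (assumed, not calibrated)

def feasible(point, landmarks):
    """True if 'point' satisfies every landmark's distance bound.
    landmarks: list of (x_km, y_km, rtt_ms) tuples."""
    for (lx, ly, rtt_ms) in landmarks:
        bound_km = (rtt_ms / 2.0) * SPEED  # one-way delay times speed
        d = math.hypot(point[0] - lx, point[1] - ly)
        if d > bound_km:
            return False
    return True

def classify_inside(region_points, landmarks):
    """R-CBG-style binary decision: 'inside' if any sampled point of
    the national region is consistent with all distance bounds."""
    return any(feasible(p, landmarks) for p in region_points)

# Two hypothetical landmarks, each bounding the target to 200 km.
lms = [(0.0, 0.0, 4.0), (300.0, 0.0, 4.0)]
print(feasible((150.0, 0.0), lms))  # a point both disks contain
```

Real CBG works on the geographic sphere with calibrated per-landmark speed factors; this sketch only shows how intersecting RTT-derived disks yields an inside/outside verdict for a region.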
</summary>
<dc:date>2020-07-01T00:00:00Z</dc:date>
</entry>
<entry>
<title>Authenticated encryption for memory constrained devices</title>
<link href="http://repository.iiitd.edu.in/xmlui/handle/123456789/880" rel="alternate"/>
<author>
<name>Agrawal, Megha</name>
</author>
<author>
<name>Chang, Donghoon (Advisor)</name>
</author>
<author>
<name>Sanadhya, Somitra Kumar (Advisor)</name>
</author>
<id>http://repository.iiitd.edu.in/xmlui/handle/123456789/880</id>
<updated>2021-12-13T10:16:34Z</updated>
<published>2020-09-01T00:00:00Z</published>
<summary type="text">Authenticated encryption for memory constrained devices
Agrawal, Megha; Chang, Donghoon (Advisor); Sanadhya, Somitra Kumar (Advisor)
It is common knowledge that encryption is a useful tool for providing confidentiality. Authentication, however, is often overlooked. Authentication provides data integrity: it helps ensure that any tampering with or corruption of data is detected, and it provides assurance of message origin. Authenticated encryption (AE) algorithms provide both confidentiality and integrity/authenticity by processing plaintext and producing both ciphertext and a Message Authentication Code (MAC). History has repeatedly shown that encryption without authentication is generally insecure, which has recently culminated in a push for new authenticated encryption algorithms. Several authenticated encryption algorithms already exist; however, they are sometimes difficult to use in resource-constrained environments. This thesis focuses on designing authenticated encryption schemes suitable for memory-constrained environments.&#13;
In many practical applications, the users of an AE scheme use a cryptographic module to perform encryption, decryption, and tag verification. Usually this cryptographic module has very little memory. Due to its limited storage, it cannot store the complete ciphertext, first verify the tag, and then conditionally decrypt; similarly, it cannot store the complete plaintext during decryption and output it only if the tag is valid. This issue is particularly relevant for long messages.&#13;
In authenticated encryption schemes, there are two established techniques for handling long ciphertexts within a small buffer: releasing unverified plaintext (RUP) and producing intermediate tags (PIT). In this work, we propose a third way to handle a long ciphertext with a small buffer: storing and releasing only one (or, in general, only a few) intermediate states, without releasing or storing any part of an unverified plaintext and without generating any intermediate tag. In this context, we designed two schemes: sp-AELM, which is sponge-based, and dAELM, which is a deterministic AE scheme. Brief details of these works are given below.&#13;
1. sp-AELM is a sponge-based authenticated encryption scheme that provides support for memory-constrained devices. We provide its security proof for privacy and authenticity in the ideal permutation model, using a code-based game-playing framework. We also present two further variants of sp-AELM that serve the same purpose and are more efficient.&#13;
2. We also analyzed sponge-based CAESAR submissions using our proposed technique, to determine their potential to support the limited-memory constraint.&#13;
3. dAELM is a deterministic authenticated encryption scheme providing support for memory-constrained devices. Deterministic AE (DAE) is used in domains such as key wrap, where the available message entropy obviates the need for a nonce. To limit memory usage, our idea is to encrypt the message with a session key and release the session key to the user only upon successful verification of a tag. We provide the security proof of the proposed construction in the ideal cipher model.&#13;
4. We show a simple plaintext-awareness (PA) attack on the existing SAEB authenticated encryption scheme and propose a modification to SAEB to overcome it. We propose two modified versions of SAEB, called RSAEB v1 and RSAEB v2: the first provides PA1 security in the nonce-respecting scenario, and the second provides PA1 in the nonce-misuse scenario as well as PA2. PA2 is a stronger security notion than PA1, and comes at the cost of an additional pass.
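The buffering problem underlying these designs can be illustrated with a generic two-pass encrypt-then-MAC sketch: the tag is first verified in a constant-memory streaming pass over the ciphertext, and plaintext is produced only on a second pass, so no unverified plaintext is ever released. This illustrates the memory constraint only; it is not the sp-AELM or dAELM construction, and the toy keystream is an assumption made for brevity, not a real cipher:

```python
import hashlib
import hmac

CHUNK = 32  # bytes processed at a time; stands in for the device buffer

def keystream_block(key, i):
    # Toy CTR-style keystream derived from a hash; illustration only.
    return hashlib.sha256(key + i.to_bytes(8, "big")).digest()

def encrypt(key, mac_key, plaintext):
    """Encrypt-then-MAC: XOR with the toy keystream, then HMAC the
    whole ciphertext to produce the tag."""
    ct = bytearray()
    for i in range(0, len(plaintext), CHUNK):
        ks = keystream_block(key, i // CHUNK)
        block = plaintext[i:i + CHUNK]
        ct.extend(bytes(a ^ b for a, b in zip(block, ks)))
    tag = hmac.new(mac_key, bytes(ct), hashlib.sha256).digest()
    return bytes(ct), tag

def verify_then_decrypt(key, mac_key, ciphertext, tag):
    # Pass 1: stream the MAC in CHUNK-sized pieces (constant memory).
    h = hmac.new(mac_key, digestmod=hashlib.sha256)
    for i in range(0, len(ciphertext), CHUNK):
        h.update(ciphertext[i:i + CHUNK])
    if not hmac.compare_digest(h.digest(), tag):
        return None  # reject without ever producing plaintext
    # Pass 2: decrypt chunk by chunk, releasing only verified plaintext.
    out = bytearray()
    for i in range(0, len(ciphertext), CHUNK):
        ks = keystream_block(key, i // CHUNK)
        out.extend(bytes(a ^ b for a, b in zip(ciphertext[i:i + CHUNK], ks)))
    return bytes(out)
```

The two-pass approach needs the ciphertext to be readable twice (e.g., from external storage); the schemes in this thesis instead avoid the second pass by retaining a small intermediate state.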
</summary>
<dc:date>2020-09-01T00:00:00Z</dc:date>
</entry>
<entry>
<title>Panoptic defenses for secure computer vision</title>
<link href="http://repository.iiitd.edu.in/xmlui/handle/123456789/839" rel="alternate"/>
<author>
<name>Agarwal, Akshay</name>
</author>
<author>
<name>Singh, Richa (Advisor)</name>
</author>
<author>
<name>Vatsa, Mayank (Advisor)</name>
</author>
<id>http://repository.iiitd.edu.in/xmlui/handle/123456789/839</id>
<updated>2021-12-13T10:23:40Z</updated>
<published>2020-10-01T00:00:00Z</published>
<summary type="text">Panoptic defenses for secure computer vision
Agarwal, Akshay; Singh, Richa (Advisor); Vatsa, Mayank (Advisor)
As the deployment and usage of computer vision systems increase, protecting these systems from malicious data has become a critical task. The primary source of information in any computer vision system is the input data, and the authenticity of that data is integral to the reliability of the system. With advances in electronic equipment, especially communication devices such as mobile phones and laptops, digital data acquisition has become easy. This ubiquity of cameras and digital content has raised serious concerns, such as the capture of unauthorized biometric data, video voyeurism, and sexting. Apart from that, in person recognition it is generally observed that when the test image is captured using a different sensor/camera,&#13;
the performance drops significantly. In other vital scenarios, digital images are used as evidence in courts of law and in criminal investigations. While the image source might be authentic, the image itself might be a spoof, or corrupted in a way that fools machine learning algorithms. Attacks on computer vision algorithms have become advanced enough to trick machine learning systems and deceive the human visual system. Therefore, proper authentication of digital images and videos is necessary. While many of these challenges are usually dealt with individually, this dissertation provides a ‘panoptic’ view, addressing challenges ranging from image source identification to the classification of anomalies using machine learning algorithms.&#13;
This dissertation focuses on detecting and mitigating a spectrum of attacks at the data level. The four major contributions are (i) sensor identification to ascertain that an image was captured by an authenticated device, (ii) detecting digital attacks, (iii) detecting physical attacks, and (iv) detecting adversarial attacks. In large human-identification projects such as India’s Aadhaar and the FBI’s Integrated Automated Fingerprint Identification System (IAFIS), a variety of acquisition devices are used. While it is important to ensure that images are captured only from authenticated devices, images captured by different devices vary significantly in quality, texture, and illumination, which also makes matching them challenging. As the first contribution, we propose a camera source identification algorithm, together with a novel feature selection algorithm, to identify the biometric image sensor used for acquisition. The proposed algorithm yields more than 99% classification accuracy on several databases containing images captured with multiple cameras. We have also prepared and released two multi-sensor iris databases to promote research on this problem. The next two problems addressed in this dissertation are attacks on face recognition systems: physical presentation attacks and digital attacks such as morphing. A variety of presentation attack instruments have been used, from simple print and replay attacks to more sophisticated media such as silicone masks, latex masks, and wax faces. The proposed presentation attack detection algorithm combines wavelet decomposition&#13;
and texture feature extraction with a support vector machine classifier to distinguish between real and attacked faces. It outperforms state-of-the-art algorithms, including classifiers based on hand-crafted image features and deep CNN features, under several generalized settings, including multiple spectra. We have also prepared a multi-spectral (visible, near-infrared, and thermal) face presentation attack database, one of the largest publicly available databases in the physical presentation attack domain. The next contribution focuses on detecting digital manipulations such as morphing and swapping. Morphing blends two or more faces into one morphed image, which can be used to create a duplicate identity so that two individuals can obtain authorized access using the same identity. We first prepared a large-scale database using images collected from multiple media, such as mobile applications and Internet websites. We then propose a novel feature extraction algorithm that encodes the artifacts introduced by morphing or swapping: it first filters the image patches and encodes irregularities as differences in those local regions. Because of sophisticated digital alteration tools, these differences are minute; therefore, to highlight the irregularities, we assign weights to the difference values based on their magnitude. Once the features are extracted, a machine learning classifier is trained for binary classification (real or altered). The massive success of deep convolutional neural networks has significantly increased their use in machine learning solutions. However, deep learning algorithms are susceptible to minute, intelligently crafted noise, popularly known as adversarial&#13;
examples. Adversarial attacks can be both targeted and untargeted, and their impact can be seen in the physical world, where the simple misclassification of a ‘stop sign’ as ‘increase speed’ can harm pedestrians and the autonomous vehicle itself. The detection of adversarial examples is therefore essential for the safe and confident use of deep learning-based solutions in the real world. As the final contribution, novel detection algorithms are developed to detect different kinds of adversarial attacks. The proposed solutions are among the first that can handle such varied and challenging scenarios, yielding a panoptic defense against adversarial examples that is agnostic to databases, adversarial attack algorithms, and CNN architectures.
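The weighted local-difference idea for morphing detection can be sketched in pure Python (the patch size, the mean-difference filter, and the magnitude-based weighting below are simplifying assumptions; the dissertation’s actual feature extractor is not reproduced here):

```python
# Sketch of a weighted local-difference descriptor: split the image
# into small patches, take each pixel's difference from the patch
# mean, and weight every difference by its own magnitude so that
# minute alteration artifacts are amplified in the feature.

def patch_features(img, k=2):
    """img: 2D list of gray values; returns one weighted-difference
    energy per non-overlapping k x k patch, in row-major order."""
    h, w = len(img), len(img[0])
    feats = []
    for r in range(0, h - k + 1, k):
        for c in range(0, w - k + 1, k):
            vals = [img[r + i][c + j] for i in range(k) for j in range(k)]
            mean = sum(vals) / len(vals)
            diffs = [v - mean for v in vals]
            # magnitude-proportional weights highlight irregular patches
            feats.append(sum(abs(d) * abs(d) for d in diffs))
    return feats

print(patch_features([[0, 2], [0, 2]], 2))  # one patch, nonzero energy
```

A smooth (unaltered) patch yields near-zero energy, while blending seams produce locally inconsistent values and larger energies; such per-patch features would then feed a binary real-vs-altered classifier.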
</summary>
<dc:date>2020-10-01T00:00:00Z</dc:date>
</entry>
<entry>
<title>Optimal inference algorithms for higher order MRF-MAP problems</title>
<link href="http://repository.iiitd.edu.in/xmlui/handle/123456789/835" rel="alternate"/>
<author>
<name>Ishant</name>
</author>
<author>
<name>Arora, Chetan (Advisor)</name>
</author>
<id>http://repository.iiitd.edu.in/xmlui/handle/123456789/835</id>
<updated>2021-12-03T11:02:36Z</updated>
<published>2020-08-13T00:00:00Z</published>
<summary type="text">Optimal inference algorithms for higher order MRF-MAP problems
Ishant; Arora, Chetan (Advisor)
Many computer vision problems can be formulated as finding the best labeling configuration. If the labelings satisfy the Markov property, then finding the best labeling configuration becomes the MRF (Markov Random Field) MAP (Maximum A Posteriori) inference problem: the minimization of the cost of assigning labels to individual pixels plus the cost of assigning labelings to collections of pixels (cliques). If we assume the clique costs (clique potentials) to be submodular, then MRF-MAP inference becomes the minimization of a sum of submodular functions, which can be done in polynomial time. The standard way to minimize a submodular function is to minimize an equivalent dual objective defined on the submodular polyhedron. In the first part of the thesis (Chapter 3), we develop an efficient inference algorithm for the 2-label MRF-MAP problem, named SoS-MNP. We show that the dual problem can be decomposed over cliques, which enables efficient optimization of the dual in block coordinate descent (BCD) style. In our experiments, we consider the image segmentation problem with clique sizes as large as 1000, and show that SoS-MNP is efficient and scales to large cliques, whereas state-of-the-art methods scale only up to cliques of size 16. In the second part of the thesis (Chapter 4), we develop an inference algorithm for the 2-label MRF-MAP problem with a mix of small and large cliques. In such a configuration, the large number of small cliques makes BCD-style SoS-MNP very slow. On the other hand, state-of-the-art methods like Generic Cuts (GC) run very fast on problems with many small cliques but do not scale to problems with large cliques. To overcome the limitations of both algorithms, we run GC for the small cliques and&#13;
SoS-MNP for the large cliques in BCD style, by proposing a mapping between the variables of SoS-MNP and GC. Even with this mapping, the hybrid algorithm does not give the optimal result, because GC minimizes the ℓ1-norm while SoS-MNP minimizes the ℓ2-norm of the objective function. We propose a recursive method that calls GC multiple times to output the ℓ2-norm solution. Our experiments show that pixel-wise image segmentation quality improves when both small and large cliques are used, compared to large cliques alone, and demonstrate the efficiency of the proposed hybrid method over SoS-MNP on configurations with both small and large cliques. In the third and last part of the thesis (Chapter 5), we develop an inference algorithm for multi-label MRF-MAP problems. Current state-of-the-art methods run only for configurations with cliques of size up to 4 and at most 4 labels. The standard way to solve a multi-label problem is to convert it into a 2-label problem via some encoding, but such encodings introduce many extra states. We show that there is enough structure in the submodular polyhedron of the encoded clique potentials to be exploited, and propose an efficient hybrid method (Hybrid-ML) that avoids computation over the extra states. Our experiments show that an MRF with clique size 800 can improve the pixel-wise multi-object segmentation results obtained by the state-of-the-art deep learning network SegNet. We also run Hybrid-ML on a stereo correspondence problem with clique size 100 and 16 labels, and compare its running time with SoS-MNP, showing a large improvement in efficiency.
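The submodularity assumption on clique potentials is concrete enough to check directly: for a 2-label clique, a potential f over the subsets of pixels labeled 1 is submodular when f(A) + f(B) is at least f(A ∪ B) + f(A ∩ B) for all subsets A, B. A brute-force checker (exponential in clique size, so a sanity-check tool only, not part of any inference algorithm above):

```python
from itertools import combinations

# Brute-force submodularity check for a 2-label clique potential f,
# given as a function of the frozenset of pixels assigned label 1.
# Verifies f(A) + f(B) is at least f(A union B) + f(A intersect B)
# over every pair of subsets of the clique.

def is_submodular(f, clique):
    subsets = []
    for r in range(len(clique) + 1):
        for comb in combinations(clique, r):
            subsets.append(frozenset(comb))
    for a in subsets:
        for b in subsets:
            lhs = f(a) + f(b)
            rhs = f(a | b) + f(a.intersection(b))
            if rhs > lhs:
                return False
    return True

# A concave function of cardinality (here truncated count) is
# submodular; a convex one, like squared cardinality, is not.
print(is_submodular(lambda s: min(len(s), 2), range(3)))  # prints True
```

Potentials passing this test fit the sum-of-submodular framework above; a potential like squared cardinality fails it, since two disjoint singletons give f(A) + f(B) = 2 but f(A ∪ B) + f(∅) = 4.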
</summary>
<dc:date>2020-08-13T00:00:00Z</dc:date>
</entry>
</feed>
