Show simple item record

dc.contributor.author Tandon, Madhur
dc.contributor.author Garg, Mudit
dc.contributor.author Vatsa, Mayank (Advisor)
dc.contributor.author Singh, Richa (Advisor)
dc.date.accessioned 2021-05-21T10:28:26Z
dc.date.available 2021-05-21T10:28:26Z
dc.date.issued 2020-05-22
dc.identifier.uri http://repository.iiitd.edu.in/xmlui/handle/123456789/902
dc.description.abstract Machine learning has been applied successfully across many domains. With the availability of high-performance computing resources, tasks such as image classification and object detection are readily tackled by learning algorithms. However, deep learning models, and neural networks in particular, are vulnerable to adversarial samples. These adversarial attacks can severely degrade the performance of a deep learning system. Adversarial samples are usually very similar to the original samples and therefore indistinguishable to humans, yet they fool deep learning systems into incorrect classifications. In this report, we comprehensively summarize recent progress in the field of adversarial machine learning. Specifically, we give an overview of current state-of-the-art attack methods while also reviewing various defense mechanisms. We also conduct our own experiments showing that state-of-the-art models such as ResNet are prone to adversarial attacks. Further, we propose an end-to-end model that can mitigate adversarial attacks on architectures that solve object classification tasks. We call this model the "Mitigator", since it helps an already trained classifier mitigate the attack. The "Mitigator" has the following properties:
• it is invariant to the complexity of the attack;
• it is robust to any attack that changes the pixel values of an image;
• it can be attached in front of any classifier that takes an image as its input.
We further experiment with combining this approach with an ensemble technique, which leads us to train nine models in total. All nine models are applied in succession, and a voting mechanism produces the final classification. Testing is done against two popular attacks: DeepFool and Carlini L2. In the end, we showcase the improvement in results and discuss the pros and cons of this approach. en_US
dc.language.iso en_US en_US
dc.publisher IIIT-Delhi en_US
dc.subject Artificial Intelligence, Deep Learning, Neural Networks, Adversarial Attacks, Defense Methods, Security, Robustness en_US
dc.title Trusted AI en_US
dc.type Other en_US
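
The abstract above describes attaching the "Mitigator" in front of an already trained classifier and voting over an ensemble of such pipelines. Below is a minimal PyTorch sketch of that wiring, assuming a small residual image-to-image network for the Mitigator; the architecture, layer sizes, and helper names (Mitigator, attach_mitigator, ensemble_vote) are illustrative assumptions, not the report's actual implementation.

# Illustrative sketch only: a hypothetical image-to-image "mitigator" placed in
# front of a frozen, pre-trained classifier, plus majority voting over an
# ensemble of such pipelines. Names and hyperparameters are assumptions.
import torch
import torch.nn as nn

class Mitigator(nn.Module):
    """Hypothetical module meant to remove adversarial perturbations from images."""
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        # Residual correction keeps the output close to the input image.
        return torch.clamp(x + self.net(x), 0.0, 1.0)

def attach_mitigator(mitigator, classifier):
    """Chain the mitigator in front of an already trained classifier."""
    for p in classifier.parameters():
        p.requires_grad = False          # the classifier itself stays untouched
    return nn.Sequential(mitigator, classifier)

def ensemble_vote(pipelines, images):
    """Majority vote over the predictions of several mitigator+classifier pipelines."""
    votes = torch.stack([p(images).argmax(dim=1) for p in pipelines])  # (n_models, batch)
    return votes.mode(dim=0).values

# Hypothetical usage: build one pipeline per trained classifier, then vote.
# pipelines = [attach_mitigator(Mitigator(), clf) for clf in trained_classifiers]
# labels = ensemble_vote(pipelines, images)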

