Abstract:
Deep neural networks (DNNs) have been shown to be
vulnerable to adversarial examples: malicious inputs
crafted by an adversary to induce the trained model
to produce erroneous outputs. This vulnerability has inspired
considerable research on securing neural networks
against such attacks. Although existing techniques
increase the robustness of models against white-box attacks,
they remain ineffective against black-box attacks.
address the challenge of black-box adversarial attacks, we
propose Adversarial Model Cascades (AMC), a framework
that outperforms existing state-of-the-art defenses
in both black-box and white-box settings and is easy to integrate
into existing setups. Our approach trains a cascade
of models by injecting adversarial images crafted from an already
defended proxy model, thereby improving the robustness of the target
models against adversarial attacks.
To the best of our knowledge, ours is the first work to provide a defense
mechanism that improves robustness against multiple
adversarial attacks simultaneously. AMC increases
robustness by 8.175% and 7.115% for white-box attacks
and by 30.218% and 4.717% for black-box attacks, in comparison
to defensive distillation and adversarial hardening, respectively.