Countering Adversarial Images using Input Transformations

1 Countering Adversarial Images using Input Transformations
Chuan Guo, Mayank Rana, Moustapha Cisse, Laurens van der Maaten
Presented by Hari Venugopalan and Zainul Abi Din

2 Motivation: Why is this a hard problem to solve?
Adversarial examples are very easy to generate and very difficult to defend against; this is a problem with neural nets as piecewise-linear functions.
Adversarial examples transfer from one model to another very easily.
The problem is not well understood, and researchers are divided on the nature of these samples.

3 Problem Definition: Adversarial Example
x is the original input; x' is the perturbed input, found by adding some noise to x.
Given a classifier h, the predictions h(x) and h(x') should not be the same, while the distortion d(x, x') stays below some threshold: keep the distortion low while pushing the example across the decision boundary.
Source: Towards Evaluating the Robustness of Neural Networks, Carlini and Wagner

4 Measure of Distortion: Normalized L2-Dissimilarity
With x_n an original image, x'_n the corresponding adversarial image, and N the total number of images:

(1/N) Σ_n ||x_n − x'_n||_2 / ||x_n||_2

A stronger attack succeeds at a lower L2-dissimilarity.
Source: Countering Adversarial Images using Input Transformations, Chuan Guo et al.
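For concreteness, a minimal NumPy sketch of this metric (the function name is ours):

```python
import numpy as np

def normalized_l2_dissimilarity(clean, adversarial):
    """Average ||x_n - x'_n||_2 / ||x_n||_2 over a batch of N images.

    clean, adversarial: arrays of shape (N, ...) holding the original
    and perturbed images; each image is flattened before taking norms.
    """
    n = clean.shape[0]
    x = clean.reshape(n, -1)
    x_adv = adversarial.reshape(n, -1)
    return np.mean(np.linalg.norm(x - x_adv, axis=1) / np.linalg.norm(x, axis=1))
```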

5 Intuitively, what is an attack?
The original input X passes through the classifier to give class probabilities h(X); the perturbed image X' passes through the same classifier and yields different probabilities h(X').
Image taken from: Towards Evaluating the Robustness of Neural Networks, Carlini and Wagner

6 Types of Attacks on DNNs
White-box: the attacker is assumed to have complete knowledge of the network's weights and architecture.
Black-box: does not require internal model access.
Gray-box: the attacker does not know the defense strategy used.
Targeted attack: the attacker selects the class they want the example to be misclassified as.
Untargeted attack: any misclassification is the goal.
Source: ZOO: Zeroth Order Optimization Based Black-box Attacks to Deep Neural Networks without Training Substitute Models, Pin-Yu Chen

7 How Do You Find This Noise? Fast Gradient Sign Method
Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy

x' = x + η, where η = ε · sign(∇_x J(θ, x, y))
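A minimal PyTorch sketch of FGSM, assuming `model` is a differentiable classifier and `loss_fn` its loss (e.g. cross-entropy); the [0, 1] clamp assumes normalized pixel values:

```python
import torch

def fgsm(model, loss_fn, x, y, epsilon):
    """Return x' = x + epsilon * sign(grad_x J(theta, x, y))."""
    x = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x), y)
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()  # keep pixels in a valid range
```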

8 DeepFool
f(x) is the classifier. Given a sample x_0, change the sign of f(x_0) by projecting x_0 orthogonally onto the decision boundary f(x) = 0:

r = −( f(x_0) / ||∇_x f(x_0)||_2^2 ) · ∇_x f(x_0)

Keep adding r to x_0 until the sign of f(x_0) changes.
Source: DeepFool: A simple and accurate method to fool deep neural networks, Moosavi-Dezfooli et al.
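A sketch of this binary case in PyTorch; `f` is any differentiable scalar-valued classifier, and the small overshoot factor (our choice of constant) pushes the iterate just past the linearized boundary:

```python
import torch

def deepfool_binary(f, x0, max_iter=50, overshoot=0.02):
    sign0 = torch.sign(f(x0)).item()
    x = x0.clone().detach()
    for _ in range(max_iter):
        x.requires_grad_(True)
        fx = f(x)
        if torch.sign(fx).item() != sign0:
            break  # the label has flipped; done
        grad, = torch.autograd.grad(fx, x)
        # minimal L2 step onto the linearized boundary f(x) = 0
        r = -(fx.item() / grad.norm().pow(2)) * grad
        x = (x + (1 + overshoot) * r).detach()
    return x
```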

9 Adversarial Transformation Networks (ATNs)
The target network f_w maps an input X to class probabilities. The ATN g_{f,θ} learns the noise: it maps X to a perturbed X' and thereby learns to generate an adversarial sample for the target network.
Image Source: DeepFool: A simple and accurate method to fool deep neural networks, Moosavi-Dezfooli
Concept Source: Learning to generate adversarial examples, Shumeet Baluja

10 ATN Cost (gradient taken w.r.t. θ)

Cost = Σ_x [ β · L_x(g_{f,θ}(x), x) + L_y(f_w(g_{f,θ}(x)), r(f_w(x), t)) ]

where the re-ranking function boosts the target class t and renormalizes:

r(f(x), t)_k = norm( α · max f(x) if k = t, f(x)_k otherwise )

Source: Learning to generate adversarial examples, Shumeet Baluja
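As an illustration, a small NumPy sketch of the re-ranking function (names and the default α are ours; α > 1 makes t the new argmax):

```python
import numpy as np

def rerank(y, t, alpha=1.5):
    """Boost class t to alpha * max(y), then renormalize to a probability vector."""
    r = y.copy()
    r[t] = alpha * y.max()
    return r / r.sum()
```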

11 Carlini and Wagner Attack:

min_{x'} ||x − x'||_2 + λ_f · max( −κ, Z(x')_{h(x)} − max{ Z(x')_k : k ≠ h(x) } )

Z(x'): input to the softmax layer (the logits)
Z(x')_k: k-th component of Z(x')
max{Z(x')_k : k ≠ h(x)}: largest logit other than the original class
Z(x')_{h(x)} − max{Z(x')_k : k ≠ h(x)}: margin between the original-class logit and the largest other logit; driving it below −κ forces a misclassification
κ: confidence
Source: Towards Evaluating the Robustness of Neural Networks, Carlini and Wagner
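A sketch of this objective for a single image in PyTorch, with `logits` standing for Z(x') and `true` for h(x):

```python
import torch

def cw_objective(x, x_adv, logits, true, lam=1.0, kappa=0.0):
    distortion = torch.norm(x_adv - x, p=2)
    # mask out the original class before taking the largest "other" logit
    other = logits.clone()
    other[true] = -float('inf')
    margin = logits[true] - other.max()
    # max(-kappa, margin): the hinge stops rewarding once the margin is below -kappa
    return distortion + lam * torch.clamp(margin, min=-kappa)
```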

12 Countering Adversarial Images
Improve the robustness of the model.
Exploit randomness and non-differentiability.
Defenses are either model-specific or model-agnostic.

13 Model-Specific Defenses
Based on robust optimization: a minimization-maximization approach is followed, tied to the learning algorithm and regularization scheme.
These defenses make assumptions about the nature of the adversary and therefore do not satisfy Kerckhoffs's principle.

14 Model-Agnostic Defenses
Transform input images to remove perturbations; JPEG compression and image re-scaling are examples.
The paper aims to increase the effectiveness of model-agnostic defenses.

15 Feature Squeezing
Proposed by Xu et al. to detect adversarial inputs.
The input space is reduced by squeezing out features, and the model's outputs on the original and squeezed inputs are compared: if the outputs differ, the input is flagged as adversarial.
Source: Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks
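A toy sketch of one squeezer (bit-depth reduction) and the comparison step; the detection threshold here is illustrative, not the value from Xu et al.:

```python
import numpy as np

def reduce_bit_depth(x, bits=4):
    """Squeeze pixel values in [0, 1] down to 2**bits levels."""
    levels = 2 ** bits - 1
    return np.round(x * levels) / levels

def looks_adversarial(predict, x, threshold=1.0):
    """`predict` maps an image to a probability vector; a large L1 gap
    between the outputs on the original and squeezed input is the flag."""
    gap = np.abs(predict(x) - predict(reduce_bit_depth(x))).sum()
    return gap > threshold
```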

16 JPEG Compression
Proposed by Dziugaite et al.: remove perturbations by compressing images, since compression can remove aspects of the perturbation.
Effective against small-magnitude perturbations; with larger perturbations, compression is unable to recover the non-adversarial image.
Source: A study of the effect of JPG compression on adversarial images
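The defense itself is just a lossy round trip through the codec; a sketch using Pillow (the quality setting is illustrative):

```python
import io
from PIL import Image

def jpeg_defense(image, quality=75):
    """Compress a PIL image to JPEG at the given quality and decode it back."""
    buffer = io.BytesIO()
    image.convert("RGB").save(buffer, format="JPEG", quality=quality)
    buffer.seek(0)
    return Image.open(buffer)
```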

17 Total Variation Minimization
The transformation is cast as an optimization problem: find an image that stays close to the input but has low total variation.
Inspired by classical noise removal.
Source: Countering Adversarial Images using Input Transformations
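A simplified sketch of the idea in PyTorch: gradient descent on reconstruction error plus an anisotropic total-variation penalty. The paper's version additionally drops a random subset of pixels via a mask, which is omitted here, and the constants are ours:

```python
import torch

def tv_minimize(x, lam=0.03, steps=200, lr=0.05):
    """Return an image close to x (shape (..., H, W), values in [0, 1])
    that also has a small total variation."""
    z = x.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        tv = (z[..., 1:, :] - z[..., :-1, :]).abs().sum() \
           + (z[..., :, 1:] - z[..., :, :-1]).abs().sum()
        loss = (z - x).pow(2).sum() + lam * tv
        opt.zero_grad()
        loss.backward()
        opt.step()
    return z.detach().clamp(0.0, 1.0)
```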

18 Image Quilting
A non-parametric technique: the image is pieced together from small patches taken from a database that contains only clean patches.
Patches are selected by k-nearest-neighbour search over the database, placed at predefined points, and the edges between them are smoothed.
Because every patch comes from clean data, the resulting image carries no adversarial perturbation.
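A heavily reduced sketch of the quilting step using scikit-learn's nearest-neighbour index; it replaces non-overlapping patches, omits the edge smoothing, and picks randomly among the k nearest clean patches (our k is illustrative):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def quilt(image, clean_patches, k=5):
    """image: (H, W, C) array; clean_patches: (M, p, p, C) array of clean data."""
    p = clean_patches.shape[1]
    index = NearestNeighbors(n_neighbors=k).fit(
        clean_patches.reshape(len(clean_patches), -1))
    out = image.copy()
    for i in range(0, image.shape[0] - p + 1, p):
        for j in range(0, image.shape[1] - p + 1, p):
            patch = image[i:i + p, j:j + p].reshape(1, -1)
            _, idx = index.kneighbors(patch)
            choice = np.random.choice(idx[0])  # one of the k nearest clean patches
            out[i:i + p, j:j + p] = clean_patches[choice]
    return out
```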

19 Experiments
Performed in black-box and gray-box settings on the ImageNet dataset, attacking a ResNet-50 model.
Attacks included FGSM, I-FGSM, DeepFool, and Carlini-Wagner.
Top-1 classification accuracy was reported for varying normalized L2-dissimilarities.

20 Gray Box: Image Transformations at Test Time Source: Countering Adversarial Images using Input Transformations

21 Black Box: Image Transformations at Test Time Source: Countering Adversarial Images using Input Transformations

22 Black Box: Ensembling and Model Transfer Source: Countering Adversarial Images using Input Transformations

23 Gray Box: Image Transformations at Test Time Source: Countering Adversarial Images using Input Transformations

24 Comparison to Ensemble Adversarial Training Source: Countering Adversarial Images using Input Transformations

25 Our Experiment: Running TVM with an ATN
We built an ATN that broke an ANN, raising its error rate from 0.34 to 0.91.
The error rate only went from 0.91 to 0.90 when the ATN perturbation was transformed using TVM.
The dataset consisted of non-MNIST images.

26 Conclusions and Questions
Image transformations are a more generic defense; they benefit from randomization and from non-differentiability.
Why were they not successful against the ATN attack?
Like ATNs, can the best transformation be learned?
How good are these transformations in complete white-box settings?

27 Merci Beaucoup (Thank You Very Much)
