Delving into Transferable Adversarial Examples and Black-box Attacks


Delving into Transferable Adversarial Examples and Black-box Attacks
Yanpei Liu¹, Xinyun Chen¹, Chang Liu², Dawn Song²
¹Shanghai Jiao Tong University   ²University of California, Berkeley
ICLR 2017. Presenter: Ji Gao

Outline
1. Motivation: Transferability
2. Previous methods
3. Experiment I
4. Method
5. Experiment II
6. Experiment III: On realistic data

Motivation
Adversarial examples: samples that are close to normal seed inputs but are misclassified by the deep model.
Adversarial samples can transfer between models: adversarial samples generated to fool one model can also be applied to other models.

Motivation: Transferability
Transferability of adversarial samples: the ability of adversarial samples generated against one model to also fool another model.
Research problems: How do we measure transferability? How can we generate samples with more of it?
If we can generate transferable samples, we can easily attack a black-box model by generating adversarial samples on any model we do have access to.

Targeted and non-targeted attacks
Targeted: has an objective label.
Non-targeted: has no objective label; the goal is just to mislead the model to any different label.
Non-targeted samples have been shown to transfer more easily. This paper focuses on targeted attacks.
Open question: what if the label sets of the two models are different?

Adversarial samples
Formally, an adversarial sample $x^\star$ for an input $x$ can be defined by:

$$d(x, x^\star) \le B, \quad x^\star \in X, \quad F(x^\star) \ne F(x) \qquad (1)$$

where $d(\cdot, \cdot)$ is a distance metric, $B$ is the distortion budget, and $F$ is the classifier.
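To make the definition concrete, here is a minimal sketch of Eq. (1) as a predicate, assuming `F` is any callable mapping a batched input tensor to logits and `d` defaults to the L2 distance; all names are illustrative, not the authors' code:

```python
import torch

def is_adversarial(F, x, x_adv, B, d=lambda a, b: (a - b).norm()):
    """Check Eq. (1): d(x, x_adv) <= B and F(x_adv) != F(x)."""
    if d(x, x_adv) > B:                       # distortion budget violated
        return False
    pred = F(x.unsqueeze(0)).argmax(dim=1)    # label on the clean seed
    pred_adv = F(x_adv.unsqueeze(0)).argmax(dim=1)
    return bool((pred != pred_adv).item())    # success iff the label changed
```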

Previous solutions: Fast Gradient Sign
The Fast Gradient Sign (FGS) algorithm is one efficient algorithm for creating adversarial samples:

$$x^\star = x + \epsilon \cdot \mathrm{sign}\big(\nabla_x \, \mathrm{Loss}(F(x; \theta), y)\big) \qquad (2)$$

This is motivated by controlling the $L_\infty$ norm of $x^\star - x$, i.e. making the perturbation in each feature dimension the same size. If the sign function is not used, it becomes another method that controls the $L_2$ norm instead; it is called the fast gradient (FG) method in this paper.
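A minimal PyTorch sketch of the FGS step in Eq. (2), with the true label `y` supplied to the loss; dropping `sign()` in favor of L2 normalization gives the FG variant mentioned above. `model` and `eps` are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def fast_gradient_sign(model, x, y, eps):
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)   # Loss(F(x; theta), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()       # equal-size step per dimension (L_inf)
    return x_adv.clamp(0, 1).detach()     # keep pixels in a valid range

def fast_gradient(model, x, y, eps):
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    g = x.grad
    x_adv = x + eps * g / g.norm()        # L2-normalized step instead of sign()
    return x_adv.clamp(0, 1).detach()
```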

Previous solutions: Optimization-based approach
L2 attack (Carlini & Wagner (2016)):

$$\arg\min_{x^\star} \; \lambda \, d(x, x^\star) + \mathrm{Loss}(F(x^\star; \theta)) \qquad (3)$$

This approach is more intuitive and accurate, but costly.

Targeted attack
The previous methods are for non-targeted attacks, but it is easy to change them to targeted attacks: simply use a different loss function.
Targeted L2 attack (Carlini & Wagner (2016)):

$$\arg\min_{x^\star} \; \lambda \, d(x, x^\star) + \mathrm{Loss}_{y^\star}(F(x^\star; \theta)) \qquad (4)$$

where $y^\star$ is the target label.
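A simplified sketch of the optimization-based attacks of Eqs. (3) and (4) as a plain penalty method; the full Carlini & Wagner attack uses a more elaborate loss and box constraint, and `lam`, `steps`, and `lr` here are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def opt_attack(model, x, y, lam=0.05, steps=200, lr=0.01, target=None):
    """Minimize lam * d(x, x_adv) + Loss; targeted if `target` is given."""
    x_adv = x.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([x_adv], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        dist = (x_adv - x).norm()  # d(x, x_adv), here the L2 distance
        logits = model(x_adv)
        if target is not None:     # Eq. (4): push toward the target label
            loss = lam * dist + F.cross_entropy(logits, target)
        else:                      # Eq. (3): push away from the true label
            loss = lam * dist - F.cross_entropy(logits, y)
        loss.backward()
        opt.step()
        with torch.no_grad():
            x_adv.clamp_(0, 1)     # stay a valid image
    return x_adv.detach()
```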

Experiment I
Experiment design:
Models: 5 networks: ResNet-50, ResNet-101, ResNet-152, GoogLeNet, and VGG-16.
Data: ImageNet 2012.
Metrics of transferability:
  Non-targeted: accuracy of the transferred model on the adversarial samples.
  Targeted: the fraction of samples predicted exactly as the target class by the transferred model, called the matching rate.
Bound on distortion: $\mathrm{RMSD} = \sqrt{\sum_i (x_i^\star - x_i)^2 / N}$
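The RMSD metric above is a one-liner; `x` and `x_adv` are assumed to be arrays of $N$ pixel values on the same scale (e.g. 0-255):

```python
import numpy as np

def rmsd(x, x_adv):
    # sqrt(sum_i (x_adv_i - x_i)^2 / N): per-pixel root-mean-square distortion
    return float(np.sqrt(np.mean((np.asarray(x_adv) - np.asarray(x)) ** 2)))
```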

Experiment on non-targeted adversarial samples
Non-targeted samples transfer well given a large distortion. The FGS result is worse than FG and the optimization-based approaches.

Minimal transferable RMSD

Experiment on targeted adversarial samples
Targeted adversarial samples do not transfer well.

Attack on ensembles
Generate adversarial images for an ensemble of the (five) models.
Ensemble attack:

$$\arg\min_{x^\star} \; \lambda \, d(x, x^\star) + \mathrm{Loss}\Big(\sum_i \alpha_i F_i(x^\star; \theta_i)\Big) \qquad (5)$$
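A sketch of the ensemble objective in Eq. (5): the softmax outputs of the white-box models are fused with weights $\alpha_i$ and the attack drives the fused prediction toward the target label. `models`, `alphas`, and the optimizer settings are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def ensemble_attack(models, alphas, x, target, lam=0.05, steps=200, lr=0.01):
    x_adv = x.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([x_adv], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # sum_i alpha_i * F_i(x_adv; theta_i): weighted fusion of model outputs
        fused = sum(a * F.softmax(m(x_adv), dim=1)
                    for a, m in zip(alphas, models))
        # negative log-probability of the target label under the fused output
        loss = lam * (x_adv - x).norm() - torch.log(fused[0, target] + 1e-12)
        loss.backward()
        opt.step()
        with torch.no_grad():
            x_adv.clamp_(0, 1)
    return x_adv.detach()
```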

Experimental results of attacking ensembles
For each of the five models, treat it as the black-box model to attack, and generate adversarial images against the ensemble of the remaining four, which are treated as white-box models.

Observations
Several observations:
The gradient directions of different models in the evaluation are almost orthogonal to each other.
For the non-targeted approaches, the decision boundaries of the different models align well, which helps explain why non-targeted adversarial samples transfer.
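The first observation can be probed directly by computing the cosine similarity between the input-gradients of two models at the same point; values near zero indicate nearly orthogonal directions. `model_a` and `model_b` are placeholders for any two pretrained classifiers:

```python
import torch
import torch.nn.functional as F

def grad_cosine(model_a, model_b, x, y):
    def input_grad(model):
        xi = x.clone().detach().requires_grad_(True)
        F.cross_entropy(model(xi), y).backward()
        return xi.grad.flatten()
    # cosine of the angle between the two loss-gradient directions
    return F.cosine_similarity(input_grad(model_a),
                               input_grad(model_b), dim=0).item()
```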

On realistic data
Clarifai.com: a company providing commercial, state-of-the-art image classification services. The authors submitted adversarial samples to the service.
Results:
Non-targeted success: 57% of the targeted adversarial examples generated using VGG-16, and 76% of the ones generated using the ensemble, mislead Clarifai.com into predicting labels irrelevant to the ground truth.
Targeted success: 18% of the targeted adversarial examples generated using the ensemble model are predicted by Clarifai.com as labels close to the target label; the corresponding number for those generated using VGG-16 is 2%.

Some adversarial samples