Deep Learning for Embedded Security Evaluation


Deep Learning for Embedded Security Evaluation
Emmanuel Prouff, Laboratoire de Sécurité des Composants, ANSSI, France
April 2018, CISCO

Contents

1. Context and Motivation
   1.1 Illustration
   1.2 Template Attacks
   1.3 Countermeasures
2. A machine learning approach to classification
   2.1 Introduction
   2.2 The Underlying Classification Problem
   2.3 Convolutional Neural Networks
   2.4 Training of Models
3. Building a Community Around The Subject
   3.1 ASCAD Open Database
   3.2 Leakage Characterization and Training
   3.3 Results
4. Conclusions

Probability density function (pdf) of the electromagnetic emanations of a cryptographic processing, shown for each value of the secret k = 1, 2, 3, 4: the shape of the pdf depends on k, which is exactly what a profiled side-channel attacker exploits.

Context: Target Device vs. Clone Device

[On the clone device] For every k, estimate the pdf of X | K = k (here k = 1, 2, 3, 4).
[On the target device] Estimate the pdf of X (k = ?).
[Key recovery] Compare the pdf estimates to identify the most likely k.

Side-Channel Attacks (Classical Approach)

Notations:
- X: observation of the device behaviour (power consumption, electromagnetic emanation, timing, etc.)
- P: public input of the processing
- Z: target, a cryptographic sensitive variable Z = f(P, K)

Goal: make inference on Z by observing X, i.e. estimate Pr[Z | X].

Template Attacks:
- Profiling phase (using profiling traces acquired under known Z):
  - manage the de-synchronization problem,
  - apply a mandatory dimensionality reduction,
  - estimate Pr[X | Z = z] by simple distributions for each value of z.
- Attack phase (N attack traces x_i, e.g. with known plaintexts p_i): compute a log-likelihood score for each key hypothesis k,

  $d_k = \sum_{i=1}^{N} \log \Pr[X = x_i \mid Z = f(p_i, k)]$.

A minimal scoring sketch in Python follows.
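The scoring rule above lends itself to a direct implementation. Below is a minimal sketch, assuming Gaussian templates (a mean vector and covariance matrix per class z) have already been estimated on the clone device; the names `templates` and `f` are illustrative, with `f(p, k)` standing for e.g. the first-round AES S-box output Sbox[p ^ k].

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_likelihood_scores(attack_traces, plaintexts, templates, f, n_keys=256):
    """d_k = sum_i log Pr[X = x_i | Z = f(p_i, k)] for each key hypothesis k."""
    d = np.zeros(n_keys)
    for k in range(n_keys):
        for x, p in zip(attack_traces, plaintexts):
            mean, cov = templates[f(p, k)]  # Gaussian template of class z = f(p, k)
            d[k] += multivariate_normal.logpdf(x, mean=mean, cov=cov)
    return d  # the correct key should maximize d_k as N grows
```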

Defensive Mechanisms

Misaligning countermeasures (random delays, clock jittering, ...):
- In theory: assumed to be insufficient to provide security on their own.
- In practice: one of the main issues for evaluators.
=> Need for efficient resynchronization techniques.

Masking countermeasure (see the sketch below):
- Each key-dependent internal state element is randomly split into 2 shares.
- The crypto algorithm is adapted so that the shares are always manipulated at different times.
- The adversary needs to recover information on both shares to recover K.
=> Need for efficient methods to recover the tuples of leakage samples that jointly depend on the target secret.
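A minimal sketch of first-order Boolean masking on an 8-bit value, illustrating the split-into-two-shares principle described above; the helper names are ours, not from the slides.

```python
import secrets

def mask(z: int) -> tuple[int, int]:
    """Split the sensitive byte z into two shares (m, z ^ m)."""
    m = secrets.randbelow(256)   # fresh uniform mask
    return m, z ^ m              # each share alone is uniform, independent of z

def unmask(share0: int, share1: int) -> int:
    return share0 ^ share1       # recombination: m ^ (z ^ m) = z

z = 0x3A
s0, s1 = mask(z)
assert unmask(s0, s1) == z
```

Since each share taken alone is uniformly distributed, an attacker must combine leakage from both shares to learn anything about z.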

2. A machine learning approach to classification

Motivating Conclusions

An evaluator can spend a lot of time adjusting the realignment and/or finding points of interest:
- try realignments based on other kinds of patterns,
- modify parameters,
- ...

...but there is no prior knowledge about which patterns are informative, and no way to evaluate a realignment without launching the subsequent attack.

Current practice: preprocess to prepare the data, make strong hypotheses on the statistical dependency, then run a characterization to extract the information.
The proposed perspective: drop the preprocessing and the strong hypotheses, and train algorithms to directly extract the information.

Side-Channel Attacks with a Classifier

Notations:
- X: side-channel trace
- Z: target, a cryptographic sensitive variable Z = f(P, K)

Goal: make inference on Z by observing X, i.e. estimate Pr[Z | X].

From Template Attacks to Machine-Learning Side-Channel Attacks:
- Training phase (using training traces acquired under known Z): the three profiling steps (managing de-synchronization, dimensionality reduction, estimating Pr[X | Z = z]) are merged into one integrated step: construct a classifier F(x) = y ≈ Pr[Z | X = x].
- Attack phase (N attack traces, e.g. with known plaintexts p_i): log-likelihood score for each key hypothesis k,

  $d_k = \sum_{i=1}^{N} \log F(x_i)[f(p_i, k)]$.

A minimal sketch with a trained classifier follows.
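A minimal sketch of the attack phase, assuming `model.predict` returns one row of class probabilities per trace (e.g. a softmax over the 256 values of Z); `model` and `f` are illustrative names.

```python
import numpy as np

def classifier_scores(model, attack_traces, plaintexts, f, n_keys=256):
    """d_k = sum_i log F(x_i)[f(p_i, k)] for each key hypothesis k."""
    probs = model.predict(attack_traces)      # shape (N, n_classes)
    log_probs = np.log(probs + 1e-40)         # guard against log(0)
    idx = np.arange(len(plaintexts))
    d = np.zeros(n_keys)
    for k in range(n_keys):
        z = np.array([f(p, k) for p in plaintexts])
        d[k] = log_probs[idx, z].sum()
    return d

# The rank of the true key inside the sorted scores d_k is the
# success metric used in the benchmarks later in the deck.
```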

Classification

Classification problem: assign to a datum X (e.g. an image) a label Z among a set of possible labels, e.g. Z ∈ {Cat, Dog, Horse}. The classifier outputs an estimate of Pr[Z | X] (rendered on the slide as a bar chart of probabilities over the three labels).

SCA as a classification problem: a trace x is fed to the classifier, which outputs P(Z | X = x) over the possible values of the sensitive variable (e.g. Z = 0 vs. Z = 1).

Machine Learning Approach

Overview of the machine-learning methodology:
- Human effort:
  - choose a class of algorithms (here: neural networks),
  - choose a model to fit and tune its hyper-parameters (here: MLP or ConvNet).
- Automatic training: automatic tuning of the trainable parameters to fit the data (here: stochastic gradient descent).

Convolutional Neural Networks

An answer to translation-invariance: whether the informative pattern appears at one position or another (a shifted object in an image, or a delayed peak in a trace x), the classifier should output the same distribution Pr[Z | X = x]. It is therefore important to make the data's translation-invariance explicit in the model.

Convolutional Neural Networks do exactly this by sharing weights across space: the same filters are slid over the whole input, so a pattern is detected wherever it occurs.

Training of Neural Networks

Trading side-channel expertise for deep-learning expertise... or huge computational power!

Training aims at finding the parameters of the model, i.e. of the approximation of the statistical dependency between the target value and the leakage. The approximation is obtained by solving a minimization problem with respect to some metric (the cost function).

The training algorithm itself has training hyper-parameters: the number of iterations (epochs) of the minimization procedure, and the number of input traces (the batch size) treated during a single iteration. The trained model additionally has architecture hyper-parameters: the size of the layers, the nature of the layers, the number of layers, etc.

Finding sound hyper-parameters is the main issue in deep learning; it requires a good understanding of the underlying structure of the data and/or access to significant computational power. A minimal training sketch follows.
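A minimal training sketch in Keras, assuming `model` is a network such as the one on the next slide and `(x_train, y_train)` are profiling traces with one-hot labels over 256 classes; the optimizer and learning rate are illustrative choices, not prescribed by the slides.

```python
from tensorflow.keras.optimizers import RMSprop

model.compile(optimizer=RMSprop(learning_rate=1e-5),
              loss="categorical_crossentropy",  # the cost function being minimized
              metrics=["accuracy"])
model.fit(x_train, y_train,
          epochs=100,       # training hyper-parameter: passes over the data
          batch_size=200,   # training hyper-parameter: traces per update
          validation_split=0.1)
```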

ConvNet Typical Architecture

Input trace of length l = 700, followed by:
- 1st block of CONV layers (n_conv = 3, n_filters,1 = 4), then Max POOL (w = 2),
- 2nd block of CONV layers (n_conv = 3, n_filters,2 = 8), then Max POOL (w = 2),
- 3rd block of CONV layers (n_conv = 3, n_filters,3 = 16),
- block of FC layers (n_dense = 3) + SOFTMAX.
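A Keras sketch of this architecture, assuming 1-D traces and 256 output classes; the kernel size and the width of the dense layers are our assumptions (the slide does not specify them).

```python
from tensorflow.keras import layers, models

def conv_block(x, n_filters, n_conv=3, pool=True):
    """n_conv Conv1D layers followed by an optional max pooling of width 2."""
    for _ in range(n_conv):
        x = layers.Conv1D(n_filters, kernel_size=3, padding="same",
                          activation="relu")(x)
    return layers.MaxPooling1D(pool_size=2)(x) if pool else x

inp = layers.Input(shape=(700, 1))      # input trace, l = 700
x = conv_block(inp, 4)                  # 1st block, n_filters,1 = 4, + max pool
x = conv_block(x, 8)                    # 2nd block, n_filters,2 = 8, + max pool
x = conv_block(x, 16, pool=False)       # 3rd block, n_filters,3 = 16
x = layers.Flatten()(x)
for _ in range(3):                      # FC block, n_dense = 3
    x = layers.Dense(128, activation="relu")(x)
out = layers.Dense(256, activation="softmax")(x)
model = models.Model(inp, out)
```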

3. Building a Community Around The Subject

Creation of an Open Database for Training and Testing

ANSSI recently published:
- source code of secure cryptographic implementations for public 8-bit architectures (https://github.com/anssi-fr/secaes-atmega8515),
- databases of electromagnetic leakages (https://github.com/anssi-fr/ascad),
- example scripts for the training and testing of deep-learning models in the context of security evaluations.

Goals:
- enable fair and easy benchmarking,
- initiate discussions and exchanges on the application of DL to SCA,
- create a community of contributors on this subject.

A minimal loading sketch is given below.
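A minimal sketch for loading the ASCAD traces with h5py. The HDF5 layout assumed here (top-level groups Profiling_traces and Attack_traces, each holding traces and labels datasets) matches our reading of the anssi-fr/ASCAD repository; consult its scripts for the authoritative format.

```python
import h5py
import numpy as np

with h5py.File("ASCAD.h5", "r") as f:
    x_train = np.array(f["Profiling_traces/traces"], dtype=np.float32)
    y_train = np.array(f["Profiling_traces/labels"])  # known values of Z
    x_attack = np.array(f["Attack_traces/traces"], dtype=np.float32)

# Conv1D models expect a trailing channel axis:
x_train = x_train[..., np.newaxis]
x_attack = x_attack[..., np.newaxis]
```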

Nature of the Observations/Traces

Side-channel observations in ASCAD correspond to the masked processing of a simple cryptographic primitive. The information leakage is validated by an SNR characterization.

[Figure: SNR curves snr2 and snr3 plotted against time samples 45400-46100, peaking at distinct instants.]

The two distinct peaks validate that the shares are manipulated at different times. Scripts are also provided to add artificial signal jitter. A sketch of the SNR computation follows.
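A minimal sketch of the classical SNR characterization, assuming `traces` of shape (N, T) and `z` the N known values of an intermediate share; the formula is SNR(t) = Var_z(E[X_t | Z = z]) / E_z(Var[X_t | Z = z]).

```python
import numpy as np

def snr(traces: np.ndarray, z: np.ndarray, n_classes: int = 256) -> np.ndarray:
    """Signal-to-noise ratio of each time sample w.r.t. the variable z."""
    means = np.array([traces[z == v].mean(axis=0) for v in range(n_classes)])
    varis = np.array([traces[z == v].var(axis=0) for v in range(n_classes)])
    return means.var(axis=0) / varis.mean(axis=0)  # one value per time sample

# Curves such as snr2/snr3 are obtained by calling snr() with the known
# values of each share; a peak locates where that share leaks in time.
```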

Our Training Strategy

1. Find a base model architecture and training hyper-parameters for which convergence towards the good key hypothesis is visible.
2. Fine-tune all the hyper-parameters one after another to get the best efficiency/effectiveness trade-off (see the sweep sketch after the table).

Table: Benchmarks Summary

| Parameter                | Reference   | Metric                   | Range                     | Choice     |
|--------------------------|-------------|--------------------------|---------------------------|------------|
| Training parameters      |             |                          |                           |            |
| Epochs                   | -           | rank vs. time            | 10, 25, 50, 60, ..., 100, 150 | up to 100 |
| Batch size               | -           | rank vs. time            | 50, 100, 200              | 200        |
| Architecture parameters  |             |                          |                           |            |
| Blocks                   | n_blocks    | rank, accuracy           | [2..5]                    | 5          |
| CONV layers              | n_conv      | rank, accuracy           | [0..3]                    | 1          |
| Filters                  | n_filters,1 | rank vs. time            | {2^i ; i ∈ [4..7]}        | 64         |
| Kernel size              | -           | rank                     | {3, 6, 11}                | 11         |
| FC layers                | n_dense     | rank, accuracy vs. time  | [0..3]                    | 2          |
| ACT function             | α           | rank                     | ReLU, Sigmoid, Tanh       | ReLU       |
| Pooling layer            | -           | rank                     | Max, Average, Stride      | Average    |
| Padding                  | -           | rank                     | Same, Valid               | Same       |
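An illustrative sketch of such a one-parameter-at-a-time sweep, here over the kernel size; `build_model` and `mean_key_rank` are hypothetical helpers (the latter could be built on `classifier_scores` above), not functions from the slides.

```python
best = None
for kernel_size in (3, 6, 11):            # the range probed in the table
    model = build_model(kernel_size=kernel_size)   # hypothetical builder
    model.fit(x_train, y_train, epochs=100, batch_size=200, verbose=0)
    rank = mean_key_rank(model, x_attack, plaintexts, true_key)  # hypothetical
    print(f"kernel_size={kernel_size}: mean rank {rank:.1f}")
    if best is None or rank < best[1]:
        best = (kernel_size, rank)        # keep the lowest mean rank
```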

The Base Architecture

[Figure: mean rank of the good-key hypothesis obtained with VGG-16, ResNet-50 and Inception-v3, as a function of the number of training epochs.]

Comparisons with State-of-the-Art Methods


Conclusions

- The state-of-the-art Template Attack separates resynchronization/dimensionality reduction from the characterization.
- Deep learning provides an integrated approach that directly extracts information from raw data (no preprocessing).
- Many recent results validate the practical interest of the machine-learning approach.
- We are at the very beginning, and we are still discovering how effective deep learning is.
- New needs: big databases for training, platforms enabling comparisons and benchmarking, an open community around ML for embedded security analysis, more exchanges with the machine-learning community, and a better understanding of the efficiency of current countermeasures.

Thank you! Questions?