Automated Website Fingerprinting through Deep Learning

Automated Website Fingerprinting through Deep Learning
Vera Rimmer (1), Davy Preuveneers (1), Marc Juarez (2), Tom Van Goethem (1) and Wouter Joosen (1)
NDSS 2018, Feb 19th (San Diego, USA)

Website Fingerprinting

Anonymous Communication through Tor
- All (secure) communication protocols expose metadata: timing, size of packets, identities, locations, addresses, communication patterns
- This metadata can reveal private information
- Anonymity tools relay traffic through protected communication channels
- The Onion Router (Tor)

Website Fingerprinting
- Side-channel attack that reveals a user's browsing activity
- The adversary is a local eavesdropper:
  - ISP
  - Autonomous Systems
  - Local network admins
  - Wi-Fi hotspot owners
- Observable traffic features:
  - Number of packets
  - Average packet size
  - % of incoming packets
  - Timing of packets
  - ...
- "Closed world" of websites
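The traffic features listed above can be computed from a captured trace in a few lines. The following is a minimal sketch, not the authors' code: the trace format (a list of timestamp and signed-size pairs), the sign convention (positive for outgoing, negative for incoming) and the function name `basic_features` are all assumptions made for illustration.

```python
from statistics import mean


def basic_features(trace):
    """Compute a few simple traffic features from one trace.

    trace: list of (timestamp_seconds, signed_size) tuples.
    Assumed convention: positive size = outgoing, negative = incoming.
    """
    sizes = [abs(size) for _, size in trace]
    incoming = [size for _, size in trace if size < 0]
    times = [t for t, _ in trace]
    # Inter-packet gaps capture the "timing of packets" feature.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "num_packets": len(trace),
        "avg_packet_size": mean(sizes) if sizes else 0.0,
        "pct_incoming": 100.0 * len(incoming) / len(trace) if trace else 0.0,
        "total_time": times[-1] - times[0] if times else 0.0,
        "avg_gap": mean(gaps) if gaps else 0.0,
    }


# Hypothetical four-packet trace: two outgoing and two incoming packets.
example_trace = [(0.00, 512), (0.05, -1024), (0.09, -1024), (0.20, 512)]
features = basic_features(example_trace)
```

An eavesdropper would compute such a vector for every observed page load and compare it against vectors collected for known sites.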

Website Fingerprinting Pipeline
Communication patterns -> Feature Extraction -> Machine Learning -> Identification

State-of-the-Art Attacks
- k-NN (Wang et al., 2014)
  - 3,000 features picked through heuristics (total size, total time, number of packets, packet ordering, traffic bursts)
  - Classifier: k-Nearest Neighbors
- k-fingerprinting (Hayes et al., 2016)
  - 150 features selected from Wang's through the analysis of feature importance
  - Classifier: Random Forest and k-Nearest Neighbors
- CUMUL (Panchenko et al., 2016)
  - 100 features: interpolation points of the cumulative sum of packet lengths
  - Classifier: Support Vector Machine
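To make the CUMUL feature idea concrete, here is a rough sketch of "interpolation points of the cumulative sum of packet lengths". It is an illustration under stated assumptions, not the published implementation: the signed-size input format, the use of plain linear interpolation at equally spaced positions, and the name `cumul_features` are assumptions.

```python
def cumul_features(signed_sizes, n_points=100):
    """Sample the cumulative sum of signed packet sizes at n_points positions.

    signed_sizes: packet lengths, signed by direction (assumed convention).
    Returns n_points values obtained by linear interpolation.
    """
    if n_points < 2:
        raise ValueError("n_points must be at least 2")

    # Running total of signed packet lengths.
    cumulative = []
    total = 0
    for size in signed_sizes:
        total += size
        cumulative.append(total)

    if not cumulative:
        return [0.0] * n_points
    if len(cumulative) == 1:
        return [float(cumulative[0])] * n_points

    last = len(cumulative) - 1
    features = []
    for i in range(n_points):
        # Fractional index into the cumulative curve.
        pos = i * last / (n_points - 1)
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        features.append(cumulative[lo] + frac * (cumulative[hi] - cumulative[lo]))
    return features


# Hypothetical trace of four packets, sampled at seven points.
sampled = cumul_features([1, -2, 3, -4], n_points=7)
```

Because every trace is reduced to the same number of points, traces of different lengths become fixed-size vectors that a classifier such as an SVM can consume.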

Website Fingerprinting Arms-race
Communication patterns -> Feature Extraction -> Machine Learning -> Identification
- Main focus of prior work: manual engineering
  - Intellectual effort
  - Difficult and expensive
- The success of attacks is defined by the set of engineered features
- Concealing these features creates a countermeasure
- A new attack exploits other, still visible features

Website Fingerprinting: Alternative?
Communication patterns -> Deep Learning -> Identification
- Deep Learning takes the place of the separate Feature Extraction and Machine Learning steps

Deep Learning for WF

Why Deep Learning?
- Automatic feature learning from raw input
  - Obviates hand-engineering of features
  - Adaptive to changes in patterns
- Limited transparency and interpretability
  - Learned features are implicit and abstract
- Efficient, easily distributed and parallelized

Deep Learning based WF
- Data Collection
  - DL requires a lot of training data
- Deep Neural Network choice
  - Choosing the best-suited deep learning algorithm
- Hyperparameter Tuning and Model Selection
  - Tuning of heavily parameterised models

Data Collection
- Built a distributed crawler that captures the timing, direction and sizes of TCP packets
- 2,500 traces for each of the 900 most popular Alexa sites: the largest-ever dataset
- Closed worlds: CWN datasets, where N is the number of sites

Deep Neural Networks
- Choice of a Deep Neural Network (DNN) suited for the input data
  - 1D sequences of incoming and outgoing Tor cells encoded as 1 and -1
- Explored 3 major types of DNNs:
  - Feedforward: Stacked Denoising Autoencoder (SDAE), learns from the continuous structure through dimensionality reduction
  - Convolutional: Convolutional Neural Network (CNN), learns from the spatial structure through convolutions and subsampling
  - Recurrent: Long Short-Term Memory (LSTM), learns from the temporal structure (time series) through internal memory
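The input encoding described above, a 1D sequence of Tor cell directions, could be produced along the following lines. This is a sketch under assumptions: the slide does not say which direction maps to which sign, so outgoing = +1 and incoming = -1 is assumed here, and the fixed length of 5000 with zero padding is a hypothetical choice made so that every trace has the same shape.

```python
def encode_directions(signed_sizes, length=5000):
    """Turn a trace into a fixed-length sequence of cell directions.

    signed_sizes: per-cell sizes, signed by direction.
    Assumed convention: positive = outgoing -> +1, negative = incoming -> -1.
    Shorter traces are padded with 0; longer traces are truncated.
    """
    directions = [1 if size > 0 else -1 for size in signed_sizes if size != 0]
    directions = directions[:length]
    return directions + [0] * (length - len(directions))


# Hypothetical trace: one outgoing cell followed by two incoming cells.
encoded = encode_directions([512, -512, -512], length=5)
```

Packet sizes carry little information at this level because Tor cells have a fixed size, which is why direction alone can serve as the raw input.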

Evaluation and Results

Re-evaluation of Traditional Attacks
[Figure: accuracy of the re-evaluated traditional attacks; values shown: 92.87, 92.47, 95.43]
- Highlighted result: best performing in the closed world, most practical attack

Closed World
[Figure: closed-world accuracy of the attacks]
- Overall, comparable with the state-of-the-art
- CW100: CUMUL still outperforms all attacks, followed by CNN
- Accuracy falls as the number of websites increases
- CW900: SDAE outperforms the state-of-the-art

Number of Traces per Website
[Figure: accuracy (%) versus number of instances per website]
- LSTM takes longer to catch up (due to learning constraints on long sequences)

Concept Drift (CW200)
[Figure: concept drift on CW200, with the moment of training marked]
- SDAE, LSTM and CNN generalize better than the state-of-the-art

Implications and Take-aways

Implications and Take-aways
- First thorough evaluation of DL for WF
- Powerful and robust attack (accuracy: 96% for CW100, 94% for CW900)
- Each DNN has its strengths and weaknesses
- Game-changer for the WF arms-race:
  - Automated feature learning (vs. the burden of manual feature engineering)
  - Harder to defend against (due to non-trivial interpretability of features)
- Data collection and model selection are crucial to the performance
- Evaluated by collecting the largest dataset for WF

Thank you!
Website Fingerprinting through Deep Learning
https://distrinet.cs.kuleuven.be/software/tor-wf-dl

References
1. T. Wang and I. Goldberg, "Improved Website Fingerprinting on Tor," in ACM Workshop on Privacy in the Electronic Society (WPES). ACM, 2013, pp. 201-212.
2. T. Wang and I. Goldberg, "On Realistically Attacking Tor with Website Fingerprinting," in Proceedings on Privacy Enhancing Technologies (PoPETs). De Gruyter Open, 2016, pp. 21-36.
3. T. Wang, X. Cai, R. Nithyanand, R. Johnson, and I. Goldberg, "Effective Attacks and Provable Defenses for Website Fingerprinting," in USENIX Security Symposium. USENIX Association, 2014, pp. 143-157.
4. A. Panchenko, F. Lanze, A. Zinnen, M. Henze, J. Pennekamp, K. Wehrle, and T. Engel, "Website Fingerprinting at Internet Scale," in Network & Distributed System Security Symposium (NDSS). IEEE Computer Society, 2016, pp. 1-15.
5. J. Hayes and G. Danezis, "k-fingerprinting: a Robust Scalable Website Fingerprinting Technique," in USENIX Security Symposium. USENIX Association, 2016, pp. 1-17.
6. K. Abe and S. Goto, "Fingerprinting Attack on Tor Anonymity Using Deep Learning," Proceedings of the Asia-Pacific Advanced Network, vol. 42, pp. 15-20, 2016.
7. M. Juarez, S. Afroz, G. Acar, C. Diaz, and R. Greenstadt, "A Critical Evaluation of Website Fingerprinting Attacks," in ACM Conference on Computer and Communications Security (CCS). ACM, 2014, pp. 263-274.

SDAE
[Figure: SDAE classifier. Tor cells -> autoencoder (feature extraction into a hidden representation) -> classification]

CNN
[Figure: CNN classifier. Tor cells -> feature extraction stages (convolution and subsampling producing feature maps) -> classification]
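The two operations named in the CNN diagram, convolution and subsampling, can be illustrated on a short direction sequence. This is a toy sketch, not the network from the talk: a real CNN learns its filter weights during training, whereas the kernel below is a hand-picked, hypothetical filter, and the function names are chosen for this example only.

```python
def conv1d(sequence, kernel):
    """Valid-mode 1D cross-correlation, the operation CNN layers call convolution."""
    width = len(kernel)
    return [
        sum(sequence[i + j] * kernel[j] for j in range(width))
        for i in range(len(sequence) - width + 1)
    ]


def relu(values):
    """Keep positive activations and zero out the rest."""
    return [max(0.0, v) for v in values]


def max_pool1d(values, size):
    """Subsample by keeping the maximum of each non-overlapping window."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


# Hypothetical sequence of cell directions (+1 outgoing, -1 incoming, assumed).
cells = [1, 1, -1, -1, -1, 1, -1, -1]
# Hand-picked filter that responds where an outgoing cell is followed by an incoming one.
kernel = [1.0, -1.0]

feature_map = relu(conv1d(cells, kernel))  # [0.0, 2.0, 0.0, 0.0, 0.0, 2.0, 0.0]
pooled = max_pool1d(feature_map, 2)        # [2.0, 0.0, 2.0]
```

The pooled output is shorter than the input but still marks where the direction switches occurred, which is the sense in which subsampling preserves the salient pattern while reducing dimensionality.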

LSTM
[Figure: LSTM network and LSTM unit]

Closed World vs Open World
[Figure: closed world and open world shown side by side]

State-of-the-Art Attacks: k-NN (Wang et al., 2014)
- Features: 3,000 (picked through heuristics): total size, total time, number of packets, packet ordering, traffic bursts
- Classifier: k-Nearest Neighbors (k-NN)
- Accuracy: 92% (100 websites)
[Figure: k-NN illustration in a two-feature space (x1, x2) with k=4 and k=7 neighborhoods]
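The k-nearest-neighbors idea behind this attack can be sketched as a majority vote among the closest training traces. Note the substitution: the sketch below uses plain squared Euclidean distance over two made-up features, which is a simplification chosen for illustration and not the distance or feature set of the published attack. The site labels and the name `knn_predict` are hypothetical.

```python
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict the label of query by majority vote among its k nearest neighbors.

    train: list of (feature_vector, label) pairs.
    Distance: squared Euclidean (a simplification for this sketch).
    """
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = sorted(train, key=lambda item: squared_distance(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training set with two features per trace.
training_set = [
    ([0.0, 0.0], "siteA"),
    ([0.1, 0.2], "siteA"),
    ([1.0, 1.0], "siteB"),
    ([0.9, 1.1], "siteB"),
    ([1.2, 0.8], "siteB"),
]
prediction = knn_predict(training_set, [0.05, 0.1], k=3)
```

The choice of k changes the outcome near class boundaries, which is what the k=4 and k=7 neighborhoods in the slide's illustration depict.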

State-of-the-Art Attacks: k-fingerprinting (Hayes et al., 2016)
- Features: 150 (selected from Wang's through analysis of feature importance)
- Classifier: Random Forest + k-NN
- Accuracy: 93% (100 websites)

State-of-the-Art Attacks: CUMUL (Panchenko et al., 2016)
- Features: 100 (derived as interpolation points of the cumulative sum of packet lengths)
- Classifier: Support Vector Machine (SVM)
- Accuracy: From 97% (100 websites)

Open World: ROC Curve
[Figure: ROC curves of the attacks. Monitored: 200 websites; non-monitored: 400,000 websites]
- CNN and SDAE outperform the state-of-the-art
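An ROC curve for the open-world setting plots the true positive rate against the false positive rate as the decision threshold varies. The sketch below shows one way to compute those points, assuming each test trace comes with a classifier confidence that it belongs to a monitored site. The scores, labels and the name `roc_points` are made up for illustration and are not results from the talk.

```python
def roc_points(scores, is_monitored):
    """Return (false_positive_rate, true_positive_rate) pairs, one per threshold.

    scores: classifier confidence that a trace belongs to a monitored site.
    is_monitored: ground truth for each trace (True = monitored site).
    Assumes at least one monitored and one non-monitored trace.
    """
    positives = sum(1 for m in is_monitored if m)
    negatives = len(is_monitored) - positives
    points = [(0.0, 0.0)]
    # Sweep the threshold from strict to lenient.
    for threshold in sorted(set(scores), reverse=True):
        flagged = [m for s, m in zip(scores, is_monitored) if s >= threshold]
        true_pos = sum(1 for m in flagged if m)
        false_pos = len(flagged) - true_pos
        points.append((false_pos / negatives, true_pos / positives))
    return points


# Hypothetical confidences for six test traces.
example_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
example_truth = [True, True, False, True, False, False]
curve = roc_points(example_scores, example_truth)
```

With far more non-monitored than monitored sites, even a small false positive rate corresponds to many misclassified page loads, so the low-FPR region of the curve is the part that matters most in this setting.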