M 2 R: ENABLING STRONGER PRIVACY IN MAPREDUCE COMPUTATION

Similar documents
M 2 R: Enabling Stronger Privacy in MapReduce Computation

M 2 R: Enabling Stronger Privacy in MapReduce Computa;on

Privacy-Preserving Computation with Trusted Computing via Scramble-then-Compute

Secure Multiparty Computation

Developing MapReduce Programs

CS573 Data Privacy and Security. Cryptographic Primitives and Secure Multiparty Computation. Li Xiong

Apache Spark is a fast and general-purpose engine for large-scale data processing Spark aims at achieving the following goals in the Big data context

Secure Multiparty Computation: Introduction. Ran Cohen (Tel Aviv University)

Distributed ID-based Signature Using Tamper-Resistant Module

MapReduce Design Patterns

1 A Tale of Two Lovers

2/26/2017. Originally developed at the University of California - Berkeley's AMPLab

Computer Security CS 526

CSC 5930/9010 Cloud S & P: Cloud Primitives

MapReduce Algorithms

Delegated Access for Hadoop Clusters in the Cloud

Secure Remote Storage Using Oblivious RAM

Network Security Technology Project

CLUSTERING is one major task of exploratory data. Practical Privacy-Preserving MapReduce Based K-means Clustering over Large-scale Dataset

Secure Multi-Party Computation. Lecture 13

Ascend: Architecture for Secure Computation on Encrypted Data Oblivious RAM (ORAM)

0x1A Great Papers in Computer Security

Searchable Symmetric Encryption: Optimal Locality in Linear Space via Two-Dimensional Balanced Allocations

Encrypted Data Deduplication in Cloud Storage

Data Clustering on the Parallel Hadoop MapReduce Model. Dimitrios Verraros

Bitcoin, Security for Cloud & Big Data

Secure Multiparty Computation

Introduction to Secure Multi-Party Computation

Tutorial Outline. Map/Reduce vs. DBMS. MR vs. DBMS [DeWitt and Stonebraker 2008] Acknowledgements. MR is a step backwards in database access

Mitigating Data Skew Using Map Reduce Application

Cryptography and Network Security. Prof. D. Mukhopadhyay. Department of Computer Science and Engineering. Indian Institute of Technology, Kharagpur

April Final Quiz COSC MapReduce Programming a) Explain briefly the main ideas and components of the MapReduce programming model.

HiTune. Dataflow-Based Performance Analysis for Big Data Cloud

Other Topics in Cryptography. Truong Tuan Anh

S. Indirakumari, A. Thilagavathy

Data Partitioning and MapReduce

Practical Secure Two-Party Computation and Applications

Lecture 1 Applied Cryptography (Part 1)

Lectures 6+7: Zero-Leakage Solutions

Crypto Background & Concepts SGX Software Attestation

Foundations of Cryptography CS Shweta Agrawal

Computer Security. 08r. Pre-exam 2 Last-minute Review Cryptography. Paul Krzyzanowski. Rutgers University. Spring 2018

Block ciphers, stream ciphers

Homework 2. Out: 09/23/16 Due: 09/30/16 11:59pm UNIVERSITY OF MARYLAND DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING

On The Security Of Querying Encrypted Data

CRYPTOGRAPHIC PROTOCOLS: PRACTICAL REVOCATION AND KEY ROTATION

Structured Encryption and Controlled Disclosure

ENEE 459-C Computer Security. Security protocols (continued)

Midgame Attacks. (and their consequences) Donghoon Chang 1 and Moti Yung 2. IIIT-Delhi, India. Google Inc. & Columbia U., USA

Secure Data Deduplication with Dynamic Ownership Management in Cloud Storage

2 Secure Communication in Private Key Setting

Protocols for Anonymous Communication

Lecture 7: MapReduce design patterns! Claudia Hauff (Web Information Systems)!

Accelerate Big Data Insights

Secure Data De-Duplication With Dynamic Ownership Management In Cloud Storage

Efficient Private Information Retrieval

Symmetric Cryptography

Design and Implementation of the Ascend Secure Processor. Ling Ren, Christopher W. Fletcher, Albert Kwon, Marten van Dijk, Srinivas Devadas

HDFS: Hadoop Distributed File System. CIS 612 Sunnie Chung

The Salsa20 Family of Stream Ciphers

Cryptography. Andreas Hülsing. 6 September 2016

Defining Encryption. Lecture 2. Simulation & Indistinguishability

1 Achieving IND-CPA security

1. Stratified sampling is advantageous when sampling each stratum independently.

Untraceable Electronic Mail, Return Addresses, and Digital Pseudonyms. EJ Jung

An Overview of Secure Multiparty Computation

TITLE: PRE-REQUISITE THEORY. 1. Introduction to Hadoop. 2. Cluster. Implement sort algorithm and run it using HADOOP

RSA. Public Key CryptoSystem

Cryptography CS 555. Topic 11: Encryption Modes and CCA Security. CS555 Spring 2012/Topic 11 1

Introduction to MapReduce

Lecture 18 - Chosen Ciphertext Security

Department of Computer Science San Marcos, TX Report Number TXSTATE-CS-TR Clustering in the Cloud. Xuan Wang

Security And Privacy Challenges In Big Data

EECS 498 Introduction to Distributed Systems

Review on Managing RDF Graph Using MapReduce

Lecture 3.4: Public Key Cryptography IV

CSC 774 Advanced Network Security

ENEE 459-C Computer Security. Security protocols

Ming Ming Wong Jawad Haj-Yahya Anupam Chattopadhyay

SECURE MULTI-KEYWORD TOP KEY RANKED SEARCH SCHEME OVER ENCRYPTED CLOUD DATA

Parallel Coin-Tossing and Constant-Round Secure Two-Party Computation

Security of Identity Based Encryption - A Different Perspective

Lecturers: Mark D. Ryan and David Galindo. Cryptography Slide: 24

Algorithms for Grid Graphs in the MapReduce Model

Resilient Distributed Datasets

MTAT Research Seminar in Cryptography IND-CCA2 secure cryptosystems

R-Store: A Scalable Distributed System for Supporting Real-time Analytics

SGX Security Background. Masab Ahmad Department of Electrical and Computer Engineering University of Connecticut

Securely Outsourcing Garbled Circuit Evaluation

Introduction to Secure Multi-Party Computation

HadoopDB: An open source hybrid of MapReduce

Introduction to MapReduce

Hiding in the Cloud: The Perils and Promise of Searchable Encryption

6.857 L17. Secure Processors. Srini Devadas

Outline. Distributed File System Map-Reduce The Computational Model Map-Reduce Algorithm Evaluation Computing Joins

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2016)

Cryptography Math/CprE/InfAs 533

TI2736-B Big Data Processing. Claudia Hauff

Mapping Internet Sensors with Probe Response Attacks

Key Protection for Endpoint, Cloud and Data Center

Transcription:

1 M 2 R: ENABLING STRONGER PRIVACY IN MAPREDUCE COMPUTATION TIEN TUAN ANH DINH, PRATEEK SAXENA, EE-CHIEN CHANG, BENG CHIN OOI, AND CHUNWANG ZHANG, NATIONAL UNIVERSITY OF SINGAPORE PRESENTED BY: RAVEEN WIJEWICKRAMA DATE: 03/02/2017

2 OUTLINE Introduction Proposed System Performance Evaluation Conclusion

3 INTRODUCTION Privacy of data in cloud Encryption of data Computations on encrypted data: challenging Privacy-preserving computation trusted computing support for hardware-isolated computation purely cryptographic techniques

4 PROBLEM STATEMENT Enabling privacy-preserving distributed computation in an un-trusted cloud. Distributed computing model in which components of a software system are shared among multiple computers to improve efficiency and performance. Focus: privacy in the popular MapReduce framework

5 CONTRIBUTIONS Privacy-preserving distributed computation Defines a new level of privacy for MapReduce No algorithmic restructuring needed Attacks Mere encryption of data is insufficient Simple Design Non-intrusive Architectural change to MapReduce

6 BACKGROUND

7 MapReduce A software framework Enables distributed processing of massive data sets across many servers. E.g. Apache Hadoop Units of computation: Map & Reduce Map: Takes a set of data and convert it to another form. Output: keyvalue tuples Reduce: takes input from Map. Combines data tuples into a smaller set of tuples.

8 MapReduce (blog.sqlauthority.com, 2013)

9 THREAT MODEL Passive/honest-but curious attacker Tries to learn sensitive info without deviating from the protocol Active/malicious attacker Deviates from protocol and tamper with any data under its control Direct Attacks Observing data passing bet. computation nodes Sabotaging computation of each map/reduce instance

10 BASELINE SYSTEM Computation units Hardware-isolated Executed privately Side channels are protected Intermediate data (bet. Computation units) Encrypted (with authenticated encryption) Decrypted only in a trusted environment Can only invoke with its complete input set.

11 PROBLEM DEFINITION Ideally only input/output size, execution time should be leaked Baseline System Input, output size, processing time of computation units Data flow among the computation units (MAP REDUCE)

12 PROBLEM DEFINITION Allowable leakage: Ψ Input, output size, processing time of computation units Privacy modulo-psi

13 DEFINITION OF SECURITY PROTOCOL A provisioning protocol for a program is modulo-ψ private if, for any adversary A executing the MapReduce protocol, there is an adversary ഥA with access only to Ψ, such that the output of A and ഥA are indistinguishable.

14 ASSUMPTIONS Underlying hardware sufficiently protects each computation unit from malware and snooping attacks. Information leakage via side-channels from a computation unit is minimal. To enable arbitrary computation on encrypted data, decryption keys need to be made available to each hardware-isolated computation unit.

15 WORD-COUNT JOB IN MAP-REDUCE ("Map Reduce (MR) Framework [Gerardnico]", 2017)

16 ATTACKS Data flow patterns Order of execution Time of access ("Map Reduce (MR) Framework [Gerardnico]", 2017)

17 ATTACKS Passive Attacks Data flow patterns Order of execution Time of access Active Attacks Tuple tampering The adversary may attempt to duplicate or eliminate an entire output tuple-set produced by a computation unit. Misrouting tuples The adversary can reorder intermediate tuples or route data blocks intended for one reduce unit to another.

18 GENERIC SOLUTION: OBLIVIOUS RAM Computationally expensive O(log k N) for each access when no. of tuples=n With sorting algorithm (used for grouping) O(log k+1 N) 30 to 100 times slow

19 PROPOSED SYSTEM: M 2 R Authors observation Fixed sequence of data access patterns Cycles of tuple writes followed by reads. Authors solution re-write intermediate encrypted tuples with re-randomized tuple keys Such that there is no linkability between the re-randomized tuples and the original encrypted map output tuples.

20 PROPOSED SYSTEM: M 2 R Authors solution Use a secure mix-network (a cascaded mix-network) To securely shuffle tuples Procedure Cascading k-mixing steps mixt units (trusted computation units) Execution distributed over multiple nodes: mixers

21 PROPOSED SYSTEM: M 2 R Mapper Mixer Mixer Reducer MixT MixT Mapper MixT MixT Reducer

22 PROPOSED SYSTEM: M 2 R Each mixt takes a fixed amount of T tuples In each step of the cascade, the mixer utilizes N/T mixt units for mixing (total no. ) N tuples. At K =log N/T the strongest possible unlinkability MixT unit Decrypts the tuples received Randomly permutes them (O(n) algorithm) Re-encrypts the permuted tuples with new randomly chosen symmetric key.

23 PROPOSED SYSTEM: M 2 R Secure Grouping Original grouping algorithm can be used. Takes the output of the cascade-mix, group the tuples with the same-key & forwards to the reducers. Extra step in cascade Hence, Key-component of the output tuples are encrypted using a deterministic symmetric encryption algorithm Fs (with a secret key s). {Fs(a), E(a,b)} where E(.): a probabilistic encryption scheme two shuffled tuples with the same tuple-keys will have the same ciphertext for the keycomponent So grouping algorithm can group them without decrypting the tuples.

24 PROPOSED SYSTEM: M 2 R

25 PROPOSED SYSTEM: M 2 R Integrity Check groupt Checks to ensure that ordering of grouped tuples passed to reducet is preserved. Checks for duplicated tuples Checks to ensure that no tuples are dropped.

26 PROPOSED SYSTEM: M 2 R Achieves private modulo

27 IMPLEMENTATION A modified standard Hadoop implementation to invoke mixt & groupt units 90 LoC to TCB mapt & reducet: same as baseline system Execution Client encrypts & uploads data to M 2 R nodes User submits M 2 R applications & finally decrypts the results.

28 PERFORMANCE EVALUATION Benchmarks HiBench suite: a standard benchmark for Hadoop Setup Bet. 1 to 4 compute nodes (mappers & reducers) Bet. 1 to 4 mixer nodes (for mix-network) Results shown: From running 4 compute nodes & 4 mixers averaged over 10 executions.

29 PERFORMANCE EVALUATION Summary of the porting effort and TCB increase for various M2R applications All jobs have TCB increases of fewer than 500 LoC, merely 0.16% of the Hadoop codebase.

30 PERFORMANCE EVALUATION Overall running time (s) of M2R applications in comparison with other systems Job Baseline (vs. no M 2 R (% increase vs. Download-andcomplete encryption baseline) (x M 2 R) Wordcount 570 (221) 1156 (100%) 1859 (1.6x) Index 666 (423) 1549 (130%) 2061 (1.3x) Grep 70 (48) 106 (50%) 1686 (15.9x) Aggregate 125 (80) 205 (64%) 9140 (44.6x) Join 422 (211) 510 (20%) 5716 (11.2x) Pagerank 521 (334) 755 (44%) 1209 (1.6x) KMeans 123 (71) 145 (17%) 6071 (41.9x)

31 PERFORMANCE EVALUATION Normalized break-down time for M2R applications. Running time: time taken by mapt & reducet + time taken by the secure shuffler. The rest comes from the Hadoop runtime.

32 PERFORMANCE EVALUATION Remaining leakage of M 2 R applications, compared with that in the baseline system. Job M2R Baseline (additional leakage) Wordcount # unique words + count word-file relationship Index # unique words + count word-file relationship Grep nothing nothing Aggregate # groups + group size record-group relationship Join # groups + group size record-group relationship Pagerank node in-degree whole input graph KMeans nothing nothing

33 CONCLUSION The proposed design demonstrates a newly defined level of security lower performance overhead Requires only a small TCB. M 2 R shows that the System requires only a little effort to port legacy MapReduce applications Scalable

34 REFERENCES Big Data - Buzz Words: What is MapReduce - Day 7 of 21 - Journey to SQL Authority with Pinal Dave. (2017). Journey to SQL Authority with Pinal Dave. Retrieved 2 March 2017, from https://blog.sqlauthority.com/2013/10/09/big-data-buzz-words-what-is-mapreduceday-7-of-21/ Cloud Computing Png 81020 UPSTORE. (2017). Upstore.me. Retrieved 2 March 2017, from http://www.upstore.me/cloudcomputing-png.html Dinh, T. T. A., Saxena, P., Chang, E., Ooi, B. C., & Zhang, C. (2015). M 2 R: enabling stronger privacy in mapreduce computation. Proceedings of the 24th USENIX Conference on Security Symposium (SEC), 447 462. Retrieved from http://dl.acm.org/citation.cfm?id=2831143.2831172 Map Reduce (MR) Framework [Gerardnico]. (2017). Gerardnico.com. Retrieved 2 March 2017, from http://gerardnico.com/wiki/algorithm/map_reduce MapReduce Introduction. (2017). www.tutorialspoint.com. Retrieved 2 March 2017, from https://www.tutorialspoint.com/map_reduce/map_reduce_introduction.htm Secure Content Technologies» Palo Alto. (2017). Securecontenttechnologies.com. Retrieved 2 March 2017, from http://www.securecontenttechnologies.com/paloalto.html

35 QUESTIONS?

36 THANK YOU!