M 2 R: Enabling Stronger Privacy in MapReduce Computa;on

Size: px

Start display at page:

Download "M 2 R: Enabling Stronger Privacy in MapReduce Computa;on"

Erika Sharp
6 years ago
Views:

1 M 2 R: Enabling Stronger Privacy in MapReduce Computa;on Anh Dinh, Prateek Saxena, Ee- Chien Chang, Beng Chin Ooi, Chunwang Zhang School of Compu,ng Na,onal University of Singapore

2 1. Mo;va;on Distributed computa,on (MapReduce) on large dataset with Trusted compu,ng. T T Storage T Integrity + Confiden,ality. Applicable in private or public cloud segng.

3 Challenge 1: Keep Trusted Code Base small Applica,on Frameworks Opera,ng Systems CVEs in Linux [CVE- DB] Hypervisor Affected many hypervisors (e.g Xen / KVM) [CS Report]

4 Baseline System Protected Code / Data TCB (e.g. SGX enclave) Opera,ng System Hypervisor MapReduce/Hadoop Framework S t o r a g e Trusted CPU All data outside trusted environment is encrypted SoZware only a[ack. Previous system: VC 3 (IEEE S&P 2015)

5 Background: MapReduce shuffling map reduce reduce map reduce map reduce Computa,on & shuffling of <key, value> tuples. Phases: Map à Shuffle à Reduce. map outputs a set of tuples. During Shuffling, tuples are grouped according to their key. Each reduce instance corresponds to an unique key k. It takes all tuples with the key k and output a set of tuples.

6 Background: Hadoop Hadoop: sozware framework wri[en in Java 190K LOC (Hadoop ) Consists of MapReduce modules, Hadoop Distributed File System (HDFS), etc. Usensix Security 2015 M2R: Enabling Stronger Privacy in MapReduce Computa,on

7 Is the Baseline System sufficient? T T Storage T Iden,fy small essen,al components of MapReduce to be included in the TCB.

8 Challenge 2: Interac;ons Leaks Info T T Storage T Usensix Security 2015 M2R: Enabling Stronger Privacy in MapReduce Computa,on

9 Example of leakage: wordcount Map Phase: each generates the tuples. F1 w1 w2 shuffling w1 w2 (key: w1) (key: w2) F2 w1 w2 w4 w1 w2 w4 (key: w3) F3 w3 w4 w3 w4 (key:w4)

10 Example of leakage: wordcount Shuffling Phase: The tuples are grouped w.r.t the words. F1 w1 w2 shuffling (key: w1) (key: w2) F2 w1 w2 w4 (key: w3) F3 w3 w4 (key:w4) Reduce Phase: counts and outputs the number of tuples it received.

11 Example of leakage: word counts By observing the flow of tuples, one can infer rela,onships among the input files. F1 w1 w2 shuffling (key: w1) (key: w2) F2 w1 w2 w4 (key: w3) F3 w3 w4 (key:w4) F3 contains a unique word

12 Example of leakage: word counts By observing the flow of tuples, one can infer rela,onships among the input files. F1 w1 w2 shuffling (key: w1) (key: w2) F2 w1 w2 w4 (key: w3) F3 w3 w4 (key:w4) Goal: hide these rela;onships.

13 Possible solu;on: Oblivious RAM Very high overhead. ORAM Usensix Security 2015 M2R: Enabling Stronger Privacy in MapReduce Computa,on

14 2. Our solu;on Randomly permutes the tuples. Group the tuples according to their keys. Secure Mixing (MixT) Grouping

15 2. Our solu;on For execu,on integrity, addi,on step of verifica,on is required. Secure Mixing (MixT) Grouping Verify groupings (GroupT)

16 2. Our solu;on Randomly permutes the tuples Original untrusted Shuffling process Verifies grouping is correctly done Secure Mixing (MixT) Grouping Verify groupings (GroupT)

17 Cascaded Mixing A cascaded mixing is employed to randomly permute the tuples distributedly. mixt mixt mixt mixt mixt mixt mixt mixt mixt round 1 round 2 round 3

18 Remarks Key management, handling of the random nonce and ini,al value is not straighoorward. In Hadoop, mul,ple reduce instances are carried out by a single reducer. Likewise mapper. Usensix Security 2015 M2R: Enabling Stronger Privacy in MapReduce Computa,on

19 ORAM vs Our solu;on M 2 R exploits the fact that, reads and writes can be batched into 2 phases, whereas ORAM caters for single read/write and thus incurs higher overhead. Many construc,ons of ORAM need to permute or o- sort the data. ORAM Secure Mixing (MixT) Grouping Verify groupings (GroupT) Usensix Security 2015 M2R: Enabling Stronger Privacy in MapReduce Computa,on

20 3. Security Model Adversary can observe the following: Input/output size of each trusted instance. Source/des,na,on of the input/output. Time of invoca,on/return of each trusted instance. Ac,ve adversary can: Arbitrary Invoke trusted instances. Halt instances. Drop/duplicate ciphertext (encrypted tuples). Add delays. Usensix Security 2015 M2R: Enabling Stronger Privacy in MapReduce Computa,on

21 Modulo- private Based on formula,on by CaneG (FOCS 01). Let be the permissible data that can be revealed during honest execu,on. A provisioning protocol is modulo- private if, for any adversary A execuhng the protocol, there is an algorithm B with access only to, such that the output of A and B are indishnguishable. Usensix Security 2015 M2R: Enabling Stronger Privacy in MapReduce Computa,on

22 The permissible : size of input/output,,me of revoca,on/return of and under honest execu,on. shuffling F1 w1 w2 (key: w1) (key: w2) F2 w1 w2 w4 (key: w3) F3 w3 w4 (key:w4) Goal: hide these rela;onships.

23 M 2 R is - modulo private. Usensix Security 2015 M2R: Enabling Stronger Privacy in MapReduce Computa,on

24 4. Implementa;ons & Experiments Use Xen as the trusted hypervisor, and its Verifiable Dynamic Func,on Executor to load and execute trusted codes. (The design of M 2 R can be implemented differently depending on the underlying architecture, e.g. on Intel SGX). Ported 7 MapReduc benchmark applica,ons. KMeans : Itera,ve, Compute intensive Grep : Compute intensive Pagerank : Itera,ve, Compute intensive WordCount : Shuffle intensive Index : Shuffle intensive Join : database queries Aggregate : database queries 8 compute nodes, each quad- core Intel CPU 1.8 GHz, 8GB RAM, 1GB Ethernet cards.

25 Trusted Code Base 4 trusted computa,on units: mixt, GroupT,,. Plaoorm related: (mixt, GroupT) Lines Of Code: 300 Applica,ons: (, ReduceT) Lines Of Code 200 for our examples. Usensix Security 2015 M2R: Enabling Stronger Privacy in MapReduce Computa,on

26 Performance Job Input size (bytes) (vs plaintext size) Shuffled bytes #Applica,ons hyper calls #plaoorm hyper calls Wordcount 2.1G (1.1 ) 4.2G Index 2.5G (1.2 ) 8G Grep 2.1G (1.1 ) 75M Aggregate 2.0G (1.2 ) 289M Join 2.0G (1.2 ) 450M Pagerank 2.5G (4.0 ) 2.6G KMeans 1.0G (1.1 ) 11K Usensix Security 2015 M2R: Enabling Stronger Privacy in MapReduce Computa,on

27 Running ;me (s) Job Baseline (vs no encryp,on) M 2 R (vs baseline) Wordcount 570 (2.6 ) 1156 (2.0 ) Index 666 (1.6 ) 1549 (2.3 ) Grep 70 (1.5 ) 106 (1.5 ) Aggregate 125 (1.6 ) 205 (1.6 ) Join 422 (2 ) 510 (1.2 ) Pagerank 521 (1.6 ) 755 (1.4 ) KMeans 123 (1.7 ) 145 (1.2 ) Usensix Security 2015 M2R: Enabling Stronger Privacy in MapReduce Computa,on

28 Conclusions Privacy- preserving distributed computa,on of MapReduce with trusted compu,ng. Security: Execu,on integrity +Data Confiden,ality Observa,on that simply running the map/reduce in trusted environment is not sufficient: interac,ons leak sensi,ve info. Small TCB Exploit the algorithmic structure to outperform a solu,on that employs generic ORAM. Future works: other distributed dataflow systems.

M 2 R: ENABLING STRONGER PRIVACY IN MAPREDUCE COMPUTATION

1 M 2 R: ENABLING STRONGER PRIVACY IN MAPREDUCE COMPUTATION TIEN TUAN ANH DINH, PRATEEK SAXENA, EE-CHIEN CHANG, BENG CHIN OOI, AND CHUNWANG ZHANG, NATIONAL UNIVERSITY OF SINGAPORE PRESENTED BY: RAVEEN