M 2 R: Enabling Stronger Privacy in MapReduce Computa;on

Size: px
Start display at page:

Download "M 2 R: Enabling Stronger Privacy in MapReduce Computa;on"

Transcription

1 M 2 R: Enabling Stronger Privacy in MapReduce Computa;on Anh Dinh, Prateek Saxena, Ee- Chien Chang, Beng Chin Ooi, Chunwang Zhang School of Compu,ng Na,onal University of Singapore

2 1. Mo;va;on Distributed computa,on (MapReduce) on large dataset with Trusted compu,ng. T T Storage T Integrity + Confiden,ality. Applicable in private or public cloud segng.

3 Challenge 1: Keep Trusted Code Base small Applica,on Frameworks Opera,ng Systems CVEs in Linux [CVE- DB] Hypervisor Affected many hypervisors (e.g Xen / KVM) [CS Report]

4 Baseline System Protected Code / Data TCB (e.g. SGX enclave) Opera,ng System Hypervisor MapReduce/Hadoop Framework S t o r a g e Trusted CPU All data outside trusted environment is encrypted SoZware only a[ack. Previous system: VC 3 (IEEE S&P 2015)

5 Background: MapReduce shuffling map reduce reduce map reduce map reduce Computa,on & shuffling of <key, value> tuples. Phases: Map à Shuffle à Reduce. map outputs a set of tuples. During Shuffling, tuples are grouped according to their key. Each reduce instance corresponds to an unique key k. It takes all tuples with the key k and output a set of tuples.

6 Background: Hadoop Hadoop: sozware framework wri[en in Java 190K LOC (Hadoop ) Consists of MapReduce modules, Hadoop Distributed File System (HDFS), etc. Usensix Security 2015 M2R: Enabling Stronger Privacy in MapReduce Computa,on

7 Is the Baseline System sufficient? T T Storage T Iden,fy small essen,al components of MapReduce to be included in the TCB.

8 Challenge 2: Interac;ons Leaks Info T T Storage T Usensix Security 2015 M2R: Enabling Stronger Privacy in MapReduce Computa,on

9 Example of leakage: wordcount Map Phase: each generates the tuples. F1 w1 w2 shuffling w1 w2 (key: w1) (key: w2) F2 w1 w2 w4 w1 w2 w4 (key: w3) F3 w3 w4 w3 w4 (key:w4)

10 Example of leakage: wordcount Shuffling Phase: The tuples are grouped w.r.t the words. F1 w1 w2 shuffling (key: w1) (key: w2) F2 w1 w2 w4 (key: w3) F3 w3 w4 (key:w4) Reduce Phase: counts and outputs the number of tuples it received.

11 Example of leakage: word counts By observing the flow of tuples, one can infer rela,onships among the input files. F1 w1 w2 shuffling (key: w1) (key: w2) F2 w1 w2 w4 (key: w3) F3 w3 w4 (key:w4) F3 contains a unique word

12 Example of leakage: word counts By observing the flow of tuples, one can infer rela,onships among the input files. F1 w1 w2 shuffling (key: w1) (key: w2) F2 w1 w2 w4 (key: w3) F3 w3 w4 (key:w4) Goal: hide these rela;onships.

13 Possible solu;on: Oblivious RAM Very high overhead. ORAM Usensix Security 2015 M2R: Enabling Stronger Privacy in MapReduce Computa,on

14 2. Our solu;on Randomly permutes the tuples. Group the tuples according to their keys. Secure Mixing (MixT) Grouping

15 2. Our solu;on For execu,on integrity, addi,on step of verifica,on is required. Secure Mixing (MixT) Grouping Verify groupings (GroupT)

16 2. Our solu;on Randomly permutes the tuples Original untrusted Shuffling process Verifies grouping is correctly done Secure Mixing (MixT) Grouping Verify groupings (GroupT)

17 Cascaded Mixing A cascaded mixing is employed to randomly permute the tuples distributedly. mixt mixt mixt mixt mixt mixt mixt mixt mixt round 1 round 2 round 3

18 Remarks Key management, handling of the random nonce and ini,al value is not straighoorward. In Hadoop, mul,ple reduce instances are carried out by a single reducer. Likewise mapper. Usensix Security 2015 M2R: Enabling Stronger Privacy in MapReduce Computa,on

19 ORAM vs Our solu;on M 2 R exploits the fact that, reads and writes can be batched into 2 phases, whereas ORAM caters for single read/write and thus incurs higher overhead. Many construc,ons of ORAM need to permute or o- sort the data. ORAM Secure Mixing (MixT) Grouping Verify groupings (GroupT) Usensix Security 2015 M2R: Enabling Stronger Privacy in MapReduce Computa,on

20 3. Security Model Adversary can observe the following: Input/output size of each trusted instance. Source/des,na,on of the input/output. Time of invoca,on/return of each trusted instance. Ac,ve adversary can: Arbitrary Invoke trusted instances. Halt instances. Drop/duplicate ciphertext (encrypted tuples). Add delays. Usensix Security 2015 M2R: Enabling Stronger Privacy in MapReduce Computa,on

21 Modulo- private Based on formula,on by CaneG (FOCS 01). Let be the permissible data that can be revealed during honest execu,on. A provisioning protocol is modulo- private if, for any adversary A execuhng the protocol, there is an algorithm B with access only to, such that the output of A and B are indishnguishable. Usensix Security 2015 M2R: Enabling Stronger Privacy in MapReduce Computa,on

22 The permissible : size of input/output,,me of revoca,on/return of and under honest execu,on. shuffling F1 w1 w2 (key: w1) (key: w2) F2 w1 w2 w4 (key: w3) F3 w3 w4 (key:w4) Goal: hide these rela;onships.

23 M 2 R is - modulo private. Usensix Security 2015 M2R: Enabling Stronger Privacy in MapReduce Computa,on

24 4. Implementa;ons & Experiments Use Xen as the trusted hypervisor, and its Verifiable Dynamic Func,on Executor to load and execute trusted codes. (The design of M 2 R can be implemented differently depending on the underlying architecture, e.g. on Intel SGX). Ported 7 MapReduc benchmark applica,ons. KMeans : Itera,ve, Compute intensive Grep : Compute intensive Pagerank : Itera,ve, Compute intensive WordCount : Shuffle intensive Index : Shuffle intensive Join : database queries Aggregate : database queries 8 compute nodes, each quad- core Intel CPU 1.8 GHz, 8GB RAM, 1GB Ethernet cards.

25 Trusted Code Base 4 trusted computa,on units: mixt, GroupT,,. Plaoorm related: (mixt, GroupT) Lines Of Code: 300 Applica,ons: (, ReduceT) Lines Of Code 200 for our examples. Usensix Security 2015 M2R: Enabling Stronger Privacy in MapReduce Computa,on

26 Performance Job Input size (bytes) (vs plaintext size) Shuffled bytes #Applica,ons hyper calls #plaoorm hyper calls Wordcount 2.1G (1.1 ) 4.2G Index 2.5G (1.2 ) 8G Grep 2.1G (1.1 ) 75M Aggregate 2.0G (1.2 ) 289M Join 2.0G (1.2 ) 450M Pagerank 2.5G (4.0 ) 2.6G KMeans 1.0G (1.1 ) 11K Usensix Security 2015 M2R: Enabling Stronger Privacy in MapReduce Computa,on

27 Running ;me (s) Job Baseline (vs no encryp,on) M 2 R (vs baseline) Wordcount 570 (2.6 ) 1156 (2.0 ) Index 666 (1.6 ) 1549 (2.3 ) Grep 70 (1.5 ) 106 (1.5 ) Aggregate 125 (1.6 ) 205 (1.6 ) Join 422 (2 ) 510 (1.2 ) Pagerank 521 (1.6 ) 755 (1.4 ) KMeans 123 (1.7 ) 145 (1.2 ) Usensix Security 2015 M2R: Enabling Stronger Privacy in MapReduce Computa,on

28 Conclusions Privacy- preserving distributed computa,on of MapReduce with trusted compu,ng. Security: Execu,on integrity +Data Confiden,ality Observa,on that simply running the map/reduce in trusted environment is not sufficient: interac,ons leak sensi,ve info. Small TCB Exploit the algorithmic structure to outperform a solu,on that employs generic ORAM. Future works: other distributed dataflow systems.

M 2 R: ENABLING STRONGER PRIVACY IN MAPREDUCE COMPUTATION

M 2 R: ENABLING STRONGER PRIVACY IN MAPREDUCE COMPUTATION 1 M 2 R: ENABLING STRONGER PRIVACY IN MAPREDUCE COMPUTATION TIEN TUAN ANH DINH, PRATEEK SAXENA, EE-CHIEN CHANG, BENG CHIN OOI, AND CHUNWANG ZHANG, NATIONAL UNIVERSITY OF SINGAPORE PRESENTED BY: RAVEEN

More information

Privacy-Preserving Computation with Trusted Computing via Scramble-then-Compute

Privacy-Preserving Computation with Trusted Computing via Scramble-then-Compute Privacy-Preserving Computation with Trusted Computing via Scramble-then-Compute Hung Dang, Anh Dinh, Ee-Chien Chang, Beng Chin Ooi School of Computing National University of Singapore The Problem Context:

More information

M 2 R: Enabling Stronger Privacy in MapReduce Computation

M 2 R: Enabling Stronger Privacy in MapReduce Computation M 2 R: Enabling Stronger Privacy in MapReduce Computation Tien Tuan Anh Dinh, Prateek Saxena, Ee-Chien Chang, Beng Chin Ooi, and Chunwang Zhang, National University of Singapore https://www.usenix.org/conference/usenixsecurity15/technical-sessions/presentation/dinh

More information

Mylar. xd5d1db5abce2356d51db5aab23d abbce23352abc x435acb734352a12cad5d1db5abce2356d51db acb2312aaab23!

Mylar. xd5d1db5abce2356d51db5aab23d abbce23352abc x435acb734352a12cad5d1db5abce2356d51db acb2312aaab23! Mylar Building web applica/ons on top of encrypted data xd5d1db5abce2356d51db5aab23d5321535abbce23352abc4352314987 x435acb734352a12cad5d1db5abce2356d51db5345323acb2312aaab23! Raluca Ada Popa, Emily Stark,

More information

Op#mizing MapReduce for Highly- Distributed Environments

Op#mizing MapReduce for Highly- Distributed Environments Op#mizing MapReduce for Highly- Distributed Environments Abhishek Chandra Associate Professor Department of Computer Science and Engineering University of Minnesota hep://www.cs.umn.edu/~chandra 1 Big

More information

MapReduce, Apache Hadoop

MapReduce, Apache Hadoop NDBI040: Big Data Management and NoSQL Databases hp://www.ksi.mff.cuni.cz/ svoboda/courses/2016-1-ndbi040/ Lecture 2 MapReduce, Apache Hadoop Marn Svoboda svoboda@ksi.mff.cuni.cz 11. 10. 2016 Charles University

More information

MapReduce, Apache Hadoop

MapReduce, Apache Hadoop Czech Technical University in Prague, Faculty of Informaon Technology MIE-PDB: Advanced Database Systems hp://www.ksi.mff.cuni.cz/~svoboda/courses/2016-2-mie-pdb/ Lecture 12 MapReduce, Apache Hadoop Marn

More information

Collabora've, Privacy Preserving Data Aggrega'on at Scale

Collabora've, Privacy Preserving Data Aggrega'on at Scale Collabora've, Privacy Preserving Data Aggrega'on at Scale Michael J. Freedman Princeton University Joint work with: Benny Applebaum, Haakon Ringberg, MaHhew Caesar, and Jennifer Rexford Problem: Network

More information

Performance Evaluation of a MongoDB and Hadoop Platform for Scientific Data Analysis

Performance Evaluation of a MongoDB and Hadoop Platform for Scientific Data Analysis Performance Evaluation of a MongoDB and Hadoop Platform for Scientific Data Analysis Elif Dede, Madhusudhan Govindaraju Lavanya Ramakrishnan, Dan Gunter, Shane Canon Department of Computer Science, Binghamton

More information

Usable PIR. Network Security and Applied. Cryptography Laboratory.

Usable PIR. Network Security and Applied. Cryptography Laboratory. Network Security and Applied Cryptography Laboratory http://crypto.cs.stonybrook.edu Usable PIR NDSS '08, San Diego, CA Peter Williams petertw@cs.stonybrook.edu Radu Sion sion@cs.stonybrook.edu ver. 2.1

More information

Delegated Access for Hadoop Clusters in the Cloud

Delegated Access for Hadoop Clusters in the Cloud Delegated Access for Hadoop Clusters in the Cloud David Nuñez, Isaac Agudo, and Javier Lopez Network, Information and Computer Security Laboratory (NICS Lab) Universidad de Málaga, Spain Email: dnunez@lcc.uma.es

More information

Apache Spark is a fast and general-purpose engine for large-scale data processing Spark aims at achieving the following goals in the Big data context

Apache Spark is a fast and general-purpose engine for large-scale data processing Spark aims at achieving the following goals in the Big data context 1 Apache Spark is a fast and general-purpose engine for large-scale data processing Spark aims at achieving the following goals in the Big data context Generality: diverse workloads, operators, job sizes

More information

Improving Hadoop MapReduce Performance on Supercomputers with JVM Reuse

Improving Hadoop MapReduce Performance on Supercomputers with JVM Reuse Thanh-Chung Dao 1 Improving Hadoop MapReduce Performance on Supercomputers with JVM Reuse Thanh-Chung Dao and Shigeru Chiba The University of Tokyo Thanh-Chung Dao 2 Supercomputers Expensive clusters Multi-core

More information

A Comparison Study of Intel SGX and AMD Memory Encryption Technology

A Comparison Study of Intel SGX and AMD Memory Encryption Technology A Comparison Study of Intel SGX and AMD Memory Encryption Technology Saeid Mofrad, Fengwei Zhang Shiyong Lu Wayne State University {saeid.mofrad, Fengwei, Shiyong}@wayne.edu Weidong Shi (Larry) University

More information

Secure Server Project. Xen Project Developer Summit 2013 Adven9um Labs Jason Sonnek

Secure Server Project. Xen Project Developer Summit 2013 Adven9um Labs Jason Sonnek Secure Server Project Xen Project Developer Summit 2013 Adven9um Labs Jason Sonnek 1 Outline I. Mo9va9on, Objec9ves II. Threat Landscape III. Design IV. Status V. Roadmap 2 Mo9va9on In a nutshell: Secure

More information

Secure Remote Storage Using Oblivious RAM

Secure Remote Storage Using Oblivious RAM Secure Remote Storage Using Oblivious RAM Giovanni Malloy Mentors: Georgios Kellaris, Kobbi Nissim August 11, 2016 Abstract Oblivious RAM (ORAM) is a protocol that allows a user to access the data she

More information

Guoping Wang and Chee-Yong Chan Department of Computer Science, School of Computing National University of Singapore VLDB 14.

Guoping Wang and Chee-Yong Chan Department of Computer Science, School of Computing National University of Singapore VLDB 14. Guoping Wang and Chee-Yong Chan Department of Computer Science, School of Computing National University of Singapore VLDB 14 Page 1 Introduction & Notations Multi-Job optimization Evaluation Conclusion

More information

Verifiable Cloud Outsourcing for Network Func9ons (+ Verifiable Resource Accoun9ng for Cloud Services)

Verifiable Cloud Outsourcing for Network Func9ons (+ Verifiable Resource Accoun9ng for Cloud Services) 1 Verifiable Cloud Outsourcing for Network Func9ons (+ Verifiable Resource Accoun9ng for Cloud Services) Vyas Sekar vnfo joint with Seyed Fayazbakhsh, Mike Reiter VRA joint with Chen Chen, Petros Mania9s,

More information

Informa)on Retrieval and Map- Reduce Implementa)ons. Mohammad Amir Sharif PhD Student Center for Advanced Computer Studies

Informa)on Retrieval and Map- Reduce Implementa)ons. Mohammad Amir Sharif PhD Student Center for Advanced Computer Studies Informa)on Retrieval and Map- Reduce Implementa)ons Mohammad Amir Sharif PhD Student Center for Advanced Computer Studies mas4108@louisiana.edu Map-Reduce: Why? Need to process 100TB datasets On 1 node:

More information

Securing Hadoop. Keys Botzum, MapR Technologies Jan MapR Technologies - Confiden6al

Securing Hadoop. Keys Botzum, MapR Technologies Jan MapR Technologies - Confiden6al Securing Hadoop Keys Botzum, MapR Technologies kbotzum@maprtech.com Jan 2014 MapR Technologies - Confiden6al 1 Why Secure Hadoop Historically security wasn t a high priority Reflec6on of the type of data

More information

Modular arithme.c and cryptography

Modular arithme.c and cryptography Modular arithme.c and cryptography CSC 1300 Discrete Structures Villanova University Public Key Cryptography (Slides 11-32) by Dr. Lillian Cassel, Villanova University Villanova CSC 1300 - Dr Papalaskari

More information

Project: Embedded SMC

Project: Embedded SMC Project: Embedded SMC What is Secure Computa1on [SMC] A Compute f(a, B) Without revealing A to Bob and B to Alice B 2 Using a Trusted Third Party A B f(a, B) f(a, B) A Compute f(a, B) Without revealing

More information

2/26/2017. Originally developed at the University of California - Berkeley's AMPLab

2/26/2017. Originally developed at the University of California - Berkeley's AMPLab Apache is a fast and general engine for large-scale data processing aims at achieving the following goals in the Big data context Generality: diverse workloads, operators, job sizes Low latency: sub-second

More information

Today s Objec4ves. Data Center. Virtualiza4on Cloud Compu4ng Amazon Web Services. What did you think? 10/23/17. Oct 23, 2017 Sprenkle - CSCI325

Today s Objec4ves. Data Center. Virtualiza4on Cloud Compu4ng Amazon Web Services. What did you think? 10/23/17. Oct 23, 2017 Sprenkle - CSCI325 Today s Objec4ves Virtualiza4on Cloud Compu4ng Amazon Web Services Oct 23, 2017 Sprenkle - CSCI325 1 Data Center What did you think? Oct 23, 2017 Sprenkle - CSCI325 2 1 10/23/17 Oct 23, 2017 Sprenkle -

More information

CS 378 Big Data Programming

CS 378 Big Data Programming CS 378 Big Data Programming Lecture 5 Summariza9on Pa:erns CS 378 Fall 2017 Big Data Programming 1 Review Assignment 2 Ques9ons? mrunit How do you test map() or reduce() calls that produce mul9ple outputs?

More information

Hypergraph Sparsifica/on and Its Applica/on to Par//oning

Hypergraph Sparsifica/on and Its Applica/on to Par//oning Hypergraph Sparsifica/on and Its Applica/on to Par//oning Mehmet Deveci 1,3, Kamer Kaya 1, Ümit V. Çatalyürek 1,2 1 Dept. of Biomedical Informa/cs, The Ohio State University 2 Dept. of Electrical & Computer

More information

HiTune. Dataflow-Based Performance Analysis for Big Data Cloud

HiTune. Dataflow-Based Performance Analysis for Big Data Cloud HiTune Dataflow-Based Performance Analysis for Big Data Cloud Jinquan (Jason) Dai, Jie Huang, Shengsheng Huang, Bo Huang, Yan Liu Intel Asia-Pacific Research and Development Ltd Shanghai, China, 200241

More information

MapReduce Design Patterns

MapReduce Design Patterns MapReduce Design Patterns MapReduce Restrictions Any algorithm that needs to be implemented using MapReduce must be expressed in terms of a small number of rigidly defined components that must fit together

More information

A Distributed Data- Parallel Execu3on Framework in the Kepler Scien3fic Workflow System

A Distributed Data- Parallel Execu3on Framework in the Kepler Scien3fic Workflow System A Distributed Data- Parallel Execu3on Framework in the Kepler Scien3fic Workflow System Ilkay Al(ntas and Daniel Crawl San Diego Supercomputer Center UC San Diego Jianwu Wang UMBC WorDS.sdsc.edu Computa3onal

More information

BigDataBench- S: An Open- source Scien6fic Big Data Benchmark Suite

BigDataBench- S: An Open- source Scien6fic Big Data Benchmark Suite BigDataBench- S: An Open- source Scien6fic Big Data Benchmark Suite Xinhui Tian, Shaopeng Dai, Zhihui Du, Wanling Gao, Rui Ren, Yaodong Cheng, Zhifei Zhang, Zhen Jia, Peijian Wang and Jianfeng Zhan INSTITUTE

More information

An Introduc+on to Applied Cryptography. Chester Rebeiro IIT Madras

An Introduc+on to Applied Cryptography. Chester Rebeiro IIT Madras CR An Introduc+on to Applied Cryptography Chester Rebeiro IIT Madras CR 2 Connected and Stored Everything is connected! Everything is stored! Increased Security Breaches 81% more in 2015 CR h9p://www.pwc.co.uk/assets/pdf/2015-isbs-execugve-

More information

Efficient Private Information Retrieval

Efficient Private Information Retrieval Efficient Private Information Retrieval K O N S T A N T I N O S F. N I K O L O P O U L O S T H E G R A D U A T E C E N T E R, C I T Y U N I V E R S I T Y O F N E W Y O R K K N I K O L O P O U L O S @ G

More information

Ascend: Architecture for Secure Computation on Encrypted Data Oblivious RAM (ORAM)

Ascend: Architecture for Secure Computation on Encrypted Data Oblivious RAM (ORAM) CSE 5095 & ECE 4451 & ECE 5451 Spring 2017 Lecture 7b Ascend: Architecture for Secure Computation on Encrypted Data Oblivious RAM (ORAM) Marten van Dijk Syed Kamran Haider, Chenglu Jin, Phuong Ha Nguyen

More information

Hadoop Map Reduce 10/17/2018 1

Hadoop Map Reduce 10/17/2018 1 Hadoop Map Reduce 10/17/2018 1 MapReduce 2-in-1 A programming paradigm A query execution engine A kind of functional programming We focus on the MapReduce execution engine of Hadoop through YARN 10/17/2018

More information

Virtualization. Introduction. Why we interested? 11/28/15. Virtualiza5on provide an abstract environment to run applica5ons.

Virtualization. Introduction. Why we interested? 11/28/15. Virtualiza5on provide an abstract environment to run applica5ons. Virtualization Yifu Rong Introduction Virtualiza5on provide an abstract environment to run applica5ons. Virtualiza5on technologies have a long trail in the history of computer science. Why we interested?

More information

Generalizing Map- Reduce

Generalizing Map- Reduce Generalizing Map- Reduce 1 Example: A Map- Reduce Graph map reduce map... reduce reduce map 2 Map- reduce is not a solu;on to every problem, not even every problem that profitably can use many compute

More information

Resource and Performance Distribution Prediction for Large Scale Analytics Queries

Resource and Performance Distribution Prediction for Large Scale Analytics Queries Resource and Performance Distribution Prediction for Large Scale Analytics Queries Prof. Rajiv Ranjan, SMIEEE School of Computing Science, Newcastle University, UK Visiting Scientist, Data61, CSIRO, Australia

More information

CAVA: Exploring Memory Locality for Big Data Analytics in Virtualized Clusters

CAVA: Exploring Memory Locality for Big Data Analytics in Virtualized Clusters 2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing : Exploring Memory Locality for Big Data Analytics in Virtualized Clusters Eunji Hwang, Hyungoo Kim, Beomseok Nam and Young-ri

More information

Voldemort. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Overview Design Evaluation

Voldemort. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Overview Design Evaluation Voldemort Smruti R. Sarangi Department of Computer Science Indian Institute of Technology New Delhi, India Smruti R. Sarangi Leader Election 1/29 Outline 1 2 3 Smruti R. Sarangi Leader Election 2/29 Data

More information

Privacy-Preserving Shortest Path Computa6on

Privacy-Preserving Shortest Path Computa6on Privacy-Preserving Shortest Path Computa6on David J. Wu, Joe Zimmerman, Jérémy Planul, and John C. Mitchell Stanford University Naviga6on desired des@na@on current posi@on Naviga6on: A Solved Problem?

More information

BUILDING SECURE (CLOUD) APPLICATIONS USING INTEL S SGX

BUILDING SECURE (CLOUD) APPLICATIONS USING INTEL S SGX BUILDING SECURE (CLOUD) APPLICATIONS USING INTEL S SGX FLORIAN KERSCHBAUM, UNIVERSITY OF WATERLOO JOINT WORK WITH BENNY FUHRY (SAP), ANDREAS FISCHER (SAP) AND MANY OTHERS DO YOU TRUST YOUR CLOUD SERVICE

More information

Scotch: Combining Software Guard Extensions and System Management Mode to Monitor Cloud Resource Usage

Scotch: Combining Software Guard Extensions and System Management Mode to Monitor Cloud Resource Usage Scotch: Combining Software Guard Extensions and System Management Mode to Monitor Cloud Resource Usage Kevin Leach 1, Fengwei Zhang 2, and Westley Weimer 1 1 University of Michigan, 2 Wayne State University

More information

Lectures 6+7: Zero-Leakage Solutions

Lectures 6+7: Zero-Leakage Solutions Lectures 6+7: Zero-Leakage Solutions Contents 1 Overview 1 2 Oblivious RAM 1 3 Oblivious RAM via FHE 2 4 Oblivious RAM via Symmetric Encryption 4 4.1 Setup........................................ 5 4.2

More information

Policy-Sealed Data: A New Abstraction for Building Trusted Cloud Services

Policy-Sealed Data: A New Abstraction for Building Trusted Cloud Services Max Planck Institute for Software Systems Policy-Sealed Data: A New Abstraction for Building Trusted Cloud Services 1, Rodrigo Rodrigues 2, Krishna P. Gummadi 1, Stefan Saroiu 3 MPI-SWS 1, CITI / Universidade

More information

Data Clustering on the Parallel Hadoop MapReduce Model. Dimitrios Verraros

Data Clustering on the Parallel Hadoop MapReduce Model. Dimitrios Verraros Data Clustering on the Parallel Hadoop MapReduce Model Dimitrios Verraros Overview The purpose of this thesis is to implement and benchmark the performance of a parallel K- means clustering algorithm on

More information

Lectures 4+5: The (In)Security of Encrypted Search

Lectures 4+5: The (In)Security of Encrypted Search Lectures 4+5: The (In)Security of Encrypted Search Contents 1 Overview 1 2 Data Structures 2 3 Syntax 3 4 Security 4 4.1 Formalizing Leaky Primitives.......................... 5 1 Overview In the first

More information

TI2736-B Big Data Processing. Claudia Hauff

TI2736-B Big Data Processing. Claudia Hauff TI2736-B Big Data Processing Claudia Hauff ti2736b-ewi@tudelft.nl Intro Streams Streams Map Reduce HDFS Pig Pig Design Patterns Hadoop Ctd. Graphs Giraph Spark Zoo Keeper Spark Learning objectives Implement

More information

MapReduce. Tom Anderson

MapReduce. Tom Anderson MapReduce Tom Anderson Last Time Difference between local state and knowledge about other node s local state Failures are endemic Communica?on costs ma@er Why Is DS So Hard? System design Par??oning of

More information

Using Dynamic Voltage Frequency Scaling and CPU Pinning for Energy Efficiency in Cloud Compu1ng. Jakub Krzywda Umeå University

Using Dynamic Voltage Frequency Scaling and CPU Pinning for Energy Efficiency in Cloud Compu1ng. Jakub Krzywda Umeå University Using Dynamic Voltage Frequency Scaling and CPU Pinning for Energy Efficiency in Cloud Compu1ng Jakub Krzywda Umeå University How to use DVFS and CPU Pinning to lower the power consump1on during periods

More information

SSS: An Implementation of Key-value Store based MapReduce Framework. Hirotaka Ogawa (AIST, Japan) Hidemoto Nakada Ryousei Takano Tomohiro Kudoh

SSS: An Implementation of Key-value Store based MapReduce Framework. Hirotaka Ogawa (AIST, Japan) Hidemoto Nakada Ryousei Takano Tomohiro Kudoh SSS: An Implementation of Key-value Store based MapReduce Framework Hirotaka Ogawa (AIST, Japan) Hidemoto Nakada Ryousei Takano Tomohiro Kudoh MapReduce A promising programming tool for implementing largescale

More information

Opera&ng Systems ECE344

Opera&ng Systems ECE344 Opera&ng Systems ECE344 Lecture 10: Scheduling Ding Yuan Scheduling Overview In discussing process management and synchroniza&on, we talked about context switching among processes/threads on the ready

More information

Developing MapReduce Programs

Developing MapReduce Programs Cloud Computing Developing MapReduce Programs Dell Zhang Birkbeck, University of London 2017/18 MapReduce Algorithm Design MapReduce: Recap Programmers must specify two functions: map (k, v) * Takes

More information

Cloud Programming. Programming Environment Oct 29, 2015 Osamu Tatebe

Cloud Programming. Programming Environment Oct 29, 2015 Osamu Tatebe Cloud Programming Programming Environment Oct 29, 2015 Osamu Tatebe Cloud Computing Only required amount of CPU and storage can be used anytime from anywhere via network Availability, throughput, reliability

More information

MATE-EC2: A Middleware for Processing Data with Amazon Web Services

MATE-EC2: A Middleware for Processing Data with Amazon Web Services MATE-EC2: A Middleware for Processing Data with Amazon Web Services Tekin Bicer David Chiu* and Gagan Agrawal Department of Compute Science and Engineering Ohio State University * School of Engineering

More information

High Performance Computing on MapReduce Programming Framework

High Performance Computing on MapReduce Programming Framework International Journal of Private Cloud Computing Environment and Management Vol. 2, No. 1, (2015), pp. 27-32 http://dx.doi.org/10.21742/ijpccem.2015.2.1.04 High Performance Computing on MapReduce Programming

More information

Secure and Ef icient Iris and Fingerprint Identi ication

Secure and Ef icient Iris and Fingerprint Identi ication Secure and Eficient Iris and Fingerprint Identiication Marina Blanton Department of Computer Science and Engineering, University of Notre Dame mblanton@nd.edu Paolo Gas Department of Computer Science,

More information

R-Store: A Scalable Distributed System for Supporting Real-time Analytics

R-Store: A Scalable Distributed System for Supporting Real-time Analytics R-Store: A Scalable Distributed System for Supporting Real-time Analytics Feng Li, M. Tamer Ozsu, Gang Chen, Beng Chin Ooi National University of Singapore ICDE 2014 Background Situation for large scale

More information

Can Parallel Replication Benefit Hadoop Distributed File System for High Performance Interconnects?

Can Parallel Replication Benefit Hadoop Distributed File System for High Performance Interconnects? Can Parallel Replication Benefit Hadoop Distributed File System for High Performance Interconnects? N. S. Islam, X. Lu, M. W. Rahman, and D. K. Panda Network- Based Compu2ng Laboratory Department of Computer

More information

Camdoop Exploiting In-network Aggregation for Big Data Applications Paolo Costa

Camdoop Exploiting In-network Aggregation for Big Data Applications Paolo Costa Camdoop Exploiting In-network Aggregation for Big Data Applications costa@imperial.ac.uk joint work with Austin Donnelly, Antony Rowstron, and Greg O Shea (MSR Cambridge) MapReduce Overview Input file

More information

Andrew Pavlo, Erik Paulson, Alexander Rasin, Daniel Abadi, David DeWitt, Samuel Madden, and Michael Stonebraker SIGMOD'09. Presented by: Daniel Isaacs

Andrew Pavlo, Erik Paulson, Alexander Rasin, Daniel Abadi, David DeWitt, Samuel Madden, and Michael Stonebraker SIGMOD'09. Presented by: Daniel Isaacs Andrew Pavlo, Erik Paulson, Alexander Rasin, Daniel Abadi, David DeWitt, Samuel Madden, and Michael Stonebraker SIGMOD'09 Presented by: Daniel Isaacs It all starts with cluster computing. MapReduce Why

More information

Large- Scale Sor,ng: Breaking World Records. Mike Conley CSE 124 Guest Lecture 12 March 2015

Large- Scale Sor,ng: Breaking World Records. Mike Conley CSE 124 Guest Lecture 12 March 2015 Large- Scale Sor,ng: Breaking World Records Mike Conley CSE 124 Guest Lecture 12 March 2015 Sor,ng Given an array of items, put them in order 5 2 8 0 2 5 4 9 0 1 0 0 0 0 0 0 1 2 2 4 5 5 8 9 Many algorithms

More information

Con$nuous Audi$ng and Risk Management in Cloud Compu$ng

Con$nuous Audi$ng and Risk Management in Cloud Compu$ng Con$nuous Audi$ng and Risk Management in Cloud Compu$ng Marcus Spies Chair of Knowledge Management LMU University of Munich Scien$fic / Technical Director of EU Integrated Research Project MUSING Cloud

More information

IntegrityMR: Integrity Assurance Framework for Big Data Analytics and Management

IntegrityMR: Integrity Assurance Framework for Big Data Analytics and Management IntegrityMR: Integrity Assurance Framework for Big Data Analytics and Management Applications Yongzhi Wang, Jinpeng Wei Florida International University Mudhakar Srivatsa IBM T.J. Watson Research Center

More information

Shark. Hive on Spark. Cliff Engle, Antonio Lupher, Reynold Xin, Matei Zaharia, Michael Franklin, Ion Stoica, Scott Shenker

Shark. Hive on Spark. Cliff Engle, Antonio Lupher, Reynold Xin, Matei Zaharia, Michael Franklin, Ion Stoica, Scott Shenker Shark Hive on Spark Cliff Engle, Antonio Lupher, Reynold Xin, Matei Zaharia, Michael Franklin, Ion Stoica, Scott Shenker Agenda Intro to Spark Apache Hive Shark Shark s Improvements over Hive Demo Alpha

More information

CIS 601 Graduate Seminar. Dr. Sunnie S. Chung Dhruv Patel ( ) Kalpesh Sharma ( )

CIS 601 Graduate Seminar. Dr. Sunnie S. Chung Dhruv Patel ( ) Kalpesh Sharma ( ) Guide: CIS 601 Graduate Seminar Presented By: Dr. Sunnie S. Chung Dhruv Patel (2652790) Kalpesh Sharma (2660576) Introduction Background Parallel Data Warehouse (PDW) Hive MongoDB Client-side Shared SQL

More information

Differential Privacy

Differential Privacy CPSC 426/526 Differential Privacy Ennan Zhai Computer Science Department Yale University Recall: Lec-11 In lec-11, we learned: - Cryptographic basics - Symmetric key cryptography - Public key cryptography

More information

Analysis in the Big Data Era

Analysis in the Big Data Era Analysis in the Big Data Era Massive Data Data Analysis Insight Key to Success = Timely and Cost-Effective Analysis 2 Hadoop MapReduce Ecosystem Popular solution to Big Data Analytics Java / C++ / R /

More information

Racing in Hyperspace: Closing Hyper-Threading Side Channels on SGX with Contrived Data Races. CS 563 Young Li 10/31/18

Racing in Hyperspace: Closing Hyper-Threading Side Channels on SGX with Contrived Data Races. CS 563 Young Li 10/31/18 Racing in Hyperspace: Closing Hyper-Threading Side Channels on SGX with Contrived Data Races CS 563 Young Li 10/31/18 Intel Software Guard extensions (SGX) and Hyper-Threading What is Intel SGX? Set of

More information

Distributed Data Management Summer Semester 2013 TU Kaiserslautern

Distributed Data Management Summer Semester 2013 TU Kaiserslautern Distributed Data Management Summer Semester 2013 TU Kaiserslautern Dr.- Ing. Sebas4an Michel smichel@mmci.uni- saarland.de Distributed Data Management, SoSe 2013, S. Michel 1 Lecture 4 PIG/HIVE Distributed

More information

OLTP on Hadoop: Reviewing the first Hadoop- based TPC- C benchmarks

OLTP on Hadoop: Reviewing the first Hadoop- based TPC- C benchmarks OLTP on Hadoop: Reviewing the first Hadoop- based TPC- C benchmarks Monte Zweben Co- Founder and Chief Execu6ve Officer John Leach Co- Founder and Chief Technology Officer September 30, 2015 The Tradi6onal

More information

Overcoming the Barriers of Graphs on GPUs: Delivering Graph Analy;cs 100X Faster and 40X Cheaper

Overcoming the Barriers of Graphs on GPUs: Delivering Graph Analy;cs 100X Faster and 40X Cheaper Overcoming the Barriers of Graphs on GPUs: Delivering Graph Analy;cs 100X Faster and 40X Cheaper November 18, 2015 Super Compu3ng 2015 The Amount of Graph Data is Exploding! Billion+ Edges! 2 Graph Applications

More information

How to use the BigDataBench simulator versions

How to use the BigDataBench simulator versions How to use the BigDataBench simulator versions Zhen Jia Institute of Computing Technology, Chinese Academy of Sciences BigDataBench Tutorial MICRO 2014 Cambridge, UK INSTITUTE OF COMPUTING TECHNOLOGY Objec8ves

More information

Workload Characterization and Optimization of TPC-H Queries on Apache Spark

Workload Characterization and Optimization of TPC-H Queries on Apache Spark Workload Characterization and Optimization of TPC-H Queries on Apache Spark Tatsuhiro Chiba and Tamiya Onodera IBM Research - Tokyo April. 17-19, 216 IEEE ISPASS 216 @ Uppsala, Sweden Overview IBM Research

More information

Improved MapReduce k-means Clustering Algorithm with Combiner

Improved MapReduce k-means Clustering Algorithm with Combiner 2014 UKSim-AMSS 16th International Conference on Computer Modelling and Simulation Improved MapReduce k-means Clustering Algorithm with Combiner Prajesh P Anchalia Department Of Computer Science and Engineering

More information

Re- op&mizing Data Parallel Compu&ng

Re- op&mizing Data Parallel Compu&ng Re- op&mizing Data Parallel Compu&ng Sameer Agarwal Srikanth Kandula, Nicolas Bruno, Ming- Chuan Wu, Ion Stoica, Jingren Zhou UC Berkeley A Data Parallel Job can be a collec/on of maps, A Data Parallel

More information

Combinatorial Mathema/cs and Algorithms at Exascale: Challenges and Promising Direc/ons

Combinatorial Mathema/cs and Algorithms at Exascale: Challenges and Promising Direc/ons Combinatorial Mathema/cs and Algorithms at Exascale: Challenges and Promising Direc/ons Assefaw Gebremedhin Purdue University (Star/ng August 2014, Washington State University School of Electrical Engineering

More information

W1005 Intro to CS and Programming in MATLAB. Brief History of Compu?ng. Fall 2014 Instructor: Ilia Vovsha. hip://www.cs.columbia.

W1005 Intro to CS and Programming in MATLAB. Brief History of Compu?ng. Fall 2014 Instructor: Ilia Vovsha. hip://www.cs.columbia. W1005 Intro to CS and Programming in MATLAB Brief History of Compu?ng Fall 2014 Instructor: Ilia Vovsha hip://www.cs.columbia.edu/~vovsha/w1005 Computer Philosophy Computer is a (electronic digital) device

More information

Spark Over RDMA: Accelerate Big Data SC Asia 2018 Ido Shamay Mellanox Technologies

Spark Over RDMA: Accelerate Big Data SC Asia 2018 Ido Shamay Mellanox Technologies Spark Over RDMA: Accelerate Big Data SC Asia 2018 Ido Shamay 1 Apache Spark - Intro Spark within the Big Data ecosystem Data Sources Data Acquisition / ETL Data Storage Data Analysis / ML Serving 3 Apache

More information

CIS 4360 Secure Computer Systems SGX

CIS 4360 Secure Computer Systems SGX CIS 4360 Secure Computer Systems SGX Professor Qiang Zeng Spring 2017 Some slides are stolen from Intel docs Previous Class UEFI Secure Boot Windows s Trusted Boot Intel s Trusted Boot CIS 4360 Secure

More information

CS 378 Big Data Programming

CS 378 Big Data Programming CS 378 Big Data Programming Lecture 11 more on Data Organiza:on Pa;erns CS 378 - Fall 2016 Big Data Programming 1 Assignment 5 - Review Define an Avro object for user session One user session for each

More information

Organized Segmenta.on

Organized Segmenta.on Organized Segmenta.on Alex Trevor, Georgia Ins.tute of Technology PCL TUTORIAL @ICRA 13 Overview Mo.va.on Connected Component Algorithm Planar Segmenta.on & Refinement Euclidean Clustering Timing Results

More information

Lecture 7: MapReduce design patterns! Claudia Hauff (Web Information Systems)!

Lecture 7: MapReduce design patterns! Claudia Hauff (Web Information Systems)! Big Data Processing, 2014/15 Lecture 7: MapReduce design patterns!! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl 1 Course content Introduction Data streams 1 & 2 The MapReduce paradigm

More information

Research Works to Cope with Big Data Volume and Variety. Jiaheng Lu University of Helsinki, Finland

Research Works to Cope with Big Data Volume and Variety. Jiaheng Lu University of Helsinki, Finland Research Works to Cope with Big Data Volume and Variety Jiaheng Lu University of Helsinki, Finland Big Data: 4Vs Photo downloaded from: https://blog.infodiagram.com/2014/04/visualizing-big-data-concepts-strong.html

More information

Making Searchable Encryption Scale to the Cloud. Ian Miers and Payman Mohassel

Making Searchable Encryption Scale to the Cloud. Ian Miers and Payman Mohassel Making Searchable Encryption Scale to the Cloud Ian Miers and Payman Mohassel End to end Encryption No encryption Transport encryption End2End Encryption Service provider Service provider Service provider

More information

Design and Implementation of the Ascend Secure Processor. Ling Ren, Christopher W. Fletcher, Albert Kwon, Marten van Dijk, Srinivas Devadas

Design and Implementation of the Ascend Secure Processor. Ling Ren, Christopher W. Fletcher, Albert Kwon, Marten van Dijk, Srinivas Devadas Design and Implementation of the Ascend Secure Processor Ling Ren, Christopher W. Fletcher, Albert Kwon, Marten van Dijk, Srinivas Devadas Agenda Motivation Ascend Overview ORAM for obfuscation Ascend:

More information

Accelerate Big Data Insights

Accelerate Big Data Insights Accelerate Big Data Insights Executive Summary An abundance of information isn t always helpful when time is of the essence. In the world of big data, the ability to accelerate time-to-insight can not

More information

Design Principles & Prac4ces

Design Principles & Prac4ces Design Principles & Prac4ces Robert France Robert B. France 1 Understanding complexity Accidental versus Essen4al complexity Essen%al complexity: Complexity that is inherent in the problem or the solu4on

More information

Where We Are. Review: Parallel DBMS. Parallel DBMS. Introduction to Data Management CSE 344

Where We Are. Review: Parallel DBMS. Parallel DBMS. Introduction to Data Management CSE 344 Where We Are Introduction to Data Management CSE 344 Lecture 22: MapReduce We are talking about parallel query processing There exist two main types of engines: Parallel DBMSs (last lecture + quick review)

More information

Searchable Encryption Using ORAM. Benny Pinkas

Searchable Encryption Using ORAM. Benny Pinkas Searchable Encryption Using ORAM Benny Pinkas 1 Desiderata for Searchable Encryption Security No leakage about the query or the results Functionality Variety of queries that are supported Performance 2

More information

Introduction to Hadoop. Owen O Malley Yahoo!, Grid Team

Introduction to Hadoop. Owen O Malley Yahoo!, Grid Team Introduction to Hadoop Owen O Malley Yahoo!, Grid Team owen@yahoo-inc.com Who Am I? Yahoo! Architect on Hadoop Map/Reduce Design, review, and implement features in Hadoop Working on Hadoop full time since

More information

Secure Distributed Computa2ons (and their Proofs)

Secure Distributed Computa2ons (and their Proofs) Secure Distributed Computa2ons (and their Proofs) Pedro Adao Gilles Barthe Ricardo Corin Pierre- Malo Deniélou Gurvan le Guernic Nataliya Guts Eugen Zalinescu San2ago Zanella Béguelin Karthik Bhargavan

More information

Ming Ming Wong Jawad Haj-Yahya Anupam Chattopadhyay

Ming Ming Wong Jawad Haj-Yahya Anupam Chattopadhyay Hardware and Architectural Support for Security and Privacy (HASP 18), June 2, 2018, Los Angeles, CA, USA Ming Ming Wong Jawad Haj-Yahya Anupam Chattopadhyay Computing and Engineering (SCSE) Nanyang Technological

More information

Hadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved

Hadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved Hadoop 2.x Core: YARN, Tez, and Spark YARN Hadoop Machine Types top-of-rack switches core switch client machines have client-side software used to access a cluster to process data master nodes run Hadoop

More information

Locality-Aware Dynamic VM Reconfiguration on MapReduce Clouds. Jongse Park, Daewoo Lee, Bokyeong Kim, Jaehyuk Huh, Seungryoul Maeng

Locality-Aware Dynamic VM Reconfiguration on MapReduce Clouds. Jongse Park, Daewoo Lee, Bokyeong Kim, Jaehyuk Huh, Seungryoul Maeng Locality-Aware Dynamic VM Reconfiguration on MapReduce Clouds Jongse Park, Daewoo Lee, Bokyeong Kim, Jaehyuk Huh, Seungryoul Maeng Virtual Clusters on Cloud } Private cluster on public cloud } Distributed

More information

HDFS: Hadoop Distributed File System. CIS 612 Sunnie Chung

HDFS: Hadoop Distributed File System. CIS 612 Sunnie Chung HDFS: Hadoop Distributed File System CIS 612 Sunnie Chung What is Big Data?? Bulk Amount Unstructured Introduction Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per

More information

An Exploration of Designing a Hybrid Scale-Up/Out Hadoop Architecture Based on Performance Measurements

An Exploration of Designing a Hybrid Scale-Up/Out Hadoop Architecture Based on Performance Measurements This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI.9/TPDS.6.7, IEEE

More information

Graphene-SGX. A Practical Library OS for Unmodified Applications on SGX. Chia-Che Tsai Donald E. Porter Mona Vij

Graphene-SGX. A Practical Library OS for Unmodified Applications on SGX. Chia-Che Tsai Donald E. Porter Mona Vij Graphene-SGX A Practical Library OS for Unmodified Applications on SGX Chia-Che Tsai Donald E. Porter Mona Vij Intel SGX: Trusted Execution on Untrusted Hosts Processing Sensitive Data (Ex: Medical Records)

More information

Track Join. Distributed Joins with Minimal Network Traffic. Orestis Polychroniou! Rajkumar Sen! Kenneth A. Ross

Track Join. Distributed Joins with Minimal Network Traffic. Orestis Polychroniou! Rajkumar Sen! Kenneth A. Ross Track Join Distributed Joins with Minimal Network Traffic Orestis Polychroniou Rajkumar Sen Kenneth A. Ross Local Joins Algorithms Hash Join Sort Merge Join Index Join Nested Loop Join Spilling to disk

More information

15/03/2018. Counters

15/03/2018. Counters Counters 2 1 Hadoop provides a set of basic, built-in, counters to store some statistics about jobs, mappers, reducers E.g., number of input and output records E.g., number of transmitted bytes Ad-hoc,

More information