M 2 R: Enabling Stronger Privacy in MapReduce Computa;on
|
|
- Erika Sharp
- 6 years ago
- Views:
Transcription
1 M 2 R: Enabling Stronger Privacy in MapReduce Computa;on Anh Dinh, Prateek Saxena, Ee- Chien Chang, Beng Chin Ooi, Chunwang Zhang School of Compu,ng Na,onal University of Singapore
2 1. Mo;va;on Distributed computa,on (MapReduce) on large dataset with Trusted compu,ng. T T Storage T Integrity + Confiden,ality. Applicable in private or public cloud segng.
3 Challenge 1: Keep Trusted Code Base small Applica,on Frameworks Opera,ng Systems CVEs in Linux [CVE- DB] Hypervisor Affected many hypervisors (e.g Xen / KVM) [CS Report]
4 Baseline System Protected Code / Data TCB (e.g. SGX enclave) Opera,ng System Hypervisor MapReduce/Hadoop Framework S t o r a g e Trusted CPU All data outside trusted environment is encrypted SoZware only a[ack. Previous system: VC 3 (IEEE S&P 2015)
5 Background: MapReduce shuffling map reduce reduce map reduce map reduce Computa,on & shuffling of <key, value> tuples. Phases: Map à Shuffle à Reduce. map outputs a set of tuples. During Shuffling, tuples are grouped according to their key. Each reduce instance corresponds to an unique key k. It takes all tuples with the key k and output a set of tuples.
6 Background: Hadoop Hadoop: sozware framework wri[en in Java 190K LOC (Hadoop ) Consists of MapReduce modules, Hadoop Distributed File System (HDFS), etc. Usensix Security 2015 M2R: Enabling Stronger Privacy in MapReduce Computa,on
7 Is the Baseline System sufficient? T T Storage T Iden,fy small essen,al components of MapReduce to be included in the TCB.
8 Challenge 2: Interac;ons Leaks Info T T Storage T Usensix Security 2015 M2R: Enabling Stronger Privacy in MapReduce Computa,on
9 Example of leakage: wordcount Map Phase: each generates the tuples. F1 w1 w2 shuffling w1 w2 (key: w1) (key: w2) F2 w1 w2 w4 w1 w2 w4 (key: w3) F3 w3 w4 w3 w4 (key:w4)
10 Example of leakage: wordcount Shuffling Phase: The tuples are grouped w.r.t the words. F1 w1 w2 shuffling (key: w1) (key: w2) F2 w1 w2 w4 (key: w3) F3 w3 w4 (key:w4) Reduce Phase: counts and outputs the number of tuples it received.
11 Example of leakage: word counts By observing the flow of tuples, one can infer rela,onships among the input files. F1 w1 w2 shuffling (key: w1) (key: w2) F2 w1 w2 w4 (key: w3) F3 w3 w4 (key:w4) F3 contains a unique word
12 Example of leakage: word counts By observing the flow of tuples, one can infer rela,onships among the input files. F1 w1 w2 shuffling (key: w1) (key: w2) F2 w1 w2 w4 (key: w3) F3 w3 w4 (key:w4) Goal: hide these rela;onships.
13 Possible solu;on: Oblivious RAM Very high overhead. ORAM Usensix Security 2015 M2R: Enabling Stronger Privacy in MapReduce Computa,on
14 2. Our solu;on Randomly permutes the tuples. Group the tuples according to their keys. Secure Mixing (MixT) Grouping
15 2. Our solu;on For execu,on integrity, addi,on step of verifica,on is required. Secure Mixing (MixT) Grouping Verify groupings (GroupT)
16 2. Our solu;on Randomly permutes the tuples Original untrusted Shuffling process Verifies grouping is correctly done Secure Mixing (MixT) Grouping Verify groupings (GroupT)
17 Cascaded Mixing A cascaded mixing is employed to randomly permute the tuples distributedly. mixt mixt mixt mixt mixt mixt mixt mixt mixt round 1 round 2 round 3
18 Remarks Key management, handling of the random nonce and ini,al value is not straighoorward. In Hadoop, mul,ple reduce instances are carried out by a single reducer. Likewise mapper. Usensix Security 2015 M2R: Enabling Stronger Privacy in MapReduce Computa,on
19 ORAM vs Our solu;on M 2 R exploits the fact that, reads and writes can be batched into 2 phases, whereas ORAM caters for single read/write and thus incurs higher overhead. Many construc,ons of ORAM need to permute or o- sort the data. ORAM Secure Mixing (MixT) Grouping Verify groupings (GroupT) Usensix Security 2015 M2R: Enabling Stronger Privacy in MapReduce Computa,on
20 3. Security Model Adversary can observe the following: Input/output size of each trusted instance. Source/des,na,on of the input/output. Time of invoca,on/return of each trusted instance. Ac,ve adversary can: Arbitrary Invoke trusted instances. Halt instances. Drop/duplicate ciphertext (encrypted tuples). Add delays. Usensix Security 2015 M2R: Enabling Stronger Privacy in MapReduce Computa,on
21 Modulo- private Based on formula,on by CaneG (FOCS 01). Let be the permissible data that can be revealed during honest execu,on. A provisioning protocol is modulo- private if, for any adversary A execuhng the protocol, there is an algorithm B with access only to, such that the output of A and B are indishnguishable. Usensix Security 2015 M2R: Enabling Stronger Privacy in MapReduce Computa,on
22 The permissible : size of input/output,,me of revoca,on/return of and under honest execu,on. shuffling F1 w1 w2 (key: w1) (key: w2) F2 w1 w2 w4 (key: w3) F3 w3 w4 (key:w4) Goal: hide these rela;onships.
23 M 2 R is - modulo private. Usensix Security 2015 M2R: Enabling Stronger Privacy in MapReduce Computa,on
24 4. Implementa;ons & Experiments Use Xen as the trusted hypervisor, and its Verifiable Dynamic Func,on Executor to load and execute trusted codes. (The design of M 2 R can be implemented differently depending on the underlying architecture, e.g. on Intel SGX). Ported 7 MapReduc benchmark applica,ons. KMeans : Itera,ve, Compute intensive Grep : Compute intensive Pagerank : Itera,ve, Compute intensive WordCount : Shuffle intensive Index : Shuffle intensive Join : database queries Aggregate : database queries 8 compute nodes, each quad- core Intel CPU 1.8 GHz, 8GB RAM, 1GB Ethernet cards.
25 Trusted Code Base 4 trusted computa,on units: mixt, GroupT,,. Plaoorm related: (mixt, GroupT) Lines Of Code: 300 Applica,ons: (, ReduceT) Lines Of Code 200 for our examples. Usensix Security 2015 M2R: Enabling Stronger Privacy in MapReduce Computa,on
26 Performance Job Input size (bytes) (vs plaintext size) Shuffled bytes #Applica,ons hyper calls #plaoorm hyper calls Wordcount 2.1G (1.1 ) 4.2G Index 2.5G (1.2 ) 8G Grep 2.1G (1.1 ) 75M Aggregate 2.0G (1.2 ) 289M Join 2.0G (1.2 ) 450M Pagerank 2.5G (4.0 ) 2.6G KMeans 1.0G (1.1 ) 11K Usensix Security 2015 M2R: Enabling Stronger Privacy in MapReduce Computa,on
27 Running ;me (s) Job Baseline (vs no encryp,on) M 2 R (vs baseline) Wordcount 570 (2.6 ) 1156 (2.0 ) Index 666 (1.6 ) 1549 (2.3 ) Grep 70 (1.5 ) 106 (1.5 ) Aggregate 125 (1.6 ) 205 (1.6 ) Join 422 (2 ) 510 (1.2 ) Pagerank 521 (1.6 ) 755 (1.4 ) KMeans 123 (1.7 ) 145 (1.2 ) Usensix Security 2015 M2R: Enabling Stronger Privacy in MapReduce Computa,on
28 Conclusions Privacy- preserving distributed computa,on of MapReduce with trusted compu,ng. Security: Execu,on integrity +Data Confiden,ality Observa,on that simply running the map/reduce in trusted environment is not sufficient: interac,ons leak sensi,ve info. Small TCB Exploit the algorithmic structure to outperform a solu,on that employs generic ORAM. Future works: other distributed dataflow systems.
M 2 R: ENABLING STRONGER PRIVACY IN MAPREDUCE COMPUTATION
1 M 2 R: ENABLING STRONGER PRIVACY IN MAPREDUCE COMPUTATION TIEN TUAN ANH DINH, PRATEEK SAXENA, EE-CHIEN CHANG, BENG CHIN OOI, AND CHUNWANG ZHANG, NATIONAL UNIVERSITY OF SINGAPORE PRESENTED BY: RAVEEN
More informationPrivacy-Preserving Computation with Trusted Computing via Scramble-then-Compute
Privacy-Preserving Computation with Trusted Computing via Scramble-then-Compute Hung Dang, Anh Dinh, Ee-Chien Chang, Beng Chin Ooi School of Computing National University of Singapore The Problem Context:
More informationM 2 R: Enabling Stronger Privacy in MapReduce Computation
M 2 R: Enabling Stronger Privacy in MapReduce Computation Tien Tuan Anh Dinh, Prateek Saxena, Ee-Chien Chang, Beng Chin Ooi, and Chunwang Zhang, National University of Singapore https://www.usenix.org/conference/usenixsecurity15/technical-sessions/presentation/dinh
More informationMylar. xd5d1db5abce2356d51db5aab23d abbce23352abc x435acb734352a12cad5d1db5abce2356d51db acb2312aaab23!
Mylar Building web applica/ons on top of encrypted data xd5d1db5abce2356d51db5aab23d5321535abbce23352abc4352314987 x435acb734352a12cad5d1db5abce2356d51db5345323acb2312aaab23! Raluca Ada Popa, Emily Stark,
More informationOp#mizing MapReduce for Highly- Distributed Environments
Op#mizing MapReduce for Highly- Distributed Environments Abhishek Chandra Associate Professor Department of Computer Science and Engineering University of Minnesota hep://www.cs.umn.edu/~chandra 1 Big
More informationMapReduce, Apache Hadoop
NDBI040: Big Data Management and NoSQL Databases hp://www.ksi.mff.cuni.cz/ svoboda/courses/2016-1-ndbi040/ Lecture 2 MapReduce, Apache Hadoop Marn Svoboda svoboda@ksi.mff.cuni.cz 11. 10. 2016 Charles University
More informationMapReduce, Apache Hadoop
Czech Technical University in Prague, Faculty of Informaon Technology MIE-PDB: Advanced Database Systems hp://www.ksi.mff.cuni.cz/~svoboda/courses/2016-2-mie-pdb/ Lecture 12 MapReduce, Apache Hadoop Marn
More informationCollabora've, Privacy Preserving Data Aggrega'on at Scale
Collabora've, Privacy Preserving Data Aggrega'on at Scale Michael J. Freedman Princeton University Joint work with: Benny Applebaum, Haakon Ringberg, MaHhew Caesar, and Jennifer Rexford Problem: Network
More informationPerformance Evaluation of a MongoDB and Hadoop Platform for Scientific Data Analysis
Performance Evaluation of a MongoDB and Hadoop Platform for Scientific Data Analysis Elif Dede, Madhusudhan Govindaraju Lavanya Ramakrishnan, Dan Gunter, Shane Canon Department of Computer Science, Binghamton
More informationUsable PIR. Network Security and Applied. Cryptography Laboratory.
Network Security and Applied Cryptography Laboratory http://crypto.cs.stonybrook.edu Usable PIR NDSS '08, San Diego, CA Peter Williams petertw@cs.stonybrook.edu Radu Sion sion@cs.stonybrook.edu ver. 2.1
More informationDelegated Access for Hadoop Clusters in the Cloud
Delegated Access for Hadoop Clusters in the Cloud David Nuñez, Isaac Agudo, and Javier Lopez Network, Information and Computer Security Laboratory (NICS Lab) Universidad de Málaga, Spain Email: dnunez@lcc.uma.es
More informationApache Spark is a fast and general-purpose engine for large-scale data processing Spark aims at achieving the following goals in the Big data context
1 Apache Spark is a fast and general-purpose engine for large-scale data processing Spark aims at achieving the following goals in the Big data context Generality: diverse workloads, operators, job sizes
More informationImproving Hadoop MapReduce Performance on Supercomputers with JVM Reuse
Thanh-Chung Dao 1 Improving Hadoop MapReduce Performance on Supercomputers with JVM Reuse Thanh-Chung Dao and Shigeru Chiba The University of Tokyo Thanh-Chung Dao 2 Supercomputers Expensive clusters Multi-core
More informationA Comparison Study of Intel SGX and AMD Memory Encryption Technology
A Comparison Study of Intel SGX and AMD Memory Encryption Technology Saeid Mofrad, Fengwei Zhang Shiyong Lu Wayne State University {saeid.mofrad, Fengwei, Shiyong}@wayne.edu Weidong Shi (Larry) University
More informationSecure Server Project. Xen Project Developer Summit 2013 Adven9um Labs Jason Sonnek
Secure Server Project Xen Project Developer Summit 2013 Adven9um Labs Jason Sonnek 1 Outline I. Mo9va9on, Objec9ves II. Threat Landscape III. Design IV. Status V. Roadmap 2 Mo9va9on In a nutshell: Secure
More informationSecure Remote Storage Using Oblivious RAM
Secure Remote Storage Using Oblivious RAM Giovanni Malloy Mentors: Georgios Kellaris, Kobbi Nissim August 11, 2016 Abstract Oblivious RAM (ORAM) is a protocol that allows a user to access the data she
More informationGuoping Wang and Chee-Yong Chan Department of Computer Science, School of Computing National University of Singapore VLDB 14.
Guoping Wang and Chee-Yong Chan Department of Computer Science, School of Computing National University of Singapore VLDB 14 Page 1 Introduction & Notations Multi-Job optimization Evaluation Conclusion
More informationSoK: A Study of Using Hardwareassisted. Environments for Security. Fengwei Zhang and Hongwei Zhang. Wayne State University Detroit, Michigan, USA
SoK: A Study of Using Hardwareassisted Isolated Execu
More informationVerifiable Cloud Outsourcing for Network Func9ons (+ Verifiable Resource Accoun9ng for Cloud Services)
1 Verifiable Cloud Outsourcing for Network Func9ons (+ Verifiable Resource Accoun9ng for Cloud Services) Vyas Sekar vnfo joint with Seyed Fayazbakhsh, Mike Reiter VRA joint with Chen Chen, Petros Mania9s,
More informationInforma)on Retrieval and Map- Reduce Implementa)ons. Mohammad Amir Sharif PhD Student Center for Advanced Computer Studies
Informa)on Retrieval and Map- Reduce Implementa)ons Mohammad Amir Sharif PhD Student Center for Advanced Computer Studies mas4108@louisiana.edu Map-Reduce: Why? Need to process 100TB datasets On 1 node:
More informationSecuring Hadoop. Keys Botzum, MapR Technologies Jan MapR Technologies - Confiden6al
Securing Hadoop Keys Botzum, MapR Technologies kbotzum@maprtech.com Jan 2014 MapR Technologies - Confiden6al 1 Why Secure Hadoop Historically security wasn t a high priority Reflec6on of the type of data
More informationModular arithme.c and cryptography
Modular arithme.c and cryptography CSC 1300 Discrete Structures Villanova University Public Key Cryptography (Slides 11-32) by Dr. Lillian Cassel, Villanova University Villanova CSC 1300 - Dr Papalaskari
More informationProject: Embedded SMC
Project: Embedded SMC What is Secure Computa1on [SMC] A Compute f(a, B) Without revealing A to Bob and B to Alice B 2 Using a Trusted Third Party A B f(a, B) f(a, B) A Compute f(a, B) Without revealing
More information2/26/2017. Originally developed at the University of California - Berkeley's AMPLab
Apache is a fast and general engine for large-scale data processing aims at achieving the following goals in the Big data context Generality: diverse workloads, operators, job sizes Low latency: sub-second
More informationToday s Objec4ves. Data Center. Virtualiza4on Cloud Compu4ng Amazon Web Services. What did you think? 10/23/17. Oct 23, 2017 Sprenkle - CSCI325
Today s Objec4ves Virtualiza4on Cloud Compu4ng Amazon Web Services Oct 23, 2017 Sprenkle - CSCI325 1 Data Center What did you think? Oct 23, 2017 Sprenkle - CSCI325 2 1 10/23/17 Oct 23, 2017 Sprenkle -
More informationCS 378 Big Data Programming
CS 378 Big Data Programming Lecture 5 Summariza9on Pa:erns CS 378 Fall 2017 Big Data Programming 1 Review Assignment 2 Ques9ons? mrunit How do you test map() or reduce() calls that produce mul9ple outputs?
More informationHypergraph Sparsifica/on and Its Applica/on to Par//oning
Hypergraph Sparsifica/on and Its Applica/on to Par//oning Mehmet Deveci 1,3, Kamer Kaya 1, Ümit V. Çatalyürek 1,2 1 Dept. of Biomedical Informa/cs, The Ohio State University 2 Dept. of Electrical & Computer
More informationHiTune. Dataflow-Based Performance Analysis for Big Data Cloud
HiTune Dataflow-Based Performance Analysis for Big Data Cloud Jinquan (Jason) Dai, Jie Huang, Shengsheng Huang, Bo Huang, Yan Liu Intel Asia-Pacific Research and Development Ltd Shanghai, China, 200241
More informationMapReduce Design Patterns
MapReduce Design Patterns MapReduce Restrictions Any algorithm that needs to be implemented using MapReduce must be expressed in terms of a small number of rigidly defined components that must fit together
More informationA Distributed Data- Parallel Execu3on Framework in the Kepler Scien3fic Workflow System
A Distributed Data- Parallel Execu3on Framework in the Kepler Scien3fic Workflow System Ilkay Al(ntas and Daniel Crawl San Diego Supercomputer Center UC San Diego Jianwu Wang UMBC WorDS.sdsc.edu Computa3onal
More informationBigDataBench- S: An Open- source Scien6fic Big Data Benchmark Suite
BigDataBench- S: An Open- source Scien6fic Big Data Benchmark Suite Xinhui Tian, Shaopeng Dai, Zhihui Du, Wanling Gao, Rui Ren, Yaodong Cheng, Zhifei Zhang, Zhen Jia, Peijian Wang and Jianfeng Zhan INSTITUTE
More informationAn Introduc+on to Applied Cryptography. Chester Rebeiro IIT Madras
CR An Introduc+on to Applied Cryptography Chester Rebeiro IIT Madras CR 2 Connected and Stored Everything is connected! Everything is stored! Increased Security Breaches 81% more in 2015 CR h9p://www.pwc.co.uk/assets/pdf/2015-isbs-execugve-
More informationEfficient Private Information Retrieval
Efficient Private Information Retrieval K O N S T A N T I N O S F. N I K O L O P O U L O S T H E G R A D U A T E C E N T E R, C I T Y U N I V E R S I T Y O F N E W Y O R K K N I K O L O P O U L O S @ G
More informationAscend: Architecture for Secure Computation on Encrypted Data Oblivious RAM (ORAM)
CSE 5095 & ECE 4451 & ECE 5451 Spring 2017 Lecture 7b Ascend: Architecture for Secure Computation on Encrypted Data Oblivious RAM (ORAM) Marten van Dijk Syed Kamran Haider, Chenglu Jin, Phuong Ha Nguyen
More informationHadoop Map Reduce 10/17/2018 1
Hadoop Map Reduce 10/17/2018 1 MapReduce 2-in-1 A programming paradigm A query execution engine A kind of functional programming We focus on the MapReduce execution engine of Hadoop through YARN 10/17/2018
More informationVirtualization. Introduction. Why we interested? 11/28/15. Virtualiza5on provide an abstract environment to run applica5ons.
Virtualization Yifu Rong Introduction Virtualiza5on provide an abstract environment to run applica5ons. Virtualiza5on technologies have a long trail in the history of computer science. Why we interested?
More informationGeneralizing Map- Reduce
Generalizing Map- Reduce 1 Example: A Map- Reduce Graph map reduce map... reduce reduce map 2 Map- reduce is not a solu;on to every problem, not even every problem that profitably can use many compute
More informationResource and Performance Distribution Prediction for Large Scale Analytics Queries
Resource and Performance Distribution Prediction for Large Scale Analytics Queries Prof. Rajiv Ranjan, SMIEEE School of Computing Science, Newcastle University, UK Visiting Scientist, Data61, CSIRO, Australia
More informationCAVA: Exploring Memory Locality for Big Data Analytics in Virtualized Clusters
2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing : Exploring Memory Locality for Big Data Analytics in Virtualized Clusters Eunji Hwang, Hyungoo Kim, Beomseok Nam and Young-ri
More informationVoldemort. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Overview Design Evaluation
Voldemort Smruti R. Sarangi Department of Computer Science Indian Institute of Technology New Delhi, India Smruti R. Sarangi Leader Election 1/29 Outline 1 2 3 Smruti R. Sarangi Leader Election 2/29 Data
More informationPrivacy-Preserving Shortest Path Computa6on
Privacy-Preserving Shortest Path Computa6on David J. Wu, Joe Zimmerman, Jérémy Planul, and John C. Mitchell Stanford University Naviga6on desired des@na@on current posi@on Naviga6on: A Solved Problem?
More informationBUILDING SECURE (CLOUD) APPLICATIONS USING INTEL S SGX
BUILDING SECURE (CLOUD) APPLICATIONS USING INTEL S SGX FLORIAN KERSCHBAUM, UNIVERSITY OF WATERLOO JOINT WORK WITH BENNY FUHRY (SAP), ANDREAS FISCHER (SAP) AND MANY OTHERS DO YOU TRUST YOUR CLOUD SERVICE
More informationScotch: Combining Software Guard Extensions and System Management Mode to Monitor Cloud Resource Usage
Scotch: Combining Software Guard Extensions and System Management Mode to Monitor Cloud Resource Usage Kevin Leach 1, Fengwei Zhang 2, and Westley Weimer 1 1 University of Michigan, 2 Wayne State University
More informationLectures 6+7: Zero-Leakage Solutions
Lectures 6+7: Zero-Leakage Solutions Contents 1 Overview 1 2 Oblivious RAM 1 3 Oblivious RAM via FHE 2 4 Oblivious RAM via Symmetric Encryption 4 4.1 Setup........................................ 5 4.2
More informationPolicy-Sealed Data: A New Abstraction for Building Trusted Cloud Services
Max Planck Institute for Software Systems Policy-Sealed Data: A New Abstraction for Building Trusted Cloud Services 1, Rodrigo Rodrigues 2, Krishna P. Gummadi 1, Stefan Saroiu 3 MPI-SWS 1, CITI / Universidade
More informationData Clustering on the Parallel Hadoop MapReduce Model. Dimitrios Verraros
Data Clustering on the Parallel Hadoop MapReduce Model Dimitrios Verraros Overview The purpose of this thesis is to implement and benchmark the performance of a parallel K- means clustering algorithm on
More informationLectures 4+5: The (In)Security of Encrypted Search
Lectures 4+5: The (In)Security of Encrypted Search Contents 1 Overview 1 2 Data Structures 2 3 Syntax 3 4 Security 4 4.1 Formalizing Leaky Primitives.......................... 5 1 Overview In the first
More informationTI2736-B Big Data Processing. Claudia Hauff
TI2736-B Big Data Processing Claudia Hauff ti2736b-ewi@tudelft.nl Intro Streams Streams Map Reduce HDFS Pig Pig Design Patterns Hadoop Ctd. Graphs Giraph Spark Zoo Keeper Spark Learning objectives Implement
More informationMapReduce. Tom Anderson
MapReduce Tom Anderson Last Time Difference between local state and knowledge about other node s local state Failures are endemic Communica?on costs ma@er Why Is DS So Hard? System design Par??oning of
More informationUsing Dynamic Voltage Frequency Scaling and CPU Pinning for Energy Efficiency in Cloud Compu1ng. Jakub Krzywda Umeå University
Using Dynamic Voltage Frequency Scaling and CPU Pinning for Energy Efficiency in Cloud Compu1ng Jakub Krzywda Umeå University How to use DVFS and CPU Pinning to lower the power consump1on during periods
More informationSSS: An Implementation of Key-value Store based MapReduce Framework. Hirotaka Ogawa (AIST, Japan) Hidemoto Nakada Ryousei Takano Tomohiro Kudoh
SSS: An Implementation of Key-value Store based MapReduce Framework Hirotaka Ogawa (AIST, Japan) Hidemoto Nakada Ryousei Takano Tomohiro Kudoh MapReduce A promising programming tool for implementing largescale
More informationOpera&ng Systems ECE344
Opera&ng Systems ECE344 Lecture 10: Scheduling Ding Yuan Scheduling Overview In discussing process management and synchroniza&on, we talked about context switching among processes/threads on the ready
More informationDeveloping MapReduce Programs
Cloud Computing Developing MapReduce Programs Dell Zhang Birkbeck, University of London 2017/18 MapReduce Algorithm Design MapReduce: Recap Programmers must specify two functions: map (k, v) * Takes
More informationCloud Programming. Programming Environment Oct 29, 2015 Osamu Tatebe
Cloud Programming Programming Environment Oct 29, 2015 Osamu Tatebe Cloud Computing Only required amount of CPU and storage can be used anytime from anywhere via network Availability, throughput, reliability
More informationMATE-EC2: A Middleware for Processing Data with Amazon Web Services
MATE-EC2: A Middleware for Processing Data with Amazon Web Services Tekin Bicer David Chiu* and Gagan Agrawal Department of Compute Science and Engineering Ohio State University * School of Engineering
More informationHigh Performance Computing on MapReduce Programming Framework
International Journal of Private Cloud Computing Environment and Management Vol. 2, No. 1, (2015), pp. 27-32 http://dx.doi.org/10.21742/ijpccem.2015.2.1.04 High Performance Computing on MapReduce Programming
More informationSecure and Ef icient Iris and Fingerprint Identi ication
Secure and Eficient Iris and Fingerprint Identiication Marina Blanton Department of Computer Science and Engineering, University of Notre Dame mblanton@nd.edu Paolo Gas Department of Computer Science,
More informationR-Store: A Scalable Distributed System for Supporting Real-time Analytics
R-Store: A Scalable Distributed System for Supporting Real-time Analytics Feng Li, M. Tamer Ozsu, Gang Chen, Beng Chin Ooi National University of Singapore ICDE 2014 Background Situation for large scale
More informationCan Parallel Replication Benefit Hadoop Distributed File System for High Performance Interconnects?
Can Parallel Replication Benefit Hadoop Distributed File System for High Performance Interconnects? N. S. Islam, X. Lu, M. W. Rahman, and D. K. Panda Network- Based Compu2ng Laboratory Department of Computer
More informationCamdoop Exploiting In-network Aggregation for Big Data Applications Paolo Costa
Camdoop Exploiting In-network Aggregation for Big Data Applications costa@imperial.ac.uk joint work with Austin Donnelly, Antony Rowstron, and Greg O Shea (MSR Cambridge) MapReduce Overview Input file
More informationAndrew Pavlo, Erik Paulson, Alexander Rasin, Daniel Abadi, David DeWitt, Samuel Madden, and Michael Stonebraker SIGMOD'09. Presented by: Daniel Isaacs
Andrew Pavlo, Erik Paulson, Alexander Rasin, Daniel Abadi, David DeWitt, Samuel Madden, and Michael Stonebraker SIGMOD'09 Presented by: Daniel Isaacs It all starts with cluster computing. MapReduce Why
More informationLarge- Scale Sor,ng: Breaking World Records. Mike Conley CSE 124 Guest Lecture 12 March 2015
Large- Scale Sor,ng: Breaking World Records Mike Conley CSE 124 Guest Lecture 12 March 2015 Sor,ng Given an array of items, put them in order 5 2 8 0 2 5 4 9 0 1 0 0 0 0 0 0 1 2 2 4 5 5 8 9 Many algorithms
More informationCon$nuous Audi$ng and Risk Management in Cloud Compu$ng
Con$nuous Audi$ng and Risk Management in Cloud Compu$ng Marcus Spies Chair of Knowledge Management LMU University of Munich Scien$fic / Technical Director of EU Integrated Research Project MUSING Cloud
More informationIntegrityMR: Integrity Assurance Framework for Big Data Analytics and Management
IntegrityMR: Integrity Assurance Framework for Big Data Analytics and Management Applications Yongzhi Wang, Jinpeng Wei Florida International University Mudhakar Srivatsa IBM T.J. Watson Research Center
More informationShark. Hive on Spark. Cliff Engle, Antonio Lupher, Reynold Xin, Matei Zaharia, Michael Franklin, Ion Stoica, Scott Shenker
Shark Hive on Spark Cliff Engle, Antonio Lupher, Reynold Xin, Matei Zaharia, Michael Franklin, Ion Stoica, Scott Shenker Agenda Intro to Spark Apache Hive Shark Shark s Improvements over Hive Demo Alpha
More informationCIS 601 Graduate Seminar. Dr. Sunnie S. Chung Dhruv Patel ( ) Kalpesh Sharma ( )
Guide: CIS 601 Graduate Seminar Presented By: Dr. Sunnie S. Chung Dhruv Patel (2652790) Kalpesh Sharma (2660576) Introduction Background Parallel Data Warehouse (PDW) Hive MongoDB Client-side Shared SQL
More informationDifferential Privacy
CPSC 426/526 Differential Privacy Ennan Zhai Computer Science Department Yale University Recall: Lec-11 In lec-11, we learned: - Cryptographic basics - Symmetric key cryptography - Public key cryptography
More informationAnalysis in the Big Data Era
Analysis in the Big Data Era Massive Data Data Analysis Insight Key to Success = Timely and Cost-Effective Analysis 2 Hadoop MapReduce Ecosystem Popular solution to Big Data Analytics Java / C++ / R /
More informationRacing in Hyperspace: Closing Hyper-Threading Side Channels on SGX with Contrived Data Races. CS 563 Young Li 10/31/18
Racing in Hyperspace: Closing Hyper-Threading Side Channels on SGX with Contrived Data Races CS 563 Young Li 10/31/18 Intel Software Guard extensions (SGX) and Hyper-Threading What is Intel SGX? Set of
More informationDistributed Data Management Summer Semester 2013 TU Kaiserslautern
Distributed Data Management Summer Semester 2013 TU Kaiserslautern Dr.- Ing. Sebas4an Michel smichel@mmci.uni- saarland.de Distributed Data Management, SoSe 2013, S. Michel 1 Lecture 4 PIG/HIVE Distributed
More informationOLTP on Hadoop: Reviewing the first Hadoop- based TPC- C benchmarks
OLTP on Hadoop: Reviewing the first Hadoop- based TPC- C benchmarks Monte Zweben Co- Founder and Chief Execu6ve Officer John Leach Co- Founder and Chief Technology Officer September 30, 2015 The Tradi6onal
More informationOvercoming the Barriers of Graphs on GPUs: Delivering Graph Analy;cs 100X Faster and 40X Cheaper
Overcoming the Barriers of Graphs on GPUs: Delivering Graph Analy;cs 100X Faster and 40X Cheaper November 18, 2015 Super Compu3ng 2015 The Amount of Graph Data is Exploding! Billion+ Edges! 2 Graph Applications
More informationHow to use the BigDataBench simulator versions
How to use the BigDataBench simulator versions Zhen Jia Institute of Computing Technology, Chinese Academy of Sciences BigDataBench Tutorial MICRO 2014 Cambridge, UK INSTITUTE OF COMPUTING TECHNOLOGY Objec8ves
More informationWorkload Characterization and Optimization of TPC-H Queries on Apache Spark
Workload Characterization and Optimization of TPC-H Queries on Apache Spark Tatsuhiro Chiba and Tamiya Onodera IBM Research - Tokyo April. 17-19, 216 IEEE ISPASS 216 @ Uppsala, Sweden Overview IBM Research
More informationImproved MapReduce k-means Clustering Algorithm with Combiner
2014 UKSim-AMSS 16th International Conference on Computer Modelling and Simulation Improved MapReduce k-means Clustering Algorithm with Combiner Prajesh P Anchalia Department Of Computer Science and Engineering
More informationRe- op&mizing Data Parallel Compu&ng
Re- op&mizing Data Parallel Compu&ng Sameer Agarwal Srikanth Kandula, Nicolas Bruno, Ming- Chuan Wu, Ion Stoica, Jingren Zhou UC Berkeley A Data Parallel Job can be a collec/on of maps, A Data Parallel
More informationCombinatorial Mathema/cs and Algorithms at Exascale: Challenges and Promising Direc/ons
Combinatorial Mathema/cs and Algorithms at Exascale: Challenges and Promising Direc/ons Assefaw Gebremedhin Purdue University (Star/ng August 2014, Washington State University School of Electrical Engineering
More informationW1005 Intro to CS and Programming in MATLAB. Brief History of Compu?ng. Fall 2014 Instructor: Ilia Vovsha. hip://www.cs.columbia.
W1005 Intro to CS and Programming in MATLAB Brief History of Compu?ng Fall 2014 Instructor: Ilia Vovsha hip://www.cs.columbia.edu/~vovsha/w1005 Computer Philosophy Computer is a (electronic digital) device
More informationSpark Over RDMA: Accelerate Big Data SC Asia 2018 Ido Shamay Mellanox Technologies
Spark Over RDMA: Accelerate Big Data SC Asia 2018 Ido Shamay 1 Apache Spark - Intro Spark within the Big Data ecosystem Data Sources Data Acquisition / ETL Data Storage Data Analysis / ML Serving 3 Apache
More informationCIS 4360 Secure Computer Systems SGX
CIS 4360 Secure Computer Systems SGX Professor Qiang Zeng Spring 2017 Some slides are stolen from Intel docs Previous Class UEFI Secure Boot Windows s Trusted Boot Intel s Trusted Boot CIS 4360 Secure
More informationCS 378 Big Data Programming
CS 378 Big Data Programming Lecture 11 more on Data Organiza:on Pa;erns CS 378 - Fall 2016 Big Data Programming 1 Assignment 5 - Review Define an Avro object for user session One user session for each
More informationOrganized Segmenta.on
Organized Segmenta.on Alex Trevor, Georgia Ins.tute of Technology PCL TUTORIAL @ICRA 13 Overview Mo.va.on Connected Component Algorithm Planar Segmenta.on & Refinement Euclidean Clustering Timing Results
More informationLecture 7: MapReduce design patterns! Claudia Hauff (Web Information Systems)!
Big Data Processing, 2014/15 Lecture 7: MapReduce design patterns!! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl 1 Course content Introduction Data streams 1 & 2 The MapReduce paradigm
More informationResearch Works to Cope with Big Data Volume and Variety. Jiaheng Lu University of Helsinki, Finland
Research Works to Cope with Big Data Volume and Variety Jiaheng Lu University of Helsinki, Finland Big Data: 4Vs Photo downloaded from: https://blog.infodiagram.com/2014/04/visualizing-big-data-concepts-strong.html
More informationMaking Searchable Encryption Scale to the Cloud. Ian Miers and Payman Mohassel
Making Searchable Encryption Scale to the Cloud Ian Miers and Payman Mohassel End to end Encryption No encryption Transport encryption End2End Encryption Service provider Service provider Service provider
More informationDesign and Implementation of the Ascend Secure Processor. Ling Ren, Christopher W. Fletcher, Albert Kwon, Marten van Dijk, Srinivas Devadas
Design and Implementation of the Ascend Secure Processor Ling Ren, Christopher W. Fletcher, Albert Kwon, Marten van Dijk, Srinivas Devadas Agenda Motivation Ascend Overview ORAM for obfuscation Ascend:
More informationAccelerate Big Data Insights
Accelerate Big Data Insights Executive Summary An abundance of information isn t always helpful when time is of the essence. In the world of big data, the ability to accelerate time-to-insight can not
More informationDesign Principles & Prac4ces
Design Principles & Prac4ces Robert France Robert B. France 1 Understanding complexity Accidental versus Essen4al complexity Essen%al complexity: Complexity that is inherent in the problem or the solu4on
More informationWhere We Are. Review: Parallel DBMS. Parallel DBMS. Introduction to Data Management CSE 344
Where We Are Introduction to Data Management CSE 344 Lecture 22: MapReduce We are talking about parallel query processing There exist two main types of engines: Parallel DBMSs (last lecture + quick review)
More informationSearchable Encryption Using ORAM. Benny Pinkas
Searchable Encryption Using ORAM Benny Pinkas 1 Desiderata for Searchable Encryption Security No leakage about the query or the results Functionality Variety of queries that are supported Performance 2
More informationIntroduction to Hadoop. Owen O Malley Yahoo!, Grid Team
Introduction to Hadoop Owen O Malley Yahoo!, Grid Team owen@yahoo-inc.com Who Am I? Yahoo! Architect on Hadoop Map/Reduce Design, review, and implement features in Hadoop Working on Hadoop full time since
More informationSecure Distributed Computa2ons (and their Proofs)
Secure Distributed Computa2ons (and their Proofs) Pedro Adao Gilles Barthe Ricardo Corin Pierre- Malo Deniélou Gurvan le Guernic Nataliya Guts Eugen Zalinescu San2ago Zanella Béguelin Karthik Bhargavan
More informationMing Ming Wong Jawad Haj-Yahya Anupam Chattopadhyay
Hardware and Architectural Support for Security and Privacy (HASP 18), June 2, 2018, Los Angeles, CA, USA Ming Ming Wong Jawad Haj-Yahya Anupam Chattopadhyay Computing and Engineering (SCSE) Nanyang Technological
More informationHadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved
Hadoop 2.x Core: YARN, Tez, and Spark YARN Hadoop Machine Types top-of-rack switches core switch client machines have client-side software used to access a cluster to process data master nodes run Hadoop
More informationLocality-Aware Dynamic VM Reconfiguration on MapReduce Clouds. Jongse Park, Daewoo Lee, Bokyeong Kim, Jaehyuk Huh, Seungryoul Maeng
Locality-Aware Dynamic VM Reconfiguration on MapReduce Clouds Jongse Park, Daewoo Lee, Bokyeong Kim, Jaehyuk Huh, Seungryoul Maeng Virtual Clusters on Cloud } Private cluster on public cloud } Distributed
More informationHDFS: Hadoop Distributed File System. CIS 612 Sunnie Chung
HDFS: Hadoop Distributed File System CIS 612 Sunnie Chung What is Big Data?? Bulk Amount Unstructured Introduction Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per
More informationAn Exploration of Designing a Hybrid Scale-Up/Out Hadoop Architecture Based on Performance Measurements
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI.9/TPDS.6.7, IEEE
More informationGraphene-SGX. A Practical Library OS for Unmodified Applications on SGX. Chia-Che Tsai Donald E. Porter Mona Vij
Graphene-SGX A Practical Library OS for Unmodified Applications on SGX Chia-Che Tsai Donald E. Porter Mona Vij Intel SGX: Trusted Execution on Untrusted Hosts Processing Sensitive Data (Ex: Medical Records)
More informationTrack Join. Distributed Joins with Minimal Network Traffic. Orestis Polychroniou! Rajkumar Sen! Kenneth A. Ross
Track Join Distributed Joins with Minimal Network Traffic Orestis Polychroniou Rajkumar Sen Kenneth A. Ross Local Joins Algorithms Hash Join Sort Merge Join Index Join Nested Loop Join Spilling to disk
More information15/03/2018. Counters
Counters 2 1 Hadoop provides a set of basic, built-in, counters to store some statistics about jobs, mappers, reducers E.g., number of input and output records E.g., number of transmitted bytes Ad-hoc,
More information