MATE-EC2: A Middleware for Processing Data with Amazon Web Services
Tekin Bicer, David Chiu* (Presenter), and Gagan Agrawal
Department of Computer Science and Engineering, Ohio State University
*School of Engineering and Computer Science, Washington State University, Vancouver
The 4th Workshop on Many-Task Computing on Grids and Supercomputers (MTAGS'11)
The Problem
Today's applications are increasingly data- and compute-intensive
Many-Task Computing paradigms are becoming pervasive: workflows (WF) and Map-Reduce (MR)
E.g., Map-Reducible applications are solving common problems: data mining, graph processing, etc.
Clouds
Infrastructure-as-a-Service (IaaS): anyone, anywhere can allocate virtually unlimited virtualized compute/storage resources
Amazon Web Services: the most popular IaaS provider
Elastic Compute Cloud (EC2)
Simple Storage Service (S3)
Amazon Web Services: EC2
On-Demand Instances (Virtual Machines)
Types: Extra Large, Large, Small, etc.
For example, an Extra Large instance:
Cost: $0.68 per allocated hour
15GB memory; 1.7TB (ephemeral) disk
8 Compute Units
High I/O performance
Amazon Web Services: S3
Accessible anywhere; high reliability and availability
Objects are arbitrary data blobs, stored as (key, value) pairs in buckets
Up to 5TB of data per object
Unlimited objects per bucket
Amazon Web Services: S3 (Cont.)
Simple FTP-like interface over web service protocols: Put, Get (including Partial Get), and Delete, via SOAP and REST
High throughput (~40MB/sec) that scales well to multiple concurrent clients
Low cost
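As a concrete illustration of Get and Partial Get, a minimal sketch using today's boto3 Python SDK (not part of the original system; the bucket and key names are hypothetical):

```python
# Hypothetical bucket/object names; assumes the boto3 AWS SDK.
import boto3

s3 = boto3.client("s3")

# Full GET of an object.
obj = s3.get_object(Bucket="mate-ec2-data", Key="dataset/object-00")
data = obj["Body"].read()

# Partial GET: fetch only the first 8 MB using an HTTP Range header,
# which is how a piece of an object can be retrieved without
# downloading the whole object.
part = s3.get_object(
    Bucket="mate-ec2-data",
    Key="dataset/object-00",
    Range="bytes=0-8388607",  # 8 MB = 8 * 1024 * 1024 bytes
)
unit = part["Body"].read()
```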
Amazon Web Services: S3 (Cont.)
449 billion objects stored in S3 as of July 2011
Roughly doubling each year
Motivation and Focus
Virtualization is characteristic of any cloud environment: clouds are black boxes
Storage and elastic compute services exhibit performance variabilities that we should leverage
Goals?
As users increasingly move to cloud-based solutions for computing, we need services and tools that can:
Get the most out of cloud resources for data-intensive processing
Provide a simple programming interface
Outline
Background
System Overview
Experiments
Conclusion
MATE-EC2 System Design
Cloud middleware that uses a set of possibly heterogeneous EC2 instances to scalably and efficiently process data stored in S3
MATE is a generalized-reduction parallel data-processing structure, similar to Map-Reduce
MATE and Map-Reduce
[Diagram comparing the two pipelines:
Map-Reduce: input data → map() emits (key, value) pairs → shuffle → reduce() → result
MATE: input data → local reduction() applies proc(e) to each element, accumulating directly into reduction objects (robj 1 … robj K) → global reduction() merges them into a combined reduction object → result]
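To make the generalized-reduction model concrete (elements folded straight into a reduction object, with no shuffle of intermediate (key, value) pairs), here is a hedged Python sketch using k-means as the example. This illustrates the local/global reduction structure only; it is not the actual MATE API, and all names are hypothetical:

```python
# Illustrative sketch of the generalized-reduction style, with k-means
# as the running example. Not the MATE API; names are hypothetical.

def local_reduction(points, centroids):
    """Fold each input element directly into a reduction object
    (per-cluster running sums and counts) -- no intermediate
    (key, value) pairs and no shuffle phase."""
    k, dim = len(centroids), len(centroids[0])
    robj = {"sum": [[0.0] * dim for _ in range(k)], "count": [0] * k}
    for p in points:
        c = min(range(k),
                key=lambda i: sum((p[d] - centroids[i][d]) ** 2
                                  for d in range(dim)))
        for d in range(dim):
            robj["sum"][c][d] += p[d]
        robj["count"][c] += 1
    return robj

def global_reduction(robjs):
    """Combine the per-slave reduction objects into one."""
    combined = robjs[0]
    for r in robjs[1:]:
        for c in range(len(combined["count"])):
            combined["count"][c] += r["count"][c]
            for d in range(len(combined["sum"][c])):
                combined["sum"][c][d] += r["sum"][c][d]
    return combined
```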
MATE-EC2 Design
[Architecture diagram: an EC2 slave instance hosts the S3 Data Retrieval Layer (a buffer fed by retrieval threads Th 0–Th 3) and the Computing Layer; the S3 data store holds data objects partitioned into chunks (Chunk 0 … Chunk n); an EC2 master instance runs the Job Scheduler and keeps the metadata.]
MATE-EC2 Design: Data Organization
Objects: physical representation of the data in S3
Chunks: logical data partitions within objects (improves memory utilization)
Units: fixed-size pieces of a chunk, retrieved independently (exploits concurrency)
Metadata (kept by the master): chunk offset, chunk size, unit size
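A minimal sketch of how unit byte ranges could be derived from this metadata (hypothetical names, not the actual MATE-EC2 structures):

```python
# Sketch: deriving the HTTP byte ranges for a chunk's units from the
# metadata kept by the master (chunk offset, chunk size, unit size).

def unit_ranges(chunk_offset, chunk_size, unit_size):
    """Yield 'bytes=start-end' Range strings covering one chunk."""
    end_of_chunk = chunk_offset + chunk_size
    start = chunk_offset
    while start < end_of_chunk:
        end = min(start + unit_size, end_of_chunk) - 1  # inclusive end
        yield f"bytes={start}-{end}"
        start = end + 1

# Example: a 128 MB chunk at offset 0, retrieved in 8 MB units.
ranges = list(unit_ranges(0, 128 * 2**20, 8 * 2**20))
```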
MATE-EC2 Design: Data Retrieval
Threaded chunk retrieval: each chunk is retrieved concurrently by a number of data retrieval threads
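A sketch of threaded retrieval under the same assumptions as the earlier S3 example (boto3; the bucket/key names and thread count are hypothetical):

```python
# Sketch of threaded chunk retrieval: the units of one chunk are
# fetched concurrently by a pool of threads issuing Partial Gets.
from concurrent.futures import ThreadPoolExecutor
import boto3

s3 = boto3.client("s3")

def fetch_unit(byte_range):
    resp = s3.get_object(Bucket="mate-ec2-data",
                         Key="dataset/object-00",
                         Range=byte_range)
    return resp["Body"].read()

def fetch_chunk(ranges, n_threads=16):
    """Retrieve all units of a chunk with n_threads concurrent GETs;
    map() returns results in unit order, so the chunk reassembles."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return b"".join(pool.map(fetch_unit, ranges))
```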
Dynamic Load Balancing
The master maintains a Job Pool of unassigned work
Scheduling for processing uses a simple greedy heuristic: select a unit belonging to the least connected chunk (i.e., the chunk currently being retrieved by the fewest slaves)
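A minimal sketch of this selective heuristic, assuming the master tracks how many slaves are currently connected to each chunk (the data structures are hypothetical):

```python
# Greedy job assignment: among chunks with pending units, hand out a
# unit from the chunk with the fewest connected slaves.

def select_job(job_pool, connections):
    """job_pool: dict chunk_id -> list of pending unit ids.
    connections: dict chunk_id -> number of slaves on that chunk."""
    candidates = [c for c, units in job_pool.items() if units]
    if not candidates:
        return None  # all work assigned
    chunk = min(candidates, key=lambda c: connections.get(c, 0))
    unit = job_pool[chunk].pop()
    connections[chunk] = connections.get(chunk, 0) + 1
    return (chunk, unit)
```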
MATE-EC2 Processing Flow
(1) A slave compute node requests a job from the master
(2) The assigned chunk is retrieved from S3 in units
(3) Retrieved units are passed to the Computing Layer and processed
(4) The slave requests another job from the master
Many slave instances run this loop concurrently against the same master and S3 store (a sketch of the loop follows below)
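A hedged sketch of the slave-side loop, reusing the hypothetical helpers from the earlier slides (unit_ranges, fetch_chunk, local_reduction, global_reduction); the master object and the point decoder are assumptions, not the actual MATE-EC2 interfaces:

```python
import struct

def decode_points(data, dim=2):
    # Hypothetical decoder: packed little-endian float64 coordinates.
    n = len(data) // (8 * dim)
    flat = struct.unpack(f"<{n * dim}d", data[: n * dim * 8])
    return [flat[i * dim:(i + 1) * dim] for i in range(n)]

def slave_loop(master, centroids):
    """Assumes master exposes request_job() and send_result()."""
    robjs = []
    while True:
        job = master.request_job()               # steps (1) and (4)
        if job is None:
            break                                # job pool is empty
        ranges = unit_ranges(job["offset"], job["size"], job["unit"])
        data = fetch_chunk(list(ranges))         # step (2)
        points = decode_points(data)
        robjs.append(local_reduction(points, centroids))  # step (3)
    master.send_result(global_reduction(robjs))  # hand off to combine
```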
Experiments
Goals:
Finding the most suitable settings for AWS
Performance of MATE-EC2 on heterogeneous and homogeneous compute environments
Performance comparison of MATE-EC2 and Map-Reduce
Experiments (Cont.)
Setup: 4 Large EC2 slave instances, 1 Large instance as the master
For each application, the dataset is split into 16 data objects on S3
Large instance: 4 compute units (each comparable to a 1.0–1.2GHz processor), 7.5GB memory, 850GB ephemeral disk, high I/O performance
Experiments (Cont.)
Application         I/O      Compute   Reduction Obj.  Dataset
KMeans Clustering   Low/Med  Med/High  Small           8.2GB (10.7 billion points)
PCA                 Low      High      Large           8.2GB
PageRank            High     Low/Med   Very Large      1GB (9.6M nodes, 131M edges)
Effect of Chunk Sizes (KMeans)
1 data retrieval thread
Performance increase for chunk sizes above 8M vs. 128KB: 2.07x to 2.49x speedup
Data Retrieval (KMeans)
[Two plots: execution time (sec) vs. number of S3 data retrieval threads (1–32) at a 128M chunk size, and execution time (sec) vs. chunk size (8M–128M) with 16 data retrieval threads.]
One thread vs. more threads: 1.37x–1.90x speedup
8M chunks vs. larger chunk sizes: 1.13x–1.30x speedup
Job Assignment (KMeans)
[Plot: execution time (sec) vs. chunk size (8M–128M) for Sequential vs. Selective job assignment.]
Speedup of Selective over Sequential: 1.01x for 8M chunks, 1.1x–1.14x for the other chunk sizes
MATE-EC2 vs. Elastic MapReduce
Chunk size: 128MB; data retrieval threads: 16
[Plots: execution time (sec) vs. elastic compute units (8, 16, 32) for EMR-no-combine, EMR-combine, and MATE-EC2, on KMeans and PageRank.]
KMeans speedups vs. EMR-combine: 3.54x to 4.58x
PageRank speedups vs. EMR-combine: 4.08x to 7.54x
MATE-EC2 on Heterogeneous Instances
Overheads: KMeans: 1%; PCA: 1.1%, 7.4%, 11.7%
In Conclusion
We explored the AWS environment for data-intensive computing
64M and 128M data chunks with 16 data retrieval threads appear to be optimal for our middleware
Our data retrieval and processing optimizations significantly improve the middleware's performance
MATE-EC2 outperforms Map-Reduce in both scalability and performance
Thank You
Questions & Discussion
Acknowledgments: We thank the MTAGS'11 organizers and the reviewers for their constructive comments.
This work was sponsored by: [sponsor logos]