BIG DATA AND HADOOP ON THE ZFS STORAGE APPLIANCE

1 BIG DATA AND HADOOP ON THE ZFS STORAGE APPLIANCE
  Brett Weninger, Managing Director
  10/21/2014

2 ADURANT APPROACH TO BIG DATA
  Align to Un/Semi-structured Data Instead of Big
  Scale out will become Big
  Greatest Benefit: Data development velocity
  Reciprocal Impact: Faster application development

3 WHAT WE'RE NOT LOOKING AT TODAY
  Streaming Technologies
  In-memory Technologies

4 HADOOP 1.0 SYSTEMS ARCHITECTURE ASSUMPTIONS

5 HADOOP 1.0 SYSTEMS ARCHITECTURE ASSUMPTIONS
  Map/Reduce
    Abstracts storage, concurrency, execution
  HDFS
    Distributed, fault-tolerant filesystem
    Primarily designed for cost/scale
    Not POSIX compliant
  Works on commodity hardware
  Files are large (GBs to TBs) and append-only
  Access is large and sequential
  Hardware failure is common
  Fault-tolerance baked in
    Replicate data 3x
    Incrementally re-execute computation
    Avoid single points of failure

6 THE HADOOP SYSTEMS ARCHITECTURE PROBLEM

7 THE HADOOP PROBLEM - SYSTEMS ARCHITECTURE VIEW
  Technical View:
    Hadoop is a giant I/O platform
    I/O access has fallen behind CPU/Memory density
    Strategy to address I/O vs processing divergence: Read/Write to as many drives as possible in parallel!
    Related variable: Increase in spindle count drives additional network traffic (between nodes)
    Bounded by latency from read/write to disk (in addition to bandwidth)

8 THE HADOOP PROBLEM - SYSTEMS ARCHITECTURE VIEW
  Technical View (cont.):
    Increased number of disk reads/writes has a reciprocal impact on network bandwidth
    Teragen is a method for synthetic testing of network capacity
      Generates 3-9x the network load over normal operations
    Direct relationship between number of drives per node and number of MapReduce slots for that node
  Business View:
    The greater the spindle count, the lower the cost per TB
    Generally, more average nodes are better than super nodes
    Treat data protection as an additional consideration

9 THE HADOOP PROBLEM - CPU
  CPU Performance:
    Typically, CPU clock speed does not impact processing times
    Typically, CPU is not a performance bottleneck (there are exceptions)
  Heuristics on CPU:
    No negative impact from running more and/or higher-quality CPUs
    Price and power consumption become the primary boundary values for optimal ROI
    A single task typically uses one thread at a time
      Typically, investing in more cores does not see a linear return
      Typically, investing in more performant CPUs does not see a linear return
    Typically, threads experience a large amount of idle time while waiting for I/O response

10 THE HADOOP PROBLEM - MEMORY
  Memory Performance:
    Memory capacity does not have a significant impact on processing times
  Heuristics on Memory:
    No negative impact from running more and/or higher-quality memory
    Price becomes the primary boundary value for optimal ROI
    Typically, memory capacity does not have a significant impact on processing times
    Additional memory will support MapReduce in the sorting process

11 THE HADOOP PROBLEM - DRIVES
  Drive Density:
    Popular drive sizes: 1, 2, 3, 4TB drives
  Heuristics on drives:
    The larger the drive, the cheaper the $/TB = optimal ROI
    Larger drives create an opportunity for replication storms
      Disk rebuild can take longer and has the potential to saturate the network, impacting cluster performance
    Typically, drive size and latency have little impact on cluster performance
      There are exceptions
    Typically, a less optimal ROI is achieved by using faster drives
    MapReduce is designed for long sequential reads and writes
      Less value in addressing disk latency
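The rebuild concern on this slide can be put in rough numbers. The following is a minimal sketch, not a model of HDFS re-replication: it assumes a full drive's contents must cross a single link at ideal line rate, and it ignores protocol overhead, disk throughput, and the fact that re-replication is normally spread across many nodes. The function name and the choice of a 4TB drive on 1GbE and 10GbE links are illustrative assumptions.

```python
def transfer_seconds(data_tb: float, link_gbit_per_s: float) -> float:
    """Ideal time to move data_tb terabytes (decimal) over one link."""
    bits = data_tb * 1e12 * 8               # TB -> bytes -> bits
    return bits / (link_gbit_per_s * 1e9)   # bits / (bits per second)


if __name__ == "__main__":
    # Hypothetical case: one 4TB drive, re-copied over a single link.
    for link in (1, 10):
        hours = transfer_seconds(4, link) / 3600
        print(f"4TB over {link}GbE: {hours:.1f} h at line rate")
```

Under these assumptions a 4TB drive needs roughly 8.9 hours on 1GbE versus 0.9 hours on 10GbE, which is one way to see why larger drives raise the stakes for network sizing.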

12 THE HADOOP PROBLEM - NETWORK
  Network Performance:
    Typically, 1GbE is not enough bandwidth for production Hadoop clusters
  Network Heuristics:
    Networking is a critical area for Hadoop clusters
    Production clusters have 10GbE, sometimes 2GbE
    Compression can drastically improve network performance
    Bandwidth beyond 10GbE is rarely a necessity
  Note on Networking:
    Differences between bandwidth and latency
      Higher bandwidth can lead to higher volume at a given latency
      Lower latency fabrics can lead to higher volume and higher response (improved environment performance)

13 THE HADOOP PROBLEM - POWER
  Power Considerations:
    Availability versus cost is the primary consideration
    Value tapers with size of cluster, for instance:
      10 node production cluster for a smaller organization
      Larger than 20 nodes, the value tapers off
  If using single power supply:
    Consider MTBF at node level and network impact for rebuild
  Exception - Master Nodes:
    Dual power supplies are recommended

14 HADOOP COST CONSIDERATIONS
  Price per Node
  Performance per Node
  Capacity per Node
  Space, Power, Cooling
  Supportability - FTE
  Resiliency:
    Availability
    Fragmentation
    Failure Impact (risk)

15 THE HADOOP SYSTEMS ARCHITECTURE PROBLEM
  Architecture
    3x Full Copy Replication
    No Compression
    No Data De-Duplication
    Near linear scalability (95%)
  Performance Profile
    Primary Bottleneck: I/O
    Secondary Bottleneck: internode traffic (100s of nodes)
    CPU/Memory under-utilized per chassis
  Configuration
    Backup Solution: Prod Sized Cluster
    Fixed disk sizing at the chassis level

16 WHY ZFS?
  Performance
  Compression
  Block Size
  Analytics
  Backup/Recovery
  Cost

17 ZFS HYPOTHESIS
  ZFS advantages for Hadoop
    DRAM: Faster processing
    Larger block size (128k-1MB): Faster processing
    Compression: Reduced footprint
    Encryption (slipped to Fall 2014)
  Expected Outcome
    Equivalent/near-equivalent processing
    Economical backup solution
    Reduced disk footprint
    Right size disk allocation to server

18 WE GET A DISRUPTIVE WIN IF
  Drive Hadoop from being I/O bound to being CPU/Memory bound
  Significantly reduce disk footprint
  Huge implications if we drive all load to CPU

19 ZFS TESTING SYS ARCH - LOCAL CLUSTER
  Hadoop: Cloudera
    Name Node
    5 Data Nodes
  Servers: (6) X4-2Ls
    OL 6.3 (upgraded to OL 6.5)
    (2) Intel Xeon E v2 10-core 3.0 GHz procs
    128GB Memory (DDR3-1600)
    (12) 4TB 7200 rpm 3.5-inch SAS-2 HDD Local disk
  Storage:
    240TB total local disk
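The 240TB figure is consistent with 5 data nodes, each with 12 drives of 4TB. A minimal sketch of that arithmetic follows, together with the usable capacity implied by the 3x replication assumption from slide 5. It assumes the name node's disks are not counted and ignores filesystem and temporary-space overhead; the function names are illustrative.

```python
def raw_capacity_tb(data_nodes: int, drives_per_node: int, drive_tb: float) -> float:
    """Total raw disk across the data nodes, in TB."""
    return data_nodes * drives_per_node * drive_tb


def usable_capacity_tb(raw_tb: float, replication: int) -> float:
    """Logical data that fits when every block is stored `replication` times."""
    return raw_tb / replication


if __name__ == "__main__":
    raw = raw_capacity_tb(data_nodes=5, drives_per_node=12, drive_tb=4)
    print(f"raw: {raw:.0f} TB")
    print(f"usable at 3x replication: {usable_capacity_tb(raw, 3):.0f} TB")
```

Under these assumptions the local cluster holds about 80TB of logical data at 3x replication.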

20 ZFS TESTING SYS ARCH - ARRAY CLUSTER
  Hadoop: Cloudera
    Name Node
    5 Data Nodes
  Servers: (6) X4-2Ls
    OL 6.3 (upgraded to OL 6.5)
    (2) Intel Xeon E v2 10-core 3.0 GHz procs
    128GB Memory (DDR3-1600)
    (12) 4TB 7200 rpm 3.5-inch SAS-2 HDD Local disk
  Storage: ZS3-4 (Clustered)
    2TB DRAM
    6 Shelves
    900GB 10K RPM HDD
    108 TB

21 ZFS STORAGE REFERENCE ARCHITECTURE

22 BENCHMARK APPROACH
  Cluster Type: Local Cluster, Array Cluster
  Terasort: 10GB, 100GB, 1TB
  TestDFSIO: 100GB, 1TB, 10TB

23 DATA TESTING APPROACH
  Cluster Type: Local Cluster, Array Cluster
  Types of Jobs: 3 types written in Hive
    Simple (4x)
    Medium Complexity (4x)
    High Complexity/Inefficient Process (4x)
  Job Size: 400GB, 800GB, 1.6TB

24 DATA TESTING FINDINGS
  LOCAL CLUSTER 1.6TB: Simple (s), Medium (s), Complex (s)
  ARRAY CLUSTER 1.6TB: Simple (s), Medium (s), Complex (s)
  *128K block

25 HADOOP AND ZFS TEST RESULTS SUMMARY
  Hadoop Operations:
    Completion of jobs approx 280% faster
    Larger jobs trend in a near 1:1 linear fashion
  Compression:
    Compression of x achieved on lowest setting

26 BENEFITS OF RUNNING ZFS ON HADOOP
  Reduced cluster overhead with replication factor of 2x
  Reduced storage with replication factor of 2x
  Increased protection: number of copies of data rises to 4x
  Added compression of > 3x (for compressible data)
  Added caching, decreasing I/O response times
  Added data protection (RAID 1), no overhead
  Added fault tolerance via clustered heads
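One way to read the "2x replication, 4x copies" claim is that each of the two HDFS replicas sits on a RAID 1 mirror, so every block exists four times on disk. The sketch below works through that reading and the resulting raw footprint. It assumes each mirrored copy consumes raw capacity and uses a flat 3x compression ratio as a stand-in for the slide's "> 3x (for compressible data)"; the function names and parameters are illustrative, not taken from the deck.

```python
def physical_copies(hdfs_replication: int, mirror_copies: int = 2) -> int:
    """Copies of each block on disk: HDFS replicas times mirror sides."""
    return hdfs_replication * mirror_copies


def raw_footprint_tb(logical_tb: float, hdfs_replication: int,
                     mirror_copies: int = 1,
                     compression_ratio: float = 1.0) -> float:
    """Raw disk consumed by logical_tb of data under the given settings."""
    return logical_tb * hdfs_replication * mirror_copies / compression_ratio


if __name__ == "__main__":
    # Baseline from slide 15: 3x replication, no mirror, no compression.
    baseline = raw_footprint_tb(1.0, hdfs_replication=3)
    # Array case under the stated assumptions: 2x, mirrored, 3x compression.
    array = raw_footprint_tb(1.0, hdfs_replication=2,
                             mirror_copies=2, compression_ratio=3.0)
    print(f"copies on array: {physical_copies(2)}")
    print(f"raw TB per logical TB: baseline {baseline:.2f}, array {array:.2f}")
```

With these inputs the array holds more copies of each block while using less raw disk per logical TB, but the result depends entirely on how compressible the data actually is.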

27 PROCESSING IMPLICATIONS
  Columns: Type, Storage Capacity, Processing (Servers), 24 Hours (PB), Annual (PB)
  Rows: Server, Array

28 IMPACT OF YARN AND SPARK
  Reduced Map/Reduce Ratio Management for mixed workloads
  Greater flexibility on coding choices
  Lower latency for request to completion = faster
  QOS by job/process opportunities
  Greater flexibility on archiving/storing data
  Possibility of using higher levels of compression for data segments
  Increased complexity of process/library management

29 EXABYTE PLATFORM CONSIDERATIONS
  Compression
  Access
  Tiered Data
  Encryption
  Capacity
  Network Speed
  Workload Segmentation
  Data Fragmentation
  Block rebuild/disk rebuild process

30 MINE IS BIG - HOW BIG IS YOURS?
  Global Data Census:
    zettabytes
    2020: 50+ zettabytes (est)
  Data Scale:
    KB: 1,000 B
    MB: 1,000,000 B
    GB: 1,000,000,000 B
    TB: 1,000,000,000,000 B
    PB: 1,000,000,000,000,000 B
    EB: 1,000,000,000,000,000,000 B
    ZB: 1,000,000,000,000,000,000,000 B
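The decimal scale on this slide steps by a factor of 1,000 per unit, so a petabyte is 10^15 bytes and a zettabyte is 10^21 bytes. A minimal sketch of that conversion follows; the names SI_UNITS, to_bytes, and humanize are illustrative.

```python
SI_UNITS = ("B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB")


def to_bytes(value: int, unit: str) -> int:
    """Convert a decimal (power-of-1000) quantity to bytes."""
    return value * 1000 ** SI_UNITS.index(unit)


def humanize(num_bytes: int) -> str:
    """Render a byte count in the largest unit that keeps the value >= 1."""
    value = float(num_bytes)
    exponent = 0
    while value >= 1000 and exponent < len(SI_UNITS) - 1:
        value /= 1000
        exponent += 1
    return f"{value:g} {SI_UNITS[exponent]}"


if __name__ == "__main__":
    print(to_bytes(1, "PB"))           # bytes in one petabyte
    print(humanize(240 * 10**12))      # the local cluster's raw disk
```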

31 MINE IS BIG - HOW BIG IS YOURS?
  GraySort Benchmark:
    2009: TB/Min - Yahoo, 3452 Nodes (2x, 8GB, 4 SATA)
    2011: TB/Min - UC San Diego, 52 Nodes (2 CPU, 24GB, 16x 500GB)
    2013: 1.42 TB/Min - Yahoo, 2100 Nodes (2 CPU, 64GB, 12x 3TB)
  Yahoo: 2012: 42,000 nodes, 200PB, 20 Prod Clusters (largest is 4000 nodes)
  Facebook: 2010: 2000 nodes, 21PB
  Spotify: 2014: 694 heterogeneous nodes, 14.25PB (12k jobs/day)
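To relate a cluster-wide sort rate to a single node, the rate can be divided by the node count. The sketch below does this for the 2013 entry (1.42 TB/Min on 2100 nodes), the only GraySort row whose rate is given here. It assumes decimal terabytes and an even spread of work across nodes; the function name is illustrative.

```python
def per_node_mb_per_s(tb_per_min: float, nodes: int) -> float:
    """Average sorted throughput per node, in MB/s (decimal units)."""
    bytes_per_s = tb_per_min * 1e12 / 60   # TB/min -> bytes/s
    return bytes_per_s / nodes / 1e6       # per node, bytes -> MB


if __name__ == "__main__":
    rate = per_node_mb_per_s(1.42, 2100)
    print(f"2013 GraySort entry: {rate:.2f} MB/s per node")
```

Under these assumptions the 2013 entry works out to roughly 11 MB/s of sorted output per node, an end-to-end figure covering read, shuffle, and write rather than raw drive speed.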

32 HADOOP ON ZFS TECHNICAL WHITEPAPER
  Technical Whitepaper Published
  Follow for notification of link

33 Contact Information: Brett Weninger, Managing Director
