Deployment Planning and Optimization for Big Data & Cloud Storage Systems

Similar documents
Highly accurate simulations of big-data clusters for system planning and optimization

Next-Generation Cloud Platform

Accelerating Big Data: Using SanDisk SSDs for Apache HBase Workloads

Cloudian Sizing and Architecture Guidelines

davidklee.net gplus.to/kleegeek linked.com/a/davidaklee

Scalability Testing with Login VSI v16.2. White Paper Parallels Remote Application Server 2018

RIGHTNOW A C E

EsgynDB Enterprise 2.0 Platform Reference Architecture

Database Services at CERN with Oracle 10g RAC and ASM on Commodity HW

Estimate performance and capacity requirements for InfoPath Forms Services 2010

Scaling Internet TV Content Delivery ALEX GUTARIN DIRECTOR OF ENGINEERING, NETFLIX

Data Processing at the Speed of 100 Gbps using Apache Crail. Patrick Stuedi IBM Research

MySQL Performance Optimization and Troubleshooting with PMM. Peter Zaitsev, CEO, Percona

The Impact of SSD Selection on SQL Server Performance. Solution Brief. Understanding the differences in NVMe and SATA SSD throughput

Is Open Source good enough? A deep study of Swift and Ceph performance. 11/2013

All-NVMe Performance Deep Dive Into Ceph + Sneak Preview of QLC + NVMe Ceph

Choosing Hardware and Operating Systems for MySQL. Apr 15, 2009 O'Reilly MySQL Conference and Expo Santa Clara,CA by Peter Zaitsev, Percona Inc

IBM Education Assistance for z/os V2R2

Flash Storage Complementing a Data Lake for Real-Time Insight

Accelerate Applications Using EqualLogic Arrays with directcache

The Oracle Database Appliance I/O and Performance Architecture

Scaling Up Performance Benchmarking

Service Oriented Performance Analysis

Parallels Remote Application Server. Scalability Testing with Login VSI

YCSB++ Benchmarking Tool Performance Debugging Advanced Features of Scalable Table Stores

VVD for Cloud Providers: Scale and Performance Guidelines. October 2018

Enterprise2014. GPFS with Flash840 on PureFlex and Power8 (AIX & Linux)

MySQL Performance Optimization and Troubleshooting with PMM. Peter Zaitsev, CEO, Percona Percona Technical Webinars 9 May 2018

Certified Reference Design for VMware Cloud Providers

10 Million Smart Meter Data with Apache HBase

WHITEPAPER. Improve Hadoop Performance with Memblaze PBlaze SSD

Copyright 2012, Oracle and/or its affiliates. All rights reserved.

Performance Benefits of Running RocksDB on Samsung NVMe SSDs

SharePoint 2010 Technical Case Study: Microsoft SharePoint Server 2010 Enterprise Intranet Collaboration Environment

Implementing SQL Server 2016 with Microsoft Storage Spaces Direct on Dell EMC PowerEdge R730xd

Accelerating Microsoft SQL Server 2016 Performance With Dell EMC PowerEdge R740

Scalable Streaming Analytics

YCSB++ benchmarking tool Performance debugging advanced features of scalable table stores

Cloud Performance Simulations

Moneta: A High-performance Storage Array Architecture for Nextgeneration, Micro 2010

NAS for Server Virtualization Dennis Chapman Senior Technical Director NetApp

@joerg_schad Nightmares of a Container Orchestration System

Tuning Enterprise Information Catalog Performance

BlackBerry AtHoc Networked Crisis Communication Capacity Planning Guidelines. AtHoc SMS Codes

REFERENCE ARCHITECTURE Microsoft SQL Server 2016 Data Warehouse Fast Track. FlashStack 70TB Solution with Cisco UCS and Pure Storage FlashArray//X

The Lion of storage systems

Moneta: A High-Performance Storage Architecture for Next-generation, Non-volatile Memories

FlashGrid Software Enables Converged and Hyper-Converged Appliances for Oracle* RAC

SharePoint 2010 Technical Case Study: Microsoft SharePoint Server 2010 Social Environment

Spark Over RDMA: Accelerate Big Data SC Asia 2018 Ido Shamay Mellanox Technologies

Enosis: Bridging the Semantic Gap between

Agenda. AWS Database Services Traditional vs AWS Data services model Amazon RDS Redshift DynamoDB ElastiCache

Virtual SQL Servers. Actual Performance. 2016

No Tradeoff Low Latency + High Efficiency

scc: Cluster Storage Provisioning Informed by Application Characteristics and SLAs

TPC-E testing of Microsoft SQL Server 2016 on Dell EMC PowerEdge R830 Server and Dell EMC SC9000 Storage

Primavera Compression Server 5.0 Service Pack 1 Concept and Performance Results

Next-Generation NVMe-Native Parallel Filesystem for Accelerating HPC Workloads

Deploy a High-Performance Database Solution: Cisco UCS B420 M4 Blade Server with Fusion iomemory PX600 Using Oracle Database 12c

Microsoft SharePoint Server 2010

SSDs that Think. Noam Mizrahi Vice President, Technology and Architecture CTO Office, Marvell

Certification Document GRAFENTHAL GmbH R2208 S2 01/04/2016. GRAFENTHAL GmbH R2208 S2 Storage system

PRESERVE DATABASE PERFORMANCE WHEN RUNNING MIXED WORKLOADS

Exadata X3 in action: Measuring Smart Scan efficiency with AWR. Franck Pachot Senior Consultant

BIG DATA AND HADOOP ON THE ZFS STORAGE APPLIANCE

Lenovo Database Configuration

Designing Next Generation FS for NVMe and NVMe-oF

WORKLOAD CHARACTERIZATION OF INTERACTIVE CLOUD SERVICES BIG AND SMALL SERVER PLATFORMS

Scalable Distributed Training with Parameter Hub: a whirlwind tour

IBM Db2 Analytics Accelerator Version 7.1

W H I T E P A P E R. Comparison of Storage Protocol Performance in VMware vsphere 4

Tuning Intelligent Data Lake Performance

Adapted from: TRENDS AND ATTRIBUTES OF HORIZONTAL AND VERTICAL COMPUTING ARCHITECTURES

Storage Optimization with Oracle Database 11g

Ghislain Fourny. Big Data 5. Column stores

Isilon: Raising The Bar On Performance & Archive Use Cases. John Har Solutions Product Manager Unstructured Data Storage Team

Oracle Exadata X7. Uwe Kirchhoff Oracle ACS - Delivery Senior Principal Service Delivery Engineer

All-Flash Storage System

VOLTDB + HP VERTICA. page

Dispatcher. Phoenix. Dispatcher Phoenix Enterprise White Paper Version 0.2

FlashStack 70TB Solution with Cisco UCS and Pure Storage FlashArray

A Non-Relational Storage Analysis

Introduction to Database Services

Dell PowerEdge R720xd with PERC H710P: A Balanced Configuration for Microsoft Exchange 2010 Solutions

Scott Oaks, Oracle Sunil Raghavan, Intel Daniel Verkamp, Intel 03-Oct :45 p.m. - 4:30 p.m. Moscone West - Room 3020

On BigFix Performance: Disk is King. How to get your infrastructure right the first time! Case Study: IBM Cloud Development - WW IT Services

All-Flash High-Performance SAN/NAS Solutions for Virtualization & OLTP

Munara Tolubaeva Technical Consulting Engineer. 3D XPoint is a trademark of Intel Corporation in the U.S. and/or other countries.

Systems Architecture II

Object storage platform How it can help? Martin Lenk, Specialist Senior Systems Engineer Unstructured Data Solution, Dell EMC

Fusion iomemory PCIe Solutions from SanDisk and Sqrll make Accumulo Hypersonic

PRESENTATION TITLE GOES HERE

Improving Altibase Performance with Solarflare 10GbE Server Adapters and OpenOnload

Red Hat Ceph Storage and Samsung NVMe SSDs for intensive workloads

TPCX-BB (BigBench) Big Data Analytics Benchmark

A Comparative Study of Microsoft Exchange 2010 on Dell PowerEdge R720xd with Exchange 2007 on Dell PowerEdge R510

Architecture of a Real-Time Operational DBMS

Ambry: LinkedIn s Scalable Geo- Distributed Object Store

HP ProLiant BladeSystem Gen9 vs Gen8 and G7 Server Blades on Data Warehouse Workloads

HCI: Hyper-Converged Infrastructure

Transcription:

Deployment Planning and Optimization for Big Data & Cloud Storage Systems Bianny Bian Intel Corporation

Outline System Planning Challenges Storage System Modeling w/ Intel CoFluent Studio Simulation Methodology Cloud: Swift Simulator Big Data: HDFS and HBase Simulator Use Cases

Why System Planning 1. Planning H/W Component Selection Cluster Setup 4. Operation 2. Deployment 3. Optimization System Deploy Software /JVM /OS Settings Biz Code Insufficient Performance? With Biz Expansion Larger Data Volume More Data Sources Quicker Data Injection Higher Concurrency More analytics System Needs Scale 3 Bad Planning = Long Provision Time or Waste Resources

System Planning Challenges Great Hardware Diversity Increasing Software Complexity Providers are commonly addressing these challenges How to minimize cost? How to get the most out of the hardware? How to predict system scalability? How to offer the best user experience? 4

Outline System Planning Challenges Storage System Modeling w/ Intel CoFluent Studio Simulation Methodology Cloud: Swift Simulator Big Data: HDFS and HBase Simulator Use Cases

Simulation with Intel CoFluent Studio Enables fast What if? analysis with a virtual system 6

Simulation Architecture What-If Analysis for S/W stack optimization Predict perf on varies node, network and disk configuration Explore against users number and cluster size Details @ ICPP-2014 paper Simulating Big Data Clusters for System Planning, Evaluation and Optimization 7

Cloud Storage: Swift Simulator Logic View of a Swift Cluster Frontend model in icof Client Proxy Node Storage Node Backup model in C++ Ring parsing, Node mapping, Co-routine scheduling 8

Big Data Storage: HDFS Simulator Write Client Data node (Replication 1) Data Node (Replication 2) Data Node (Replication 3) [1..Nb_nodes] [1..Nb_nodes] [1..Nb_nodes] Requests (read/write) Name Node Read Client Data Node [1..Nb_nodes] 9

Big Data Storage: HBase Simulator RPC queue Flush queue Compact/Split queue

Output: Workload and H/W metrics Throughput Latency Resource usage CPU Network Disk I/O.

Output: Software States & Metrics Rich metrics for detailed system behavior analysis over time. Charts from HBase Simulation 12

Simulation vs. Measurements - Overall Performance Charts from Swift Simulation SW: Proxy Worker # Changes Workload: Currency Changes Cluster: Storage Node # Changes HW: Network Changes Consistent High Accuracy with Cluster, H/W, S/W or Workload Changes 13

Simulation vs. Measurements - System S/W State Charts from HBase Simulation IOPS HFiles Compaction Queue Flush Queue Measurement Simulation 14 High Fidelity on System S/W states

Simulation Speed Real cluster environment: 4 node (SNB DP) cluster Simulation environment: UP Desktop Speed: Swift Model: 1x ~ 10x slowdown HDFS Model: >2x faster than real execution HBase Model: 10x slowdown ~ 10x faster than real execution Cluster Size Data Volume Concurre ncy Simulation Speed 4 nodes 400GB 200 29 min (vs. 71 min) HDFS 40 nodes 4TB 2K 4.6 hour 100 nodes 10TB 5K 6.5 hour 15

Outline System Planning Challenges Storage System Modeling w/ Intel CoFluent Studio Use Cases HDFS (IDF 13 presentation) HBase Swift

Use case: HDFS Simulation Planning a Video Streaming System (Details @ IDF 13) 1K concurrency, 2.5MB/s/viewer Baseline: 1GbE, 4 Drivers/Node, 30 Nodes Solution 2: 10GbE, 20 Nodes, 4 Drives Solution 3: 10GbE, 10 Nodes, 8 Drives 17 Planning Result: 30 Nodes => 20 Nodes => 10 Nodes

Use case: HBase Simulation Planning a Video Analytics System Estimate ratio of HD cameras : Xeon servers Determine the best HW/SW configurations for typical scenarios: Best Insert Performance Best Query Latency Balanced AVG: 1GB video/hour/camera Peak: ~ 4GB /hour/camera AVG: 200MB/h/c unstructured vector feature data + structured meta data + thumbnail pics 18

Simulation Results Estimate Ratio for typical scenarios RS number 200 Cameras 1K Cameras INSERT 3 11 106 QUERY 5 22 215 BALANCE 4 14 135 Recommend Configurations Flusher Handler Blocking StoreFiles Memstore Block Mul 10K Cameras Compact min Choice of Production Environment Multiple Medium Size Clusters instead of one Large Cluster Large Compact Thread Small Compact Thread INSERT 24 3000 2 3 1 1 QUERY 24 7 2 3 14 10 BALANCE 8 100 4 10 6 1 Explore different scenario to determine best HW and SW configuration 19

Simulation Results System Status Options IOPS Cameras Blocking Freq CL Buffer @ Blocking + HFiles/Region @ Peak + Compact Queue @ Peak Baseline 174 250 210 sec 2.3GB 30 160:0 INSERT 285 400 100 sec 460MB 137 580:300 QUERY 133 190 240 sec 3.3GB 17 85:0 BALANCE 220 300 230 sec 1.9GB 42 100:0 Rich Metrics to Know System Behavior before Provisioning

Use case: Swift Simulation Optimizing a Swift Cluster Goal: Double system uploading throughput Baseline: 1 proxy + 3 storage, 8 Disks / node, 1Gb network Bottleneck on proxy network Bound on Concurrency S/W CPU Disk Network 1.25X w/ 10Gb network 1.15X w/ +1 proxy node Sources of Bottleneck Identify and Fix Bottlenecks Very Quickly 21

Summary Big Data and Cloud Storage introduces new challenges in planning and optimizing Traditional analysis solutions not practical Intel CoFluent Studio enables fast system analysis and optimization before hardware/software provisioning Anticipate user experience Reduce operating costs with optimal use of hardware and software 22

For More Information cof-info@intel.com & cofluent.intel.com

Backup

Swift Simulation Input Workload Parameters Concurrency Request Size S/W Settings Role setting (proxy, storage) for each node Object ring Proxy/object worker number Object size H/W Settings: Cluster size System Components (CPU, Disks, Memory, Network) Network topology

HBase Simulation Input: Workload Consult Customers Table Design Client Setting Table Setting Record Setting Simulation Input Record Pattern

HBase Simulation Inputs: S/W and H/W S/W Settings JVM/OS HBase (>40) HDFS Cluster Settings Node # Topology Processor Network Storage Memory HBase parameters modeled HBASE_HEAPSIZE hbase.client.write.buffer hbase.hregion.majorcompaction hbase.hregion.majorcompaction.jitter hbase.hregion.max.filesize hbase.hregion.memstore.block.multiplier hbase.hregion.memstore.flush.size hbase.hregion.preclose.flush.size hbase.hstore.blockingstorefiles hbase.hstore.blockingwaittime hbase.hstore.compaction.kv.max hbase.hstore.compaction.max hbase.hstore.compaction.max.size hbase.hstore.compaction.min.size hbase.hstore.compaction.ratio hbase.hstore.compaction.ratio.offpeak hbase.hstore.compactionthreshold hbase.regionserver.compactionchecker.majorcompactpriority hbase.regionserver.flusher.count hbase.regionserver.global.memstore.lowerlimit hbase.regionserver.global.memstore.upperlimit HDFS Property dfs.block.size (MB) dfs.replication hbase.regionserver.handler.count hbase.regionserver.hlog.blocksize hbase.regionserver.hlog.enabled hbase.regionserver.logroll.multiplier hbase.regionserver.logroll.period hbase.regionserver.maxlogs hbase.regionserver.optionallogflushinterval hbase.regionserver.region.split.policy hbase.regionserver.regionsplitlimit hbase.regionserver.regionsplitlimit hbase.regionserver.thread.compaction.large hbase.regionserver.thread.compaction.small hbase.regionserver.thread.compaction.throttle hbase.regionserver.thread.split hbase.rowlock.wait.duration hbase.server.thread.wakefrequency hbase.server.thread.wakefrequency.multiplier hbase.store.delete.expired.storefile ipc.server.max.callqueue.length ipc.server.read.threadpool.size 27