Service Oriented Performance Analysis

Similar documents
Accelerating Hadoop Applications with the MapR Distribution Using Flash Storage and High-Speed Ethernet

OpenFOAM Performance Testing and Profiling. October 2017

Next-Generation Cloud Platform

Energy Efficient Big Data Processing at the Software Level

A Preliminary Approach for Modeling Energy Efficiency for K-Means Clustering Applications in Data Centers

Red Hat Ceph Storage and Samsung NVMe SSDs for intensive workloads

Accelerating Big Data: Using SanDisk SSDs for Apache HBase Workloads

HiTune. Dataflow-Based Performance Analysis for Big Data Cloud

Flash Storage Complementing a Data Lake for Real-Time Insight

Altair OptiStruct 13.0 Performance Benchmark and Profiling. May 2015

4th National Conference on Electrical, Electronics and Computer Engineering (NCEECE 2015)

ARISTA: Improving Application Performance While Reducing Complexity

STAR-CCM+ Performance Benchmark and Profiling. July 2014

Ceph in a Flash. Micron s Adventures in All-Flash Ceph Storage. Ryan Meredith & Brad Spiers, Micron Principal Solutions Engineer and Architect

Lenovo RD240 storage system

SNAP Performance Benchmark and Profiling. April 2014

CIS 601 Graduate Seminar. Dr. Sunnie S. Chung Dhruv Patel ( ) Kalpesh Sharma ( )

Native-Task Performance Test Report

NAMD Performance Benchmark and Profiling. January 2015

A Simple Model for Estimating Power Consumption of a Multicore Server System

Design a Remote-Office or Branch-Office Data Center with Cisco UCS Mini

Intel Solid State Drive Data Center Family for PCIe* in Baidu s Data Center Environment

IBM Db2 Analytics Accelerator Version 7.1

CPMD Performance Benchmark and Profiling. February 2014

All-NVMe Performance Deep Dive Into Ceph + Sneak Preview of QLC + NVMe Ceph

PowerServe HPC/AI Servers

NEC Express5800 A2040b 22TB Data Warehouse Fast Track. Reference Architecture with SW mirrored HGST FlashMAX III

Spark Over RDMA: Accelerate Big Data SC Asia 2018 Ido Shamay Mellanox Technologies

Apache Spark Graph Performance with Memory1. February Page 1 of 13

DCBench: a Data Center Benchmark Suite

CISC 7610 Lecture 2b The beginnings of NoSQL

WHITEPAPER. Improve Hadoop Performance with Memblaze PBlaze SSD

How to Network Flash Storage Efficiently at Hyperscale. Flash Memory Summit 2017 Santa Clara, CA 1

NEC Express5800/R120e-1M System Configuration Guide

BIG DATA AND HADOOP ON THE ZFS STORAGE APPLIANCE

Cloud Computing with FPGA-based NVMe SSDs

LS-DYNA Performance Benchmark and Profiling. October 2017

Resource and Performance Distribution Prediction for Large Scale Analytics Queries

Presented by Nanditha Thinderu

Evolving To The Big Data Warehouse

Decentralized Distributed Storage System for Big Data

TECHNOLOGIES CO., LTD.

Scott Oaks, Oracle Sunil Raghavan, Intel Daniel Verkamp, Intel 03-Oct :45 p.m. - 4:30 p.m. Moscone West - Room 3020

HUAWEI Tecal X6000 High-Density Server

Intel Enterprise Edition Lustre (IEEL-2.3) [DNE-1 enabled] on Dell MD Storage

ADVANCED IN-MEMORY COMPUTING USING SUPERMICRO MEMX SOLUTION

GROMACS (GPU) Performance Benchmark and Profiling. February 2016

HP Z Turbo Drive G2 PCIe SSD

Fusion iomemory PCIe Solutions from SanDisk and Sqrll make Accumulo Hypersonic

MILC Performance Benchmark and Profiling. April 2013

GROMACS Performance Benchmark and Profiling. August 2011

LAMMPS-KOKKOS Performance Benchmark and Profiling. September 2015

Intel Xeon E v4, Optional Operating System, 8GB Memory, 2TB SAS H330 Hard Drive and a 3 Year Warranty

Design a Remote-Office or Branch-Office Data Center with Cisco UCS Mini

Architecture of a Real-Time Operational DBMS

Disclaimer This presentation may contain product features that are currently under development. This overview of new technology represents no commitme

Deployment Planning and Optimization for Big Data & Cloud Storage Systems

HIGH-DENSITY HIGH-CAPACITY HIGH-DENSITY HIGH-CAPACITY

NoSQL Performance Test

All-Flash High-Performance SAN/NAS Solutions for Virtualization & OLTP

Upgrade to Microsoft SQL Server 2016 with Dell EMC Infrastructure

HUAWEI Tecal X8000 High-Density Rack Server

Low-Overhead Flash Disaggregation via NVMe-over-Fabrics Vijay Balakrishnan Memory Solutions Lab. Samsung Semiconductor, Inc.

TELECOMMUNICATIONS TECHNOLOGY ASSOCIATION JET-SPEED HHS3124F / HHS2112F (10 NODES)

NAMD GPU Performance Benchmark. March 2011

PSAM, NEC PCIe SSD Appliance for Microsoft SQL Server (Reference Architecture) September 4 th, 2014 NEC Corporation

Scaling Internet TV Content Delivery ALEX GUTARIN DIRECTOR OF ENGINEERING, NETFLIX

Highly accurate simulations of big-data clusters for system planning and optimization

Quantifying Trends in Server Power Usage

A Fast and High Throughput SQL Query System for Big Data

Workstation Rack-Mount 4 RU Workstation, 4 RU Rack-Mount, Hexa-Core 3.5 GHz CPU, 16 GB DDR4 RAM, 256 GB SSD

Hyper-converged storage for Oracle RAC based on NVMe SSDs and standard x86 servers

InfiniBand Networked Flash Storage

NEC Express5800/R120d-2M Configuration Guide

IBM s Data Warehouse Appliance Offerings

NVMFS: A New File System Designed Specifically to Take Advantage of Nonvolatile Memory

IBM Power AC922 Server

Colin Cunningham, Intel Kumaran Siva, Intel Sandeep Mahajan, Oracle 03-Oct :45 p.m. - 5:30 p.m. Moscone West - Room 3020

HP ProLiant DL380 Gen8 and HP PCle LE Workload Accelerator 28TB/45TB Data Warehouse Fast Track Reference Architecture

(Please refer "CPU Support List" for more information.)

Intel Select Solutions for Professional Visualization with Advantech Servers & Appliances

Supermicro All-Flash NVMe Solution for Ceph Storage Cluster

RAMCloud. Scalable High-Performance Storage Entirely in DRAM. by John Ousterhout et al. Stanford University. presented by Slavik Derevyanko

HP solutions for mission critical SQL Server Data Management environments

Cascade Mapping: Optimizing Memory Efficiency for Flash-based Key-value Caching

CloudRank-D: benchmarking and ranking cloud computing systems for data processing applications

Voldemort. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Overview Design Evaluation

Intel Server. Performance. Reliability. Security. INTEL SERVER SYSTEMS

Cloudera Enterprise 5 Reference Architecture

Apache Spark is a fast and general-purpose engine for large-scale data processing Spark aims at achieving the following goals in the Big data context

What is Parallel Computing?

RS U, 1-Socket Server, High Performance Storage Flexibility and Compute Power

Intel SR2612UR storage system

EsgynDB Enterprise 2.0 Platform Reference Architecture

Accelerating Microsoft SQL Server 2016 Performance With Dell EMC PowerEdge R740

Data Clustering on the Parallel Hadoop MapReduce Model. Dimitrios Verraros

IBM Emulex 16Gb Fibre Channel HBA Evaluation

Intel Server. Performance. Reliability. Security. INTEL SERVER SYSTEMS FEATURE-RICH INTEL SERVER SYSTEMS FOR PERFORMANCE, RELIABILITY AND SECURITY

Accelerating NVMe-oF* for VMs with the Storage Performance Development Kit

All-Flash High-Performance SAN/NAS Solutions for Virtualization & OLTP

Transcription:

Service Oriented Performance Analysis Da Qi Ren and Masood Mortazavi US R&D Center Santa Clara, CA, USA www.huawei.com

Performance Model for Service in Data Center and Cloud 1. Service Oriented (end to end) big data performance analysis has become an extreme attention of ICT industry, the related techniques are being investigated by numerous hardware and software vendors. 2. Compute and storage devices, as one of the core components of a data center system, need specially designed approaches to measure, evaluate and analyze their performance. 3. This talk introduces our methods to create the performance model based on workload characterization, algorithm level behavior tracing and capture, and software platform management. 4. The functionality and capability of our methodology have been validated through benchmarks and measurements performed on real data center system.

A FusionInsight System for Big Data Page 2

Performance Modeling and Analysis Software Stack Performance Analysis I/O Model for the Inter-node connections Network Model Cluster Topology Linux Operating Systems Distribute Load balancing Computational Capabilities of Processing Element Algorithm Data Behavior Map/Reduce scheduling Memory Operations Data Movements Storage Usage

Program Behavior Model 1. Correlates data from all the experiments based on the common sampling rate and common time line; 2. Analyzing performance parameters and the application s ( Data Sorting ) behavior 3. Analyzing performance parameters and the application s ( Transactions ) behavior 4. Analyzing performance parameters and the application s ( K-Means Clustering) behavior 5. Identified program characters and create leading markers; 6. Identified program segments and perform detailed analysis on each segment; 7. Developed a Model that can use data captured from the systems stimuli and explore bottlenecks and dependencies. Page 4

Performance Tuning Application platform design pattern, problem size, sub-domain partition, iterations. Hadoop configurations and parameters, scalability, parallel efficiency Applications Hadoop family Software Stack OS strategies, DFS solutions Computing Pattern Delivery Operating System Machine learning & Computing Pattern Optimization Strategy The Performance and Power optimization clock Hardware Organization and Architecture Memory, Disk, Network, I/O Power/Energy Measurement and analysis Information Collection IPMI/BMC/PAPI Server IPMI/BMC/PAPI Server IPMI/BMC/PAPI Server Data Center Network Big Data Pattern and optimization reference Energy Optimization IPMI/BMC/PAPI Server Page 5

Power/Energy Tunings (Information collector) Power/Energy Measurement and Analysis Measure and Collecting energy information from system components Predicted Classification Machine Learning Algorithms Classifiers Decision trees And related liboraries Big Data Pattern and optimization reference Classification and the corresponding computing patterns Meta-data management Data follow analysis Training Examples New Examples Power usage control mechanisms DVS, DFS Power on/of switch Power usage pattern record for each component, including computing devices, storage devices and networking devices. Hardware Organization information from OS data Operating System Machine learning & Computing Pattern Power tuning approaches Power tuning machnisms Information of Memory, Disk, Network, I/O Physical system

Major Scenarios and the WL Characterization Application Scenarios Business Intelligent Workloads Algorithms Characterization Computing methods Benchmarks TeraSort PageRank K-means clustering Naive Bayes/ Aprioi Algorithms Sorting Algorithms Search Engines / Ranking pages Decision Supports Queries Association Rules VMall Joint Query List, Tables, Hash tables, Search Tech. Cloud+ Read/Write /Scan Multiple Data Operations I/O intensive Parallel Computing Multiple resource Utilization Massive data Parallel sorting Matrix Computing Massive Matrix Computing Hadoop Sort Hadoop/Spark + Computing intensive Parallel computing Hadoop/Spark Computing Intensive I/O Cloud Different DBs List, Tables, Hash tables, Search Tech. Parallel Database Access / Virtual machines / Queries Hive HBase CBTOOL, (SPEC Cloud) 7

Hadoop Cluster HW Configuration 16x 40GBe RH2288 X86 Huawei servers Huawei CE7850-32Q-EI 40Gbe Switch Servers: 16 x Huawei Tecal RH2288 v2 Server Total Processors/Cores/Threads 32/256/512 Processor 2 x Intel Xeon Processor E5-2680 v2, 2.70 GHz, 20M L3 Memory 256GB Storage Controller 1 x Symbios Logic MegaRAID SAS 2208 Storage Device 1 x 600GB 10K SAS HDD, 1 x 2.4TB Huawei ES3000 PCIe SSD Card Network 1 x Mellanox ConnectX-3 Pro EN 40GbE SFP+CNA Connectivity: 1 x Huawei CE7850-32Q-EI 40GigE Page 8

Memory Capacity Scalability 64GBDRAM 128GB DRAM P1 map 256GBDRAM P2 Shuffle 512GBDRAM P3 Reduce 768GBDRAM

TeraSort CPU Utilization (24CPUs) Page 10

Algorithm Behaviors in K-Means Clustering Page 11

Algorithm Behaviors in K-Means Clustering Page 12

Characters of K-means Clustering Page 13 CPU usage and the corresponding power measurement results, when executing K- means clustering operations following the system configuration. HDD and Main board (include memory) power measurement results, when executing K-means clustering operations following the system configuration.

Characters of K-means Clustering The Physical memory usage, total disk read and write measurement results, when executing K-means clustering operations following the system configuration. K-Means Job Parameter Results Results (Clustering Phase) (Data Writing Phase) CPU Power (average) 122.21W 98.67W Main board Power 72.09W 72.14W (incl. memory) HDD Energy 7.03W 7.17W Results Data Size / Watt 126.04M/Watt 158.13M/Watt The average of power measurement results for k-means clustering at clustering operation phase and data writing phase. The throughput and power performance of the target platform is concluded.

Big Bench Performance Chart Most OEMs do provide the capability to implement power measurement. In the latest generation IPMI can measure Power for the Entire system power, CPU subdomain power Memory Subdomain power. Page 15

Performance Analysis on Cloud SPEC Cloud 2016 CloudBecch (CBTOOL) o CloudBench kit, CloudOS, Virtual machine perofmrnace and reports. o 4 date bases performance: Cassandra; Hbase; PNUTS; MySQL o Uniform distribution; zipfian distribution ; popularity distribution; Latest distribution; multinomial distribution. Also Benchmarked o SPECvirt o Refer: Cloudsuite, CBTooL

From Application to Performance (Information collector) Power/Energy Measurement and Analysis Measure and Collecting energy information from system components Meta-data management Data follow analysis Power usage pattern record for each component, including computing devices, storage devices and networking devices. Hardware Organization information from OS data Information of Memory, Disk, Network,

Energy Tuning Controlled by OS Through IPMI or other mechanisms Controlled by Fan in the server, OR the cooling system in Data center Controlled by higher level software Components Frequencies Voltages Temperature Usage Total CPUs (processors, cores) Memories (RAM, NVM) GPUs (cores) Board (buses, slots) Buses (PCI, Optical) Disk (HDD, SSD) I/O (Network, DISK I/O) f (Frequency steps) V (VIDs) T f (Frequency steps) V (VIDs) T f (Frequency steps) V (VIDs) T f (Frequency steps) V (VIDs) T f (Frequency steps) V (VIDs) T f (Frequency steps) V (VIDs) T f (Frequency steps) V (VIDs) T where A is the active fraction of the gates in the CMOS element that are switching each clock cycle, C is the total capacitance driven by all the gate outputs in the CMOS element (thereby making AC the capacitance being switched per clock cycles), V is the operating voltage of the CMOS element, and f is the frequency of the clock.

Thank you!