Identifying Performance Bottlenecks with Real- World Applications and Flash-Based Storage

Similar documents
How Flash-Based Storage Performs on Real Applications Session 102-C

Learn Your Alphabet - SRIOV, NPIV, RoCE, iwarp to Pump Up Virtual Infrastructure Performance

Solid State Storage is Everywhere Where Does it Work Best?

Solid State Performance Comparisons: SSD Cache Performance

Emulex LPe16000B 16Gb Fibre Channel HBA Evaluation

Evaluation Report: Improving SQL Server Database Performance with Dot Hill AssuredSAN 4824 Flash Upgrades

Evaluation of the Chelsio T580-CR iscsi Offload adapter

Storage Protocol Offload for Virtualized Environments Session 301-F

Analyst Perspective: Test Lab Report 16 Gb Fibre Channel Performance and Recommendations

Evaluation Report: HP StoreFabric SN1000E 16Gb Fibre Channel HBA

IBM Emulex 16Gb Fibre Channel HBA Evaluation

Low Latency Evaluation of Fibre Channel, iscsi and SAS Host Interfaces

Next Generation Storage Networking for Next Generation Data Centers. PRESENTATION TITLE GOES HERE Dennis Martin President, Demartek

Performance Benefits of NVMe over Fibre Channel A New, Parallel, Efficient Protocol

Evaluation of Chelsio Terminator 6 (T6) Unified Wire Adapter iscsi Offload

Accelerate Applications Using EqualLogic Arrays with directcache

Storage Best Practices for Microsoft Server Applications

Dell PowerEdge R730xd Servers with Samsung SM1715 NVMe Drives Powers the Aerospike Fraud Prevention Benchmark

Evaluation of Multi-protocol Storage Latency in a Multi-switch Environment

Demartek Evaluation of Multiprotocol Storage Latency in a Multi-Switch Environment. Webinar August 9, 2012

Demartek Evaluation Accelerate Business Results with Seagate EXOS 15E900 and 10E2400 Hard Drives

Demartek Evaluation Accelerated Business Results with Seagate Enterprise Performance HDDs

Implementing SQL Server 2016 with Microsoft Storage Spaces Direct on Dell EMC PowerEdge R730xd

Microsoft SQL Server in a VMware Environment on Dell PowerEdge R810 Servers and Dell EqualLogic Storage

EMC VFCache. Performance. Intelligence. Protection. #VFCache. Copyright 2012 EMC Corporation. All rights reserved.

Accelerating Microsoft SQL Server 2016 Performance With Dell EMC PowerEdge R740

Session 201-B: Accelerating Enterprise Applications with Flash Memory

Virtualization of the MS Exchange Server Environment

Storage Update and Storage Best Practices for Microsoft Server Applications. Dennis Martin President, Demartek January 2009 Copyright 2009 Demartek

Demartek September Intel 10GbE Adapter Performance Evaluation for FCoE and iscsi. Introduction. Evaluation Environment. Evaluation Summary

Performance Report: Multiprotocol Performance Test of VMware ESX 3.5 on NetApp Storage Systems

Low-Overhead Flash Disaggregation via NVMe-over-Fabrics Vijay Balakrishnan Memory Solutions Lab. Samsung Semiconductor, Inc.

Interface Trends for the Enterprise I/O Highway

IBM and HP 6-Gbps SAS RAID Controller Performance

W H I T E P A P E R. Comparison of Storage Protocol Performance in VMware vsphere 4

NetApp AFF A300 Review

Low-Overhead Flash Disaggregation via NVMe-over-Fabrics

A Deeper Dive: Where (And How and Why) to Implement Solid-State Storage

Flash Storage with 24G SAS Leads the Way in Crunching Big Data

Flash In the Data Center

Benefits of Automatic Data Tiering in OLTP Database Environments with Dell EqualLogic Hybrid Arrays

Webinar Series: Triangulate your Storage Architecture with SvSAN Caching. Luke Pruen Technical Services Director

NAS for Server Virtualization Dennis Chapman Senior Technical Director NetApp

Demartek December 2007

Deploy a High-Performance Database Solution: Cisco UCS B420 M4 Blade Server with Fusion iomemory PX600 Using Oracle Database 12c

Evaluation Report: Dot Hill AssuredSAN 4824 in a Mixed Workload Virtualized Environment

I/O Virtualization The Next Virtualization Frontier

Evaluation Report: Supporting Multiple Workloads with the Lenovo S3200 Storage Array

Upgrade to Microsoft SQL Server 2016 with Dell EMC Infrastructure

Best Practices for Deploying a Mixed 1Gb/10Gb Ethernet SAN using Dell EqualLogic Storage Arrays

Benchmarking Enterprise SSDs

Performance Testing of SQL Server on Kaminario K2 Storage

Copyright 2012 EMC Corporation. All rights reserved.

Gen 6 Fibre Channel Evaluation of Products from Emulex and Brocade

Performance Baseline for Deploying Microsoft SQL Server 2012 OLTP Database Applications Using EqualLogic PS Series Hybrid Storage Arrays

SQL Server Then and Now: Changing the State of Long-held Beliefs. Maxwell Myrick Managing Partner SQLHA LLC

Removing the I/O Bottleneck in Enterprise Storage

Milestone Systems CERTIFICATION TEST REPORT Version /08/17

Performance Scaling. When deciding how to implement a virtualized environment. with Dell PowerEdge 2950 Servers and VMware Virtual Infrastructure 3

Virtualized SQL Server Performance and Scaling on Dell EMC XC Series Web-Scale Hyper-converged Appliances Powered by Nutanix Software

Dell Reference Configuration for Large Oracle Database Deployments on Dell EqualLogic Storage

Storage Systems Can Now Get ENERGY STAR Labels and Why You Should Care

Accelerating OLTP performance with NVMe SSDs Veronica Lagrange Changho Choi Vijay Balakrishnan

Assessing performance in HP LeftHand SANs

Seagate Enterprise SATA SSD with DuraWrite Technology Competitive Evaluation

SoftNAS Cloud Performance Evaluation on AWS

Comparing Performance of Solid State Devices and Mechanical Disks

What is QES 2.1? Agenda. Supported Model. Live demo

EMC XTREMCACHE ACCELERATES VIRTUALIZED ORACLE

NetVault Backup Client and Server Sizing Guide 2.1

2009. October. Semiconductor Business SAMSUNG Electronics

NetVault Backup Client and Server Sizing Guide 3.0

Test Lab Presentation

SoftNAS Cloud Performance Evaluation on Microsoft Azure

Performance Testing December 16, 2017

TPC-E testing of Microsoft SQL Server 2016 on Dell EMC PowerEdge R830 Server and Dell EMC SC9000 Storage

Ceph in a Flash. Micron s Adventures in All-Flash Ceph Storage. Ryan Meredith & Brad Spiers, Micron Principal Solutions Engineer and Architect

Microsoft SQL Server 2012 Fast Track Reference Configuration Using PowerEdge R720 and EqualLogic PS6110XV Arrays

Cisco UCS C-Series I/O Characterization

PRESENTATION TITLE GOES HERE

LATEST INTEL TECHNOLOGIES POWER NEW PERFORMANCE LEVELS ON VMWARE VSAN

Hewlett Packard Enterprise HPE GEN10 PERSISTENT MEMORY PERFORMANCE THROUGH PERSISTENCE

Getting it Right: Testing Storage Arrays The Way They ll be Used

High Performance SSD & Benefit for Server Application

Hyper-converged storage for Oracle RAC based on NVMe SSDs and standard x86 servers

Software Defined Storage at the Speed of Flash. PRESENTATION TITLE GOES HERE Carlos Carrero Rajagopal Vaideeswaran Symantec

Oracle Performance on M5000 with F20 Flash Cache. Benchmark Report September 2011

FlashGrid Software Enables Converged and Hyper-Converged Appliances for Oracle* RAC

Using Synology SSD Technology to Enhance System Performance Synology Inc.

Build a Better Data Center with 24G SAS

Emulex LPe16000B Gen 5 Fibre Channel HBA Feature Comparison

Accelerating Workload Performance with Cisco 16Gb Fibre Channel Deployments

A Comparative Study of Microsoft Exchange 2010 on Dell PowerEdge R720xd with Exchange 2007 on Dell PowerEdge R510

Extremely Fast Distributed Storage for Cloud Service Providers

NVMFS: A New File System Designed Specifically to Take Advantage of Nonvolatile Memory

IBM System Storage DS8870 Release R7.3 Performance Update

Lab Validation: Gridstore Storage 3.0 3

Optimizing Server Designs for Speed

The QLogic 8200 Series is the Adapter of Choice for Converged Data Centers

Comparison of Storage Protocol Performance ESX Server 3.5

Transcription:

Identifying Performance Bottlenecks with Real- World Applications and Flash-Based Storage TechTarget Dennis Martin 1

Agenda About Demartek Enterprise Data Center Environments Storage Performance Metrics Synthetic vs. Real-world Workloads Performance Results Various Flash Solutions Reference Resources 2

About Demartek Industry Analysis and ISO 17025 accredited test lab Lab includes enterprise servers, networking & storage (DAS, NAS, SAN, 10GbE, 40GbE, 16GFC) We prefer to run real-world applications to test servers and storage solutions (databases, Hadoop, etc.) Demartek is an EPA-recognized test lab for ENERGY STAR Data Center Storage testing Website: www.demartek.com/testlab 3

Enterprise Data Center Environments Typically support a large number of users and are responsible for many business applications Often have specialists for applications, operating environments, networking and storage systems Have a large amount of equipment including servers, networking and storage gear Multiple types and generations within each category Reliability, Availability and Serviceability (RAS) Complex systems working together 4

Demartek Tutorial Videos Short videos (3 4 minutes) Storage Basics http://www.demartek.com/demartek_tutorial_video.html 5

Interface vs. Storage Device Speeds Interface speeds are generally measured in bits per second, such as megabits per second (Mbps) or gigabits per second (Gbps). Lowercase b Applies to Ethernet, Fibre Channel, SAS, SATA, etc. Storage device and system speeds are generally measured in bytes per second, such as megabytes per second (MBps) or gigabytes per second (GBps). Uppercase B Applies to devices (SSDs, HDDs) and PCIe, NVMe 6

Storage Interface Comparison Reference Page Demartek Storage Interface Comparison page Search engine: Storage Interface Comparison Includes new interfaces such as 25GbE, 32GFC, Thunderbolt 3 http://www.demartek.com/demartek_interface_comparison.html 7

Storage Performance Metrics 8

Storage Performance Metrics IOPS & Throughput IOPS Number of Input/Output (I/O) requests per second Throughput Measure of bytes transferred per second (MBps or GBps) Sometimes also referred to as Bandwidth Read and Write metrics are often reported separately 9

Storage Performance Metrics Latency Latency Response time or round-trip time, generally measured in milliseconds (ms) or microseconds (µs) Sometimes measured as seconds per transfer Time is the numerator, therefore lower latency is faster Latency is becoming an increasingly important metric for many real-world applications Flash storage provides much lower latency than hard disk or tape technologies, frequently < 1 ms 10

I/O Request Characteristics 11

I/O Request Characteristics Block Size Block size is the size of each individual I/O request Minimum block size for flash devices is 4096 bytes (4KB) Minimum block size for HDDs is 512 bytes Newer HDDs have native 4KB sector size ( Advanced Format ) Maximum block size can be multiple megabytes Block sizes are frequently powers of 2 Common: 512B, 1KB, 2KB, 4KB, 8KB, 16KB, 32KB, 64KB, 128KB, 256KB, 512KB, 1MB 12

I/O Request Characteristics Queue Depth Queue Depth is the number of outstanding I/O requests awaiting completion Applications can issue multiple I/O requests at the same time to the same or different storage devices Queue Depths can get temporarily large if The storage device is overwhelmed with requests There is a bottleneck between the host CPU and the storage device 13

I/O Request Characteristics Access Patterns: Random vs. Sequential Access patterns refers to the pattern of specific locations or addresses (logical block addresses) on a storage device for which I/O requests are made Random addresses are in no apparent order (from the storage device viewpoint) Sequential addresses start at one location and access several immediately adjacent addresses in ascending order or sequence 14

I/O Request Characteristics Read/Write Mix The read/write mix refers to the percentage of I/O requests that are read vs. write Flash storage devices are relatively more sensitive to the read/write mix than HDDs due to the physics of NAND flash writes The read/write mix percentage varies over time and with different workloads 15

Synthetic vs. Real-world Workloads 16

Synthetic Workloads Purpose Synthetic workload generators allow precise control of I/O requests with respect to: Read/write mix, block size, random vs. sequential & queue depth These tools are used to generate the hero numbers 4KB 100% random read, 4KB 100% random write, etc. 256KB 100% sequential read, 256KB 100% sequential write, etc. Manufacturers advertise the hero numbers to show the topend performance in the corner cases Demartek also sometimes runs these tests 17

Synthetic Workloads Examples Several synthetic I/O workload tools: Diskspd, fio, Iometer, IOzone, SQLIO, Vdbench, others Some of these tools have compression, data de-duplication and other data pattern options Demartek has a reference page showing the data patterns written by some of these tools http://www.demartek.com/demartek_benchmark_output_file_formats.html 18

Real-world Workloads Use variable levels of compute, memory and I/O resources as the work progresses May use different and multiple I/O characteristics simultaneously for I/O requests (block sizes, queue depths, read/write mix and random/sequential mix) Many applications capture their own metrics such as database transactions per second, etc. Operating systems can track physical and logical I/O metrics End-user customers have these applications 19

Real-world Workload Types Transactional (mostly random) Generally smaller block sizes (4KB, 8KB, 16KB, etc.) Emphasis on the number of I/O s per second (IOPS) Streaming (mostly sequential) Generally larger block sizes (64KB, 256KB, 1MB, etc.) Emphasis on throughput (bandwidth) measured in Megabytes per second (MBps) Latency is affected differently by different workload types 20

Performance Results 21

Generic IOPS and Throughput Results These performance curves generally apply to network and storage performance 22

Generic Latency Results The nature of each workload has a large impact on latency. The red workload affects the blue workload (06:00 & 10:00) 23

Storage Performance Measurement Multiple Layers There are many places to measure storage performance, including software and hardware layers Multiple layers in the host server, storage device and in between The storage hardware is not the only source of latency Latency example in a SAN 24

Purpose of Tests (Need Answers) Flash allows more work to be completed in less time What is the effect on host CPU utilization (physical and virtual)? What is the effect of server memory on storage performance? How do caching and tiering compare? What are some real-world workload block sizes? Can flash storage handle multiple workloads? Different workload settings were used for each set of tests Most of the measurements were taken at the host server 25

General Notes on These Tests SQL Server, Oracle database best practices: Put database files and log files on different volumes Different I/O patterns for database files and log files SQL Server and Oracle database will take as much machine as you make available (cores, memory, etc.) Different results for 4-proc server with lots of memory vs. 1-proc server with small memory 26

Flash Products Tested Dot Hill hybrid storage array (caching and tiering with HDD & SSD, 16GFC SAN) PMC-Sierra NVMe PCIe card vs. eight SATA SSDs (server-side, small server) Samsung SM1715 NVMe PCIe cards (server-side, big server) Kaminario K2 all-flash array (two mixed workloads, 8GFC SAN) Violin Memory 7300 all-flash array (four mixed workloads, 16GFC SAN) 27

Dot Hill (Seagate) Hybrid Array Dot Hill AssuredSAN 4824 20x 900GB 10K RPM HDDs 4x 400GB SSDs Three configurations 1. HDD only: No SSDs used 2. Two SSDs used as a cache 3. Four SSDs used as a tier Microsoft SQL Server OLTP 28

IOPS & Throughput http://www.demartek.com/demartek_dot_hill_assuredsan_4824_improving_sql_server_performance_2015-08.html 29

Latency and Transactions per Second http://www.demartek.com/demartek_dot_hill_assuredsan_4824_improving_sql_server_performance_2015-08.html 30

NVMe SSD vs. SATA SSD (Inside Server) 1x PMC Flashtec NVMe2032 board 8x SanDisk Extreme Pro SSD (among the best SATA SSDs) Single processor, 8 GB RAM Microsoft SQL Server OLTP workload Three configurations: NVMe board configured into four logical volumes 8x SATA SSDs managed by Windows Storage Spaces, four volumes spread across all eight devices 4x SATA SSDs as four individual devices one volume per device 31

Workload Block Sizes http://www.demartek.com/demartek_pmc-sierra_flashtec_nvme2032_evaluation_2015-09.html 32

NVMe IOPS http://www.demartek.com/demartek_pmc-sierra_flashtec_nvme2032_evaluation_2015-09.html 33

NVMe Throughput http://www.demartek.com/demartek_pmc-sierra_flashtec_nvme2032_evaluation_2015-09.html 34

NVMe Latency http://www.demartek.com/demartek_pmc-sierra_flashtec_nvme2032_evaluation_2015-09.html 35

Multiple NVMe Cards in One Server Four Samsung SM1715 NVMe PCI cards In-box Windows NVMe drivers 4 LUNs, one on each NVMe card Dell PowerEdge R920 Server 4x Intel Xeon E7-4880 v2, 2.5 GHz, 60 cores, 120 threads 416 GB RAM SQL Server OLTP workload Three memory allocations to SQL Server: 1. Full system memory 2. 16 GB 3. 8 GB 36

CPU Utilization based on Memory Allocation Limiting RAM allocated to SQL Server affects CPU utilization. 37

Memory Usage Database applications specifically use RAM to avoid performing I/O. Database attempts to fill memory cache with as much data as possible. MemMax filled cache after ~ 9 minutes 38

Database Read Block Size Bigger RAM buffers means larger block sizes for I/O. 39

Database Read IOPS Large memory means fewer I/O operations (blue line). 40

Database Read Throughput Smaller memory makes the storage work harder. MemMax populating memory cache for the first 9 minutes. 41

Average Database Read Latency Read latencies approaching 100 µs for the Samsung SM1715 NVMe cards. 42

Average Log Write Latency Write latencies approximately 80 µs for the Samsung SM1715 NVMe cards. 43

Two SQL Server Workloads Two workloads from one VMware server simultaneously VM Win-1 (8 vcpu) SQL Server OLTP (TPC-E) VM Win-2 (8 vcpu) SQL Server Data Warehousing (TPC-H) Incremented workloads until host CPUs > 90% utilization (more storage capability than host capability) Kaminario K2 all-flash array 8GFC SAN configuration Four 1TB LUNs - Two for each VM (Database and Log) 44

Host Processor Utilization (VMware VMs) Both workloads max CPU utilization very heavy > 90% 45

Database Read Block Size OLTP reads (red line) are mostly 8K blocks while data warehousing reads are very large blocks (blue line). 46

Log Volume Block Sizes Mixture of 8K and 64K Reads Log writes are small, variable block sequentialish writes for OLTP workload but 64K for data warehousing workload 47

Database Read IOPS IOPS throttled by host CPUs (VMs) running at 90%+ utilization. 48

Database Read Throughput Database throughput throttled by host CPUs (VMs) running at 90%+ utilization. 49

Log Write Throughput Log write activity proportional to database activity. 50

Database Latency This data warehousing workload (blue line) is brutal on latency, even for all-flash arrays. We see similar latency spikes with this workload on all the all-flash arrays that we have tested. Not all of this latency is due to the storage array. 51

Location of Measurement Matter The location of the measurement matters. In this screen shot from the Kaminario system, the overall latency is 0.65ms but the latency measured at the host is higher. Remember the Latency Example in the SAN on an earlier slide. 52

Four Workload Tests Four different workloads on the same all-flash array Simultaneously 1. Web server workloads (4x) 2. SQL Server OLTP workloads (2x) 3. Exchange Server Jetstress 4. Microsoft SharePoint Workloads were added incrementally every 15 minutes 53

Four Workload Configuration Two physical servers running Microsoft Windows Hyper-V Both: 2x Intel Xeon E5-2690 v2, 3.0 GHz, 20 cores, 40 threads Both: 256 GB RAM VMs spread across both servers 16GFC Infrastructure Violin Memory 7300 All-flash Array 35 LUNs/Volumes configured for the four workloads http://www.demartek.com/demartek_violin_memory_7300_fsp_multiple_workloads_evaluation_2015-09.html 54

Four Workloads Block Sizes Block Sizes Web ~ 20K SQL ~ 8K Exchange ~ 32K SharePoint ~ small, 32K, 64K 55

IOPS Synthetic & Real-world As a sanity check, we first ran a synthetic benchmark on each of the workload LUNs to see what the storage system in our configuration could do. We achieved almost 1M IOPS. 56

IOPS Real-world The SQL Server workload was responsible for the majority of the IOPS. It also used the smallest block size, on average, of all the workloads. 57

Throughput The SQL Server workload was responsible for the majority of the throughput. 58

Latency Single workload latencies were below 1ms, but slightly higher when multiple workloads of different types were added. Latencies measured at the host server. 59

Conclusions Real-world workloads can be messy compared to synthetic workloads Variable I/O characteristics and multiple factors influencing performance It is not enough to say that an all-flash array can deliver submillisecond latencies for a single workload Look for more Demartek multiple workload test results 60

Demartek Free Resources Demartek SSD Zone www.demartek.com/ssd Demartek iscsi Zone www.demartek.com/iscsi Demartek FC Zone www.demartek.com/fc Demartek SSD Deployment Guide www.demartek.com/demartek_ssd_deployment_guide.html Performance reports, Deployment Guides and commentary available for free download. Demartek commentary: Horses, Buggies and SSDs www.demartek.com/demartek_horses_buggies_ssds_commentary.html Demartek Video Library - http://www.demartek.com/demartek_video_library.html 61

Thank You! Demartek public projects and materials are announced on a variety of social media outlets. Follow us on any of the above. Sign-up for the Demartek monthly newsletter, Demartek Lab Notes. www.demartek.com/newsletter 62