Dell EMC Unity: Performance Analysis Deep Dive
Keith Snell, Performance Engineering, Midrange & Entry Solutions Group

Agenda
- Introduction
- Sample Period
- Unisphere Performance Dashboard
- Unisphere uemcli command line
- Performance Archives
- Summary

Introduction

Introduction
Three uses for performance data:
1. Health check: performance metrics show how efficiently the system is servicing user requests. Regardless of whether the activity is block or file, the storage processors and disks are common contributors to performance, so they give a first look at system health.
2. Capacity planning: check current resource utilization. Can we incrementally add workload to the existing resources? Can we add hardware and workload to the system?
3. Troubleshooting: object-specific performance metrics make it possible to isolate and identify areas of concern.

Performance Data: Sample Period

Sample Period
Performance data can be presented with different sample periods. Why does that matter?
- The larger the sample period, the more averaged the data is, which reduces the chance of seeing bursty activity.
- The duration of the bursts dictates how accurately the displayed data reflects them.
- The performance dashboard might look different depending on the time period viewed: its minimum sample period is 60 seconds, and a single sample can cover up to 4 hours.
- Variation in performance is averaged to the sample period being displayed.
For the most accurate and customizable performance analysis, post-processing the performance archives is recommended; these custom options are covered later in the presentation.

Sample Period
[Figure: a timeline t divided into sample periods of length a (spanning 6 * a), with a shorter sample period b overlaid.]
Here we have a timeline, and our sampling period is a. Our characterization would show a peak of y for the duration of the sample period. If our sampling period were shorter, as shown by b, the reported peak would not be y, but x+y.
Be aware of what data you are looking at.
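The averaging effect can be reproduced with a few lines of Python. This is a minimal sketch on a synthetic per-second IOPS series (the numbers are invented for illustration, not Unity data): a 20-second burst to 5,000 IOPS on a 1,000 IOPS baseline, averaged into 1-, 60- and 300-second samples.

```python
# Sketch: how the sample period hides short bursts.
# The per-second IOPS series is synthetic, invented for illustration.

def average_by_period(samples, period):
    """Average a per-second series into consecutive windows of `period` seconds."""
    return [
        sum(samples[i:i + period]) / len(samples[i:i + period])
        for i in range(0, len(samples), period)
    ]

# 5 minutes at a steady 1,000 IOPS, with a 20-second burst to 5,000 IOPS.
series = [1000] * 300
for s in range(120, 140):
    series[s] = 5000

for period in (1, 60, 300):
    peak = max(average_by_period(series, period))
    print(f"{period:>3} s samples -> reported peak {peak:.0f} IOPS")
```

Run as-is, it reports peaks of 5000, 2333 and 1267 IOPS for the three sample periods: the same burst, reported very differently depending on the averaging window.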

Performance Data: Where Do I Find It?

Performance Metrics And Where To Find Them

| Object | Unisphere Dashboard / uemcli | Archive |
|---|---|---|
| Storage Processor | Utilization (average) | Utilization (average and per core) |
| LUN | Response time, IOPS, MB/s, queue | Utilization, response time, IOPS, MB/s, queue |
| Disk | IOPS, MB/s, service time, queue | Utilization, IOPS, MB/s, queue |
| Ports | IOPS, requests, MB/s | IOPS, requests, MB/s |
| File Systems | IOPS, MB/s, I/O size | IOPS, MB/s, I/O size |
| FAST Cache | Dirty ratio | None (future) |

Utilization, response time and MB/s are key quality-of-service indicators. Utilization at the LUN and disk layers is available from archive data.

Performance Data: Performance Dashboard And Historical Database

Scenario-1 (Performance Dashboard)
- System: Dell EMC Unity 400 hybrid array
- Data: hybrid pool of 18 × 800GB SAS Flash 3 drives and 20 × 1.2TB SAS drives
  - 8 LUNs set to Highest Available Tier, to pin them to the Flash tier
  - 8 LUNs set to Lowest Available Tier, to pin them to the SAS tier
  - Metadata for all LUNs would be resident in the highest tier available
- Workload: variable, with a duration of 1 hour; read-to-write ratio mostly 80:20; I/O sizes a mixture of 4KB, 8KB, 16KB, 32KB and 64KB
- Analysis method: Unisphere Performance Dashboard

Performance Dashboard

Unisphere Performance Dashboard
The performance dashboard is primarily used to view performance data from the historical database, and can be used to determine the health of the system. The time selection sets the range viewed; the sample periods available, and how much history each one covers, are:
- 60 seconds: up to 3 days of data
- 300 seconds: up to 14 days of data
- 3600 seconds: up to 28 days of data
- 14400 seconds: up to 90 days of data
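Choosing a sample period is a trade-off between resolution and history. A small Python sketch, using only the retention figures listed above, can pick the finest sample period that still covers a given look-back window; `finest_period` is a hypothetical helper, not part of Unisphere.

```python
# Retention per dashboard sample period, as listed on the slide.
# finest_period() is a hypothetical helper, not a Unisphere feature.
from datetime import timedelta

RETENTION = {
    60: timedelta(days=3),
    300: timedelta(days=14),
    3600: timedelta(days=28),
    14400: timedelta(days=90),
}

def finest_period(lookback):
    """Return the shortest sample period (seconds) whose retention covers `lookback`."""
    for period in sorted(RETENTION):
        if RETENTION[period] >= lookback:
            return period
    raise ValueError("requested window exceeds the 90-day history")

print(finest_period(timedelta(days=1)))   # 60
print(finest_period(timedelta(days=10)))  # 300
print(finest_period(timedelta(days=60)))  # 14400
```

Anything older than 3 days can only be seen with at least 5-minute averaging, so short bursts from earlier in the week are already smoothed out.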

Storage Processor Utilization
Observations:
1) The SP-A workload saturates utilization and causes an imbalance between the SPs.
2) During periods of lower activity, utilization is well within the good range and reasonably balanced.
Questions:
1) How is this saturation affecting workloads?
2) What activity is contributing to this saturation?
3) What are our options to reduce the effect of this workload?

System Level Statistics

Port IOPS And MB/s
Observations:
1) I/O is distributed across the available Fibre Channel ports.
2) 16Gb FC ports are capable of >40K IOPS and ~1500MB/s of bandwidth.
3) Additional protocol statistics are also available.
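Which of the two port figures applies depends on I/O size, since MB/s = IOPS × I/O size. The sketch below treats the slide's approximate numbers (40K IOPS, 1500MB/s) as hard ceilings and assumes 1 MB = 1024 KB; both are simplifications for rough planning, not measured port limits.

```python
# Rough per-port ceilings quoted on the slide for a 16Gb FC port.
# Both are approximations; 1 MB = 1024 KB is an assumption of this sketch.
PORT_MAX_IOPS = 40_000
PORT_MAX_MBPS = 1_500

def port_ceiling(io_size_kb):
    """Return (achievable IOPS, limiting factor) for a single I/O size."""
    bandwidth_iops = PORT_MAX_MBPS * 1024 / io_size_kb
    if bandwidth_iops < PORT_MAX_IOPS:
        return bandwidth_iops, "bandwidth"
    return PORT_MAX_IOPS, "IOPS"

# The I/O sizes used in Scenario-1.
for size in (4, 8, 16, 32, 64):
    iops, limit = port_ceiling(size)
    print(f"{size:>2} KB: ~{iops:,.0f} IOPS ({limit}-bound)")
```

Under these assumptions, the 4KB to 32KB sizes are IOPS-bound, while 64KB I/O becomes bandwidth-bound at roughly 24K IOPS per port.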

FLASH LUN Statistics

SAS LUN Statistics

LUN I/O Size And MB/s

Scenario-1 Review Summary
- SP utilization becomes imbalanced and high at different periods.
- Response times are within acceptable limits when we consider the utilization of the system and the queue to the active LUNs.
- The high SP utilization is likely to lead to issues if the load increases, or if we decide to use options such as snapshots, compression or replication.
Options:
- Isolate workloads and consider migrating them to another system.
- Use Host I/O Limits on the system to cap the performance of targeted objects (LUNs).

Host I/O Limits

SP Utilization (Before and After)
Observations with limits active:
1) Utilization is maintained within the good range.
2) Host I/O Limits are applied to the targeted objects only.
3) Host I/O Limits can be adjusted dynamically.

LUN IOPS And Response Time

Scenario-1 Summary
- Using the performance dashboard, storage processor utilization peaks were identified and correlated with specific workloads.
- Other metrics were checked to verify that no other issues were present.
- Host I/O Limits were deployed to cap activity on the targeted LUNs, reducing their impact and keeping utilization at the required level.

Performance Data: Someone Is Reporting a Problem

Scenario-2 (uemcli)
- System: Dell EMC Unity 400 hybrid array
- Data: hybrid pool of 18 × 800GB SAS Flash 3 drives and 20 × 1.2TB SAS drives
  - 8 LUNs set to Highest Available Tier, to pin them to the Flash tier
  - 8 LUNs set to Lowest Available Tier, to pin them to the SAS tier
  - Metadata for all LUNs would be resident in the highest tier available
- Workload: varied, with some scaling; variable I/O sizes
- Analysis method: uemcli historical statistics, focusing on SP utilization, SAS LUN and disk IOPS, and SAS LUN response time

uemcli Options For Historical Data
List the metrics available for historical viewing (~77 in total):
    uemcli -d <IP> -u <user> -p <pwd> /metrics/metric -availability historical show
Sample periods available:
- 60 seconds: up to 3 days of data
- 300 seconds: up to 14 days of data
- 3600 seconds: up to 28 days of data
- 14400 seconds: up to 90 days of data
Example: LUN total calls rate and response time, as 360 samples at a 60-second interval (6 hours) starting at 2017-05-10 14:25:00:
    uemcli -d <IP> -u <user> -p <pwd> /metrics/value/hist -path sp.*.storage.lun.*.totalcallsrate show -from "2017-05-10 14:25:00" -count 360 -interval 60 -output csv
    uemcli -d <IP> -u <user> -p <pwd> /metrics/value/hist -path sp.*.storage.lun.*.responsetime show -from "2017-05-10 14:25:00" -count 360 -interval 60 -output csv
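When collecting the same statistics repeatedly, it can help to generate the historical command rather than edit it by hand. This Python sketch only assembles an argument list from the flags shown above; `hist_cmd` is hypothetical, the address and credentials are placeholders, and nothing is executed. The `-count` value is derived from the desired duration divided by the interval (6 hours at 60 seconds gives the 360 used above).

```python
# Hypothetical helper that assembles the historical-stats command shown above.
# Only flags that appear on the slide are used; nothing is executed here.
import shlex
from datetime import datetime, timedelta

def hist_cmd(host, user, pwd, path, start, duration, interval=60):
    """Build a uemcli /metrics/value/hist command covering `duration` from `start`."""
    count = int(duration.total_seconds() // interval)
    return [
        "uemcli", "-d", host, "-u", user, "-p", pwd,
        "/metrics/value/hist", "-path", path, "show",
        "-from", start.strftime("%Y-%m-%d %H:%M:%S"),
        "-count", str(count), "-interval", str(interval),
        "-output", "csv",
    ]

# Placeholder address and credentials.
cmd = hist_cmd("192.0.2.10", "admin", "secret",
               "sp.*.storage.lun.*.responsetime",
               datetime(2017, 5, 10, 14, 25), timedelta(hours=6))
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run` without a shell, which avoids any shell interpretation of the `*` wildcards in `-path`; `shlex.join` prints the equivalent quoted command for interactive use.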

uemcli Options For Real Time
List the metrics available for real-time viewing (~580 in total):
    uemcli -d <IP> -u <user> -p <pwd> /metrics/metric -availability real-time show
uemcli syntax for real-time commands:
    /metrics/value/rt -path <value> show -interval <value> [ { -period <value> | -to <value> | -count <value> } [ -summary ] ] [ -flat ] [ -output { nvp | csv | table [ -wrap ] } ] [ { -brief | -detail } ]
Example: LUN read and write rates every 30 seconds:
    uemcli -d <IP> -u <user> -p <pwd> /metrics/value/rt -path sp.*.storage.lun.*.readsrate,sp.*.storage.lun.*.writesrate show -interval 30
We pick a longer interval than the 5-second minimum because it can be challenging to compute and display data for multiple LUNs in real time.
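The real-time example requests read and write rates separately. Summing them gives the total IOPS and the read percentage needed for back-end disk calculations. In this sketch the dictionaries stand in for already-parsed uemcli output (the output format itself is not reproduced here), and the LUN names and numbers are invented.

```python
# Sketch: combine per-LUN readsrate and writesrate values from one sample
# into total IOPS and read percentage. Input values are illustrative only.

def summarize(reads, writes):
    """Return (total IOPS, read percentage) across all LUNs in one sample."""
    total_reads = sum(reads.values())
    total_writes = sum(writes.values())
    total = total_reads + total_writes
    read_pct = 100 * total_reads / total if total else 0.0
    return total, read_pct

reads = {"lun_1": 1700, "lun_2": 1500}
writes = {"lun_1": 450, "lun_2": 350}
total, read_pct = summarize(reads, writes)
print(f"{total} IOPS, {read_pct:.0f}% reads")  # 4000 IOPS, 80% reads
```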

SAS LUN IOPS And Response Time
Observations:
1) [SAS LUN IOPS] Scaling the workload hits a plateau.
2) [SAS LUN Response Time] Response time appears to be impacted at an aggregate of around 4000 IOPS.
Consider the disk IOPS: at an 80:20 read/write ratio, 4000 host IOPS is 3200 reads and 800 writes. With the RAID 5 write penalty of 4, the back-end load is 3200 + (800 × 4) = 6400 IOPS, i.e. (4000 / 5) × 8. Spread across 20 SAS disks, that is 320 IOPS per disk.
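The disk arithmetic above can be written as a small function. This sketch assumes the standard RAID 5 small-write penalty of 4 back-end I/Os per host write (read data, read parity, write data, write parity) and ignores cache effects and full-stripe writes, so treat the result as an estimate.

```python
# Back-end (disk) IOPS estimate for a RAID 5 pool.
# Assumes the standard small-write penalty of 4; cache hits and
# full-stripe writes would lower the real figure.
RAID5_WRITE_PENALTY = 4

def backend_iops(host_iops, read_pct, write_penalty=RAID5_WRITE_PENALTY):
    """Host IOPS plus read percentage (integer, 0-100) -> back-end disk IOPS."""
    reads = host_iops * read_pct // 100
    writes = host_iops - reads
    return reads + writes * write_penalty

total = backend_iops(4000, 80)     # 3200 + 800 * 4
print(total, total / 20, total / 40)
```

For the example it prints 6400 back-end IOPS: 320 per disk across 20 SAS disks, or 160 per disk if the same load were spread over 40.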

SAS Disk IOPS
Observation: scaling the workload pushes disk IOPS above the recommended levels in the Best Practices Guide.
Consider dynamic pool expansion to distribute the load.
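To size an expansion, the per-disk estimate can be turned around into a disk count. This is a sketch only: `max_iops_per_disk` must come from the Best Practices Guide for your drive type, and the 250 used below is a placeholder, not a figure from this presentation. The pool's RAID layout also constrains how many drives can actually be added.

```python
import math

def disks_needed(backend_iops, max_iops_per_disk):
    """Smallest disk count that keeps per-disk IOPS at or below the threshold."""
    return math.ceil(backend_iops / max_iops_per_disk)

# 250 is a placeholder threshold (hypothetical); substitute the value
# recommended for your drive type in the Best Practices Guide.
print(disks_needed(6400, 250))  # 26
```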

Pool Expansion

After Expansion: SAS LUN IOPS
Observations:
1) [Before] With 20 SAS disks in the pool, the workload appeared to hit a plateau.
2) [After] With 40 SAS disks in the pool, we now achieve ~50% more IOPS.
3) Lower contention and utilization, and hopefully lower response time.

After Expansion: SAS LUN Response Time
Observations:
1) [Before] With 20 SAS disks in the pool, response time soon exceeded 20ms, rising to ~50ms.
2) [After] With 40 SAS disks in the pool, response time is dramatically improved.
3) Distributing the queue across more disks results in lower contention, utilization and response time.

After Expansion: SAS Disk IOPS
Observations:
1) [Before] With 20 SAS disks in the pool, the disks were saturated.
2) [After] With 40 SAS disks in the pool, I/O is distributed much better, resulting in lower disk utilization and contributing to lower response time for host I/O.

Scenario-2 Summary
- uemcli statistics match the capability of the performance dashboard, and allow performance data to be collected and post-processed in a customized way.
- Using this method, we identified resource utilization issues, specifically on the 10K rpm SAS disks.
- Pool expansion was used, resulting in optimized handling of the workload: lower disk utilization and improved IOPS and response time.
What about performance archives?

Performance Archives: What and How?

Performance Archives
- Archives contain 1 hour of data in SQLite database format.
- Each archive is aligned to the top of the hour, e.g. coverage of 3pm to 4pm, and 4pm to 5pm.
- The filename is the date and time of the start of the archive (UTC).
- Partial archives are also readable, self-contained SQLite database files.
- The repository contains a minimum of 48 archives (covering 2 days of high-resolution performance data).
- As of Dell EMC Unity OE 4.2, archives can be retrieved in the UI; they can also be retrieved via WinSCP.
- You can look at the structure of an archive with DB Browser for SQLite: https://www.sqlite.org/download.html
- Exporting requires data manipulation: timestamps are stored as offsets from epoch time and must be converted, and per-second values must be derived for metrics such as I/O, MB and calls.
- Object names have to be mapped to user objects, where possible, using the tables embedded in the archive.
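Because the archives are self-contained SQLite files, Python's built-in `sqlite3` module can open them directly. The sketch below lists the tables, converts an epoch offset to UTC, and turns cumulative counters into per-second rates. It assumes timestamps are whole seconds since the Unix epoch and that I/O-style metrics are stored as monotonically increasing counters; neither the table layout nor the units are documented here, so check the archive structure (e.g. in DB Browser for SQLite) before relying on it.

```python
# Sketch for exploring an archive with Python's built-in sqlite3 module.
# Assumptions: timestamps are seconds since the Unix epoch, and counters
# only increase (no wrap or reset handling).
import sqlite3
from datetime import datetime, timezone

def list_tables(conn):
    """Names of the tables in the archive database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    return [name for (name,) in rows]

def epoch_to_utc(seconds):
    """Convert a seconds-since-epoch offset to a UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

def counter_to_rates(samples):
    """Turn (epoch_seconds, cumulative_count) pairs into (epoch_seconds, per-second rate)."""
    return [
        (t1, (c1 - c0) / (t1 - t0))
        for (t0, c0), (t1, c1) in zip(samples, samples[1:])
    ]

# Usage (the file name is a placeholder):
# conn = sqlite3.connect("archive.db")
# print(list_tables(conn))
```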

Dell EMC Unity Performance Archive Dump
Options:
- 1 to multiple archives
- Output to CSV format, with 2 variants of formatting
- Timestamps
- Equated per-second metrics
Ongoing development, aligned with the Unisphere archive retrieve capability. Early access availability via contact upad@dell.com.
Sample output:
cpu_core 20170320_110000.csv
fibrechannel_feport 20170320_110000.csv
iscsi_feport 20170320_110000.csv
net_device 20170320_110000.csv
physical_disk 20170320_110000.csv
storage_filesystem 20170320_110000.csv
storage_flu 20170320_110000.csv
storage_lun 20170320_110000.csv
storage_pool 20170320_110000.csv
What do I do with the dumped CSV data?

Excel
- If your timestamp doesn't show seconds, select column A and change its number format, adding :ss to show the seconds for each sample.
- Select the entire sheet by clicking the top-left corner, then select Insert > PivotChart; it defaults to the whole table.

Pivot Chart: The Easy Guide To Charting Data
- In the PivotChart fields, drag timestamp to Axis (Categories), user_lun to Legend (Series), and the metric you want to plot into Values.
- To verify that each value comes from a single object, first show the value field as a count: it should be 1.
- Here we see entries with a count of 4: these are 4 system-related LUNs that have no user IDs, so open the user_lun drop-down in the chart and deselect the first entry.
- Now that there is only 1 entry per sample, change the value field to show the data. As there is only 1 sample per user LUN at each time point, Min, Max or Sum will each show the single value present.
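The same pivot can be scripted. This Python sketch reads a dumped CSV with the standard `csv` module and builds a timestamp × LUN table, skipping rows with an empty `user_lun` (assumed to be how the system LUNs appear) and rejecting duplicate samples, which mirrors the count-equals-1 check. The `timestamp` and `user_lun` column names come from the slide; the metric column name `total_iops` is hypothetical.

```python
# Sketch of the pivot in Python. Column names timestamp and user_lun follow
# the slide; "total_iops" is a hypothetical metric column name.
import csv
import io
from collections import defaultdict

def pivot(csv_file, metric):
    """Return {timestamp: {user_lun: value}}, skipping rows with no user LUN."""
    table = defaultdict(dict)
    for row in csv.DictReader(csv_file):
        lun = row["user_lun"].strip()
        if not lun:  # system LUNs without a user ID (assumed to be blank)
            continue
        samples = table[row["timestamp"]]
        if lun in samples:  # equivalent of a pivot count greater than 1
            raise ValueError(f"more than one sample for {lun} at {row['timestamp']}")
        samples[lun] = float(row[metric])
    return dict(table)

sample = io.StringIO(
    "timestamp,user_lun,total_iops\n"
    "11:00:00,LUN1,120\n"
    "11:00:00,,5\n"
    "11:00:01,LUN1,130\n"
)
print(pivot(sample, "total_iops"))
```

For a real dump, pass a file opened with `open(path, newline="")` instead of the in-memory sample.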

Pivot Chart
- The chart type defaults to bar, though it is usually better to change it to line.
- You can now easily filter for specific LUNs or time periods using the drop-downs.
- You can also change the chart type to stacked to show aggregate values when all LUNs are selected; this is also very useful.

Pivot: Disk IOPS
Stacked chart of disk total IOPS: using stacked charts, we can determine a disk summary here of ~185K IOPS.
Disk statistics represent block LUN and file system activity, internal operations and snapshots.
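A stacked chart is a per-timestamp sum across objects, and its highest point is the system-wide peak. This minimal sketch works on a timestamp × object table; the disk names and values are invented for illustration.

```python
# Sketch: what a stacked chart aggregates. Input maps
# timestamp -> {object name: value}; the data below is illustrative only.

def stacked_totals(pivoted):
    """Sum all objects at each timestamp."""
    return {ts: sum(row.values()) for ts, row in pivoted.items()}

def peak(totals):
    """Timestamp and value of the highest stacked total."""
    ts = max(totals, key=totals.get)
    return ts, totals[ts]

disks = {
    "t0": {"disk_0": 300, "disk_1": 280},
    "t1": {"disk_0": 320, "disk_1": 310},
}
totals = stacked_totals(disks)
print(totals, peak(totals))  # {'t0': 580, 't1': 630} ('t1', 630)
```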

Summary
- Multiple performance data options for viewing, collection and analysis
- Dell EMC Unity best practices for performance referenced for health status
- Sample period considerations when using different methods to look at data
- Issue isolation and possible solutions considered, using Host I/O Limits and rebalancing of load with dynamic pool expansion