Decoupling Datacenter Studies from Access to Large-Scale Applications: A Modeling Approach for Storage Workloads


Decoupling Datacenter Studies from Access to Large-Scale Applications: A Modeling Approach for Storage Workloads
Christina Delimitrou (1), Sriram Sankar (2), Kushagra Vaid (2), Christos Kozyrakis (1)
(1) Stanford University, (2) Microsoft
IISWC, November 7th, 2011

Datacenter Workload Studies
Three ways to study DC workloads:
1. Open-source approximation of real applications: a user behavior model drives open-source apps run on similar hardware; collect measurements.
   Pros: resembles actual applications. Cons: not an exact match to real DC applications.
2. Actual applications on real DC hardware: run the real apps in a real datacenter; collect measurements.
   Cons: hardware- and code-dependent.
3. Statistical models of real applications: collect traces from the real apps, build models, and replay them; collect measurements.
   Pros: can modify the underlying hardware; trained on real apps, hence more representative. Cons: many parameters/dependencies to model.


Outline
- Introduction
- Modeling + Generation Framework
- Validation
- Decoupling Storage and App Semantics
- Use Cases: SSD Caching, Defragmentation Benefits
- Future Work

Executive Summary
Goal:
- A statistical model for the backend tier of DC applications, plus an accurate generation tool
Motivation:
- Replaying applications across many storage configurations is impractical
- DC applications are not publicly available
- The storage system accounts for 20-30% of DC power/TCO
Prior work:
- Does not capture key workload features (e.g., spatial/temporal locality)

Executive Summary (cont.)
Methodology:
- Trace ten real large-scale Microsoft applications
- Train a statistical model
- Develop a tool that generates I/O requests based on the model
- Validate the framework (model and tool)
- Use the framework for storage performance/efficiency studies
Results:
- Less than 5% deviation between the original and synthetic workloads
- Detailed application characterization
- Storage activity decoupled from application semantics
- Accurate predictions of the performance benefit in storage studies

Model
Probabilistic state diagrams:
- State: a block range on the disk(s)
- Transition: the probability of changing block ranges
- Stats per transition: rd/wr, rnd/seq, block size, inter-arrival time
[Figure: state diagram; the example transition is annotated "4K rd Rnd 3.15ms 11.8%", i.e., 4KB random reads with a 3.15ms inter-arrival time, taken with 11.8% probability]
(Reference: S. Sankar et al., IISWC 2009)
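To make the state-diagram structure concrete, here is a minimal Python sketch of a single-level model under a simplified schema: states are block ranges, and each outgoing transition carries a probability plus the per-transition statistics listed above. The class names, fields, and all numbers except the figure's example transition are illustrative, not the paper's actual format.

```python
from dataclasses import dataclass
import random

@dataclass
class Transition:
    dst: str                  # destination state (block range)
    prob: float               # transition probability
    op: str                   # "rd" or "wr"
    pattern: str              # "rnd" or "seq"
    block_size: int           # bytes
    inter_arrival_ms: float   # mean gap before the next request

def next_io(state, diagram, rng=random):
    """Draw one I/O by sampling an outgoing transition of `state`."""
    r, acc = rng.random(), 0.0
    for t in diagram[state]:
        acc += t.prob
        if r <= acc:
            return t
    return diagram[state][-1]   # guard against floating-point slack

# Example including the transition annotated in the figure: 4K random
# reads, 3.15ms inter-arrival time, taken with probability 11.8%.
# The second and third transitions are made up to complete the diagram.
diagram = {
    "range_A": [Transition("range_B", 0.118, "rd", "rnd", 4096, 3.15),
                Transition("range_A", 0.882, "wr", "seq", 65536, 1.00)],
    "range_B": [Transition("range_A", 1.000, "rd", "seq", 8192, 4.60)],
}
```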

Hierarchical Model
- One or multiple levels
- Hierarchical representation
- User-defined level of granularity

Comparison with Previous Tools
- IOMeter: the most well-known open-source I/O workload generator
- DiskSpd: a workload generator maintained by the Windows Server performance team
Δ of features, i.e., capabilities DiskSpd adds over IOMeter:
- Inter-arrival times (static or distribution)
- Intensity knob
- Spatial locality
- Temporal locality
- Granular detail of the I/O pattern
- Individual file accesses (more in the defragmentation use case)

Implementation (1/3): Inter-Arrival Times
Inter-arrival times ≠ outstanding I/Os!
- Inter-arrival times are a property of the workload
- Outstanding I/Os are a property of the system queues
- Scaling the inter-arrival times of independent requests => a more intense workload
- Previous work relies on outstanding I/Os
- DiskSpd supports time distributions (fixed, normal, exponential, Poisson, Gamma)
- Each transition has a thread weight, i.e., the proportion of accesses corresponding to that transition
- Thread weights are maintained both over short time intervals and across the workload's run
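The workload-vs-queue distinction can be seen in a small sketch (my illustration, not DiskSpd's implementation): an open-loop generator paces requests from the inter-arrival distribution regardless of device speed, while a closed-loop generator with a fixed outstanding-I/O budget issues only as fast as the device completes.

```python
import random

def open_loop_issue_times(n, mean_interarrival_ms, seed=0):
    """Pace requests from an exponential inter-arrival distribution
    (one of the distribution shapes DiskSpd supports)."""
    rng, t, times = random.Random(seed), 0.0, []
    for _ in range(n):
        t += rng.expovariate(1.0 / mean_interarrival_ms)  # workload property
        times.append(t)
    return times

def closed_loop_issue_times(n, outstanding, service_time_ms):
    """With a fixed outstanding-I/O budget, every completion triggers the
    next issue, so the device speed (a system property) sets the pace."""
    slots = [0.0] * outstanding   # next-free time of each in-flight slot
    times = []
    for _ in range(n):
        t = min(slots)
        slots[slots.index(t)] = t + service_time_ms
        times.append(t)
    return times
```

On a faster device the closed-loop stream speeds up automatically, which is why outstanding-I/O-based generators cannot hold a workload's intensity fixed across storage configurations.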

Implementation (2/3): Understanding Hierarchy
More levels -> more information -> higher model complexity.
We propose a hierarchical rather than flat model:
- Choose the optimal number of states per level (minimize inter-state transition probabilities)
- Choose the optimal number of levels for each app (< 2% change in IOPS)
- Capture spatial locality within states rather than across states
The performance difference between the flat and hierarchical models is less than 5%, while the hierarchical model reduces model complexity by 99% in transition count.

Implementation (3/3): Intensity Knob
- Scale inter-arrival times to emulate more intense workloads
- Enables evaluation of faster storage systems, e.g., SSD-based systems
Assumptions:
- Most requests in DC apps come from different users (independent I/Os), so scaling inter-arrival times matches the expected behavior on the faster system
- The application is not retuned for the faster system (spatial locality and other I/O features remain constant); this assumption may require reconsideration
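Under the independence assumption above, the intensity knob reduces to dividing every inter-arrival gap by a factor; a tiny illustrative sketch (not DiskSpd's code):

```python
def scale_intensity(interarrival_ms, k):
    """k > 1 emulates a k-times more intense workload: offered IOPS scale
    by k while locality, block sizes, and the rd/wr mix stay unchanged."""
    return [gap / k for gap in interarrival_ms]

gaps = [4.63, 5.10, 3.90]              # ms; offered IOPS ~ 1000 / mean(gaps)
doubled = scale_intensity(gaps, 2.0)   # ~2x IOPS, e.g., for an SSD study
```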

Methodology
1. Production DC traces to storage I/O models:
   - Collect traces from production servers of a real DC deployment using ETW (Event Tracing for Windows): block offset, block size, type of I/O, file name, thread number
   - Generate the storage workload model with one or multiple levels (XML format)
2. Storage I/O models to synthetic storage workloads:
   - Feed the state-diagram model to DiskSpd as input to generate the synthetic I/O load
   - Use the synthetic workloads for performance, power, and cost-optimization studies
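A minimal sketch of the first step (trace to model), assuming the ETW records have been exported as (thread, offset, size, op, timestamp) tuples. The real framework also tracks file names and per-thread streams, fits inter-arrival distributions, and emits the model as XML, all omitted here; the 1GB range granularity is an arbitrary choice for illustration.

```python
from collections import Counter, defaultdict

RANGE_BYTES = 1 << 30   # hypothetical block-range (state) granularity

def build_transitions(records):
    """Count state-to-state transitions in timestamp order and normalize
    the counts into the model's transition probabilities."""
    counts = defaultdict(Counter)
    prev = None
    # thread/size/op are unused in this sketch; the full model keeps them
    for thread, offset, size, op, ts in sorted(records, key=lambda r: r[4]):
        state = offset // RANGE_BYTES
        if prev is not None:
            counts[prev][state] += 1
        prev = state
    return {s: {d: c / sum(cnt.values()) for d, c in cnt.items()}
            for s, cnt in counts.items()}
```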

Experimental Infrastructure
Workloads (original traces):
- Messenger, Display Ads, User Content (Windows Live Storage, SQL-based)
- Email, Search, and Exchange (online services)
- D-Process (distributed computing)
- TPCC, TPCE (OLTP workloads); TPCH (DSS workload)
Trace collection and validation experiments:
- Server provisioned for SQL-based applications: 8 cores at 2.26GHz, 2.3TB HDD total storage
SSD caching and IOMeter vs. DiskSpd comparison:
- Server with SSD caches: 12 cores at 2.27GHz, 3.1TB HDD + 4x8GB SSDs

Validation
- Compare statistics of the original app against statistics of the generated load
- Models developed using 24h traces and multiple levels
- Synthetic workloads ran on the appropriate disk drives (log I/O to the log drive, SQL queries to the H: drive)

Table: I/O features and performance metrics comparison for Messenger

Metric | Original Workload | Synthetic Workload | Variation
Rd:Wr Ratio | 1.8:1 | 1.8:1 | 0%
Random % | 83.67% | 82.51% | -1.38%
Block Size Distr. | 8K (87%), 64K (7.4%) | 8K (88%), 64K (7.8%) | 0.33%
Thread Weights | T1 (19%), T2 (11.6%) | T1 (19%), T2 (11.68%) | 0%-0.05%
Avg. Inter-arrival Time | 4.63ms | 4.78ms | 3.1%
Throughput (IOPS) | 255.14 | 263.27 | 3.1%
Mean Latency | 8.09ms | 8.48ms | 4.8%

Validation (all workloads)
- Compare statistics of the original app against statistics of the generated load
- Models developed using 24h traces and multiple levels
- Synthetic workloads ran on the appropriate disk drives (log I/O to the log drive, SQL queries to the H: drive)
[Figure: IOPS of the original vs. synthetic trace for Messenger, Search, Email, User Content, D-Process, Display Ads, TPCC, TPCE, TPCH, and Exchange, annotated with the number of model levels (1-3) used per workload]
Less than 5% difference in throughput

Choosing the Optimal Number of Levels
Optimal number of levels: the first level after which there is less than a 2% difference in IOPS.
[Figure: IOPS of each synthetic workload when modeled with 1-5 levels]
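The stopping rule on this slide is mechanical enough to sketch; the `measure_iops` callback, which would build the model at a given depth and run the generated workload, is assumed.

```python
def choose_levels(measure_iops, max_levels=5, threshold=0.02):
    """Return the first hierarchy depth after which adding one more
    level changes the generated workload's IOPS by less than 2%."""
    prev = measure_iops(1)
    for levels in range(2, max_levels + 1):
        cur = measure_iops(levels)
        if abs(cur - prev) / prev < threshold:
            return levels - 1     # the previous depth was already enough
        prev = cur
    return max_levels
```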

Validation (activity fluctuation)
Verify the accuracy of the storage activity fluctuation.
[Figure: throughput (IOPS) over time for the original vs. synthetic workload]
Less than 5% difference in throughput in most intervals and on average

Decoupling Storage Activity from App Semantics
- Use the model to categorize and characterize storage activity per thread
- Filter I/O requests per thread and categorize them based on:
  - Functionality (data/log thread)
  - Intensity (frequent/infrequent requests)
  - Activity fluctuation (constant/highly fluctuating request rate)

Per-thread characterization for Messenger:

Thread Type | Functionality | Intensity | Fluctuation | Weight
Total | Data + Log | High | High | 1.00
Data #0 | Data | High | High | 0.42
Data #1 | Data | High | Low | 0.27
Data #2 | Data | Low | High | 0.13
Data #3 | Data | Low | Low | 0.18
Log #4 | Log | High | Low | 5E-3
Log #5 | Log | Low | High | 4E-4
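A small sketch of how such a per-thread categorization might be computed; the numeric cutoffs are invented for illustration, since the slides do not give exact thresholds.

```python
def characterize_thread(iops_per_interval, is_log, thread_ios, total_ios,
                        intensity_cutoff=50.0, fluctuation_cutoff=0.5):
    """Reduce one thread's request stream to the table's four columns."""
    mean = sum(iops_per_interval) / len(iops_per_interval)
    var = sum((x - mean) ** 2 for x in iops_per_interval) / len(iops_per_interval)
    cov = (var ** 0.5) / mean if mean else 0.0   # coefficient of variation
    return {
        "functionality": "Log" if is_log else "Data",
        "intensity": "High" if mean > intensity_cutoff else "Low",
        "fluctuation": "High" if cov > fluctuation_cutoff else "Low",
        "weight": thread_ios / total_ios,        # share of all I/Os
    }
```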

Decoupling Storage Activity from App Semantics (cont.)
Reassemble the workload from the thread types: Data #0 (x8), Data #1 (x13), Data #2 (x33), Data #3 (x45), Log #4 (x17), Log #5 (x20).
[Figure: IOPS of the reassembled workload]
Recreating the correct mix of threads (types + ratios) yields the same storage activity as the original application, without requiring any knowledge of application semantics. This decouples storage studies from application semantics.
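Reassembly can be pictured as merging the per-type generators at the given ratios; a sketch under the assumption that each thread-type generator yields (timestamp, request) pairs in time order.

```python
import heapq

def reassemble(type_generators, counts):
    """Merge counts[i] instances of thread-type generator i into one
    time-ordered request stream (each stream must itself be sorted)."""
    streams = [gen() for gen, n in zip(type_generators, counts)
               for _ in range(n)]
    return heapq.merge(*streams, key=lambda io: io[0])   # order by timestamp

# e.g., the Messenger mix from this slide:
# reassemble([data0, data1, data2, data3, log4, log5], [8, 13, 33, 45, 17, 20])
```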

Comparison with IOMeter (1/2)
Comparison of performance metrics in identical simple tests (no spatial locality):

Test Configuration | IOMeter (IOPS) | DiskSpd (IOPS)
4K, 10ms inter-arrival, Rd, Seq | 97.99 | 101.33
16K, 1ms inter-arrival, Rd, Seq | 949.34 | 933.69
64K, 10ms inter-arrival, Wr, Seq | 96.59 | 95.41
64K, 10ms inter-arrival, Rd, Rnd | 86.99 | 84.32

Less than 3.4% difference in throughput in all cases

Comparison with IOMeter (2/2)
Comparison on spatial-locality-sensitive tests.
[Figure: speedup of Messenger and User Content with 0 to 4 SSDs, DiskSpd vs. IOMeter]
With IOMeter, which lacks spatial locality:
- No speedup with an increasing number of SSDs (e.g., Messenger)
- Inconsistent speedup as SSD capacity increases (e.g., User Content)

Applicability: Storage System Studies
1. SSD caching: add up to 4x8GB SSD caches and run the synthetic workloads; 31% speedup on average
2. Defragmentation benefits: rearrange blocks on disk to improve sequential characteristics; 24% speedup and 11% improvement in power consumption on average
The modeling framework made these studies easy to evaluate without access to application code or a full application deployment.

SSD Caching
- Evaluate progressive SSD caching using the models
- Take advantage of spatial and temporal locality (frequently accessed blocks are kept in the SSDs)
- Significant benefits for Search (high I/O aggregation); no benefits for Email (no I/O aggregation)
[Figure: speedup per synthetic workload for the baseline (no SSDs) through 4 SSDs]
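One way to estimate such a caching benefit from a generated block trace is to replay it through a cache simulator of the SSD's capacity and measure the hit rate. The slides do not state the cache policy, so LRU below is an assumption.

```python
from collections import OrderedDict

def lru_hit_rate(block_trace, cache_blocks):
    """Replay a block trace through an LRU cache; return the hit rate."""
    cache, hits = OrderedDict(), 0
    for blk in block_trace:
        if blk in cache:
            hits += 1
            cache.move_to_end(blk)           # refresh recency on a hit
        else:
            cache[blk] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)    # evict least recently used
    return hits / len(block_trace)

# e.g., one 8GB SSD holding 4KB blocks:
# lru_hit_rate(trace, 8 * 2**30 // 4096)
```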

Defragmentation
- Disks favor sequential accesses, BUT in most applications Random > 80% and Sequential < 20%
- Quantify the benefit of defragmentation using the models by rearranging blocks/files, without actually performing defragmentation
- Evaluate different defragmentation policies (e.g., partial, dynamic)
[Figure: sequential I/Os (%) before and after defragmentation for each workload]
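The sequentiality metric behind the figure, and the rearrange-without-defragmenting trick, can be sketched as follows: an access counts as sequential if it starts where the previous one ended, and defragmentation is emulated by remapping blocks and replaying the same trace. The first-touch layout below is a hypothetical policy for illustration, not one of the paper's policies.

```python
def sequential_fraction(accesses):
    """accesses: (offset, size) pairs in issue order."""
    seq, prev_end = 0, None
    for offset, size in accesses:
        if offset == prev_end:
            seq += 1
        prev_end = offset + size
    return seq / max(len(accesses) - 1, 1)

def first_touch_remap(accesses):
    """Emulate defragmentation by laying data out contiguously in the
    order it is first touched, then replaying the same trace."""
    remap, nxt, out = {}, 0, []
    for offset, size in accesses:
        if offset not in remap:
            remap[offset], nxt = nxt, nxt + size
        out.append((remap[offset], size))
    return out

# compare: sequential_fraction(trace) vs.
#          sequential_fraction(first_touch_remap(trace))
```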

Defragmentation (cont.)
[Figure: speedup after defragmentation for each synthetic workload]
Highest benefits:
- TPCC/TPCE, which benefit from accessing consecutive database entries
- D-Process and Email, which have the highest write/read ratios

SSD Caching vs. Defragmentation
[Figure: speedup of SSD caching vs. defragmentation per workload]

Workload | SSD Caching | Defragmentation
Messenger | 1.78 | 1.79
Email | 1.48 | 1.48
Search | 1.13 | 1.118
User Content | 1.18 | 1.18
D-Process | 1.083 | 1.08
Display Ads | 1.105 | 1.1
TPCC | 1.21 | 1.21
TPCE | 1.096 | 1.096
TPCH | 1.32 | 1.19
Exchange | 1.28 | 1.087

The most beneficial storage optimization depends on the application and system of interest.

Conclusions
- Simplify the study of DC applications
- Modeling and generation framework:
  - An accurate hierarchical statistical model that captures the fluctuation of I/O activity (including spatial + temporal locality) of real DC applications
  - A tool that recreates I/O loads with high fidelity (I/O features, performance metrics)
- This infrastructure can be used to make accurate predictions for storage studies that would otherwise require access to real application code or a full application deployment:
  - SSD caching
  - Defragmentation
- Future work: full application models + full system studies

Questions?
Thank you!
Contact: cdel@stanford.edu, srsankar@microsoft.com

Defragmentation (backup)
- Disks favor sequential accesses, BUT in most applications Random > 80% and Sequential < 20%
- Quantify the benefit of defragmentation using the models by rearranging blocks/files, without actually performing defragmentation
- Evaluate different defragmentation policies (e.g., partial, dynamic)

Workload | Rd | Wr | Random (before defrag) | Seq (before defrag) | Random (after defrag) | Seq (after defrag)
Messenger | 62.8% | 34.8% | 83.67% | 15.35% | 63.17% | 35.74%
Email | 52.8% | 45.2% | 84.45% | 13.74% | 61.64% | 33.74%
Search | 49.8% | 45.14% | 87.71% | 8.46% | 70.87% | 24.46%
User Content | 58.31% | 39.39% | 93.09% | 5.48% | 73.21% | 24.99%
D-Process | 30.11% | 68.76% | 73.23% | 26.77% | 45.36% | 54.41%
Display Ads | 96.45% | 2.45% | 93.50% | 4.25% | 78.50% | 19.23%
TPCC | 68.8% | 31.2% | 97.2% | 2.8% | 71.1% | 29.9%
TPCE | 91.3% | 8.7% | 91.9% | 8.2% | 77.7% | 22.4%
TPCH | 96.7% | 3.3% | 65.5% | 35.5% | 52.8% | 47.2%
Exchange | 32.0% | 68.1% | 83.2% | 16.8% | 68.1% | 31.9%