Performance Analysis in the Real World of Online Services

Similar documents
The Impact of SSD Selection on SQL Server Performance. Solution Brief. Understanding the differences in NVMe and SATA SSD throughput

Transistors and Wires

Munara Tolubaeva Technical Consulting Engineer. 3D XPoint is a trademark of Intel Corporation in the U.S. and/or other countries.

HP SAS benchmark performance tests

Addressing the Stranded Power Problem in Datacenters using Storage Workload Characterization. January 30 th, 2010 Sriram Sankar and Kushagra Vaid

HP ProLiant BladeSystem Gen9 vs Gen8 and G7 Server Blades on Data Warehouse Workloads

Lenovo Database Configuration

Microsoft Exchange Server 2010 workload optimization on the new IBM PureFlex System

Evaluation Report: Improving SQL Server Database Performance with Dot Hill AssuredSAN 4824 Flash Upgrades

Emulex LPe16000B 16Gb Fibre Channel HBA Evaluation

Quantifying Trends in Server Power Usage

Assessing and Comparing HP Parallel SCSI and HP Small Form Factor Enterprise Hard Disk Drives in Server Environments

IBM. IBM ^ pseries, IBM RS/6000 and IBM NUMA-Q Performance Report

IBM System p5 510 and 510Q Express Servers

SAS workload performance improvements with IBM XIV Storage System Gen3

Microsoft SQL Server in a VMware Environment on Dell PowerEdge R810 Servers and Dell EqualLogic Storage

SQL Server 2005 on a Dell Scalable Enterprise Foundation

Optimizing Fusion iomemory on Red Hat Enterprise Linux 6 for Database Performance Acceleration. Sanjay Rao, Principal Software Engineer

How Scalable is your SMB?

SSDs Driving Greater Efficiency in Data Centers

HP ProLiant delivers #1 overall TPC-C price/performance result with the ML350 G6

Identifying Performance Bottlenecks with Real- World Applications and Flash-Based Storage

PRESENTATION TITLE GOES HERE

Response Time and Throughput

Infor M3 on IBM POWER7+ and using Solid State Drives

Performance and Energy Efficiency of the 14 th Generation Dell PowerEdge Servers

SanDisk Enterprise Storage Solutions

Infor Lawson on IBM i 7.1 and IBM POWER7+

Oracle Performance on M5000 with F20 Flash Cache. Benchmark Report September 2011

Benchmarking Enterprise SSDs

Ch. 7: Benchmarks and Performance Tests

Performance, Power, Die Yield. CS301 Prof Szajda

Erik Riedel Hewlett-Packard Labs

Storage Adapter Testing Report

Competitive Power Savings with VMware Consolidation on the Dell PowerEdge 2950

2009. October. Semiconductor Business SAMSUNG Electronics

HP ProLiant DL380 Gen8 and HP PCle LE Workload Accelerator 28TB/45TB Data Warehouse Fast Track Reference Architecture

Multi-Core Microprocessor Chips: Motivation & Challenges

WHITE PAPER FUJITSU PRIMERGY SERVERS PERFORMANCE REPORT PRIMERGY BX924 S2

Microsoft SQL Server 2012 Fast Track Reference Configuration Using PowerEdge R720 and EqualLogic PS6110XV Arrays

Comparing the Performance and Power of Dell VRTX and HP c3000

Consolidating OLTP Workloads on Dell PowerEdge R th generation Servers

Benefits of Storage Capacity Optimization Methods (COMs) And. Performance Optimization Methods (POMs)

Samsung SSD PM863 and SM863 for Data Centers. Groundbreaking SSDs that raise the bar on satisfying big data demands

IBM System p5 550 and 550Q Express servers

Best Practices for Deploying a Mixed 1Gb/10Gb Ethernet SAN using Dell EqualLogic Storage Arrays

VMmark V1.0.0 Results

LSI and HGST accelerate database applications with Enterprise RAID and Solid State Drives

Experience the GRID Today with Oracle9i RAC

Using EMC FAST with SAP on EMC Unified Storage

W H I T E P A P E R U n l o c k i n g t h e P o w e r o f F l a s h w i t h t h e M C x - E n a b l e d N e x t - G e n e r a t i o n V N X

SDACCEL DEVELOPMENT ENVIRONMENT. The Xilinx SDAccel Development Environment. Bringing The Best Performance/Watt to the Data Center

Jason Waxman General Manager High Density Compute Division Data Center Group

Bill Nesheim Sun Microsystems, Inc. Bob Kasten Intel Corporation

Performance of computer systems

The Oracle Database Appliance I/O and Performance Architecture

CSCI 402: Computer Architectures. Computer Abstractions and Technology (4) Fengguang Song Department of Computer & Information Science IUPUI.

IBM System Storage EXP3500 Express

Comparison of Storage Protocol Performance ESX Server 3.5

TPC-E testing of Microsoft SQL Server 2016 on Dell EMC PowerEdge R830 Server and Dell EMC SC9000 Storage

Performance Report PRIMERGY RX100 S6

Deploy a High-Performance Database Solution: Cisco UCS B420 M4 Blade Server with Fusion iomemory PX600 Using Oracle Database 12c

PowerEdge 3250 Features and Performance Report

Assessing performance in HP LeftHand SANs

IBM. New Tools. Innovative Technology Application Flexibility

Accelerate Applications Using EqualLogic Arrays with directcache

Dell PowerEdge R730xd Servers with Samsung SM1715 NVMe Drives Powers the Aerospike Fraud Prevention Benchmark

Accelerating storage performance in the PowerEdge FX2 converged architecture modular chassis

Top 5 Reasons to Consider

Comparing Performance of Solid State Devices and Mechanical Disks

Dell PowerEdge R910 SQL OLTP Virtualization Study Measuring Performance and Power Improvements of New Intel Xeon E7 Processors and Low-Voltage Memory

QLogic 16Gb Gen 5 Fibre Channel for Database and Business Analytics

COL862 Programming Assignment-1

Table of contents. OpenVMS scalability with Oracle Rdb. Scalability achieved through performance tuning.

Quantifying Load Imbalance on Virtualized Enterprise Servers

ECE 486/586. Computer Architecture. Lecture # 2

Performance COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals

ZBD: Using Transparent Compression at the Block Level to Increase Storage Space Efficiency

Building blocks for high performance DWH Computing

Comparing Software versus Hardware RAID Performance

Enterprise2014. GPFS with Flash840 on PureFlex and Power8 (AIX & Linux)

FAST FORWARD TO YOUR <NEXT> CREATION

Performance Modeling and Analysis of Flash based Storage Devices

Extremely Fast Distributed Storage for Cloud Service Providers

Energy Efficiency Metrics for Storage Herb Tanzer, Chuck Paridon Hewlett Packard Company

2 to 4 Intel Xeon Processor E v3 Family CPUs. Up to 12 SFF Disk Drives for Appliance Model. Up to 6 TB of Main Memory (with GB LRDIMMs)

A Comparative Performance Evaluation of Different Application Domains on Server Processor Architectures

Best Practices for Setting BIOS Parameters for Performance

Evaluation Report: HP StoreFabric SN1000E 16Gb Fibre Channel HBA

Newest generation of HP ProLiant DL380 takes #1 position overall on Oracle E-Business Suite Small Model Benchmark

Manoj Agarwal Senior Development Lead BizTalk Server Product Team Microsoft India Development Center

Samsung s Green SSD (Solid State Drive) PM830. Boost data center performance while reducing power consumption. More speed. Less energy.

Accelerating OLTP performance with NVMe SSDs Veronica Lagrange Changho Choi Vijay Balakrishnan

CS3350B Computer Architecture CPU Performance and Profiling

Performance Baseline for Deploying Microsoft SQL Server 2012 OLTP Database Applications Using EqualLogic PS Series Hybrid Storage Arrays

FAWN. A Fast Array of Wimpy Nodes. David Andersen, Jason Franklin, Michael Kaminsky*, Amar Phanishayee, Lawrence Tan, Vijay Vasudevan

Best Practices for Deploying a Mixed 1Gb/10Gb Ethernet SAN using Dell Storage PS Series Arrays

Benefits of Automatic Data Tiering in OLTP Database Environments with Dell EqualLogic Hybrid Arrays

Cisco UCS C-Series I/O Characterization

Charles Lefurgy IBM Research, Austin

Transcription:

Performance Analysis in the Real World of Online Services Dileep Bhandarkar, Ph. D. Distinguished Engineer 2009 IEEE International Symposium on Performance Analysis of Systems and Software

My Background: Crossing the bridge from Architect to User! 1973: PhD in Electrical Engineering Carnegie Mellon University Thesis: Performance Evaluation of Multiprocessor Systems 4 years - Texas Instruments Research 17 years - Digital Equipment Corporation Processor Architecture and Performance 12 years - Intel Performance, Architecture, Strategic Planning 2 years - Microsoft Data Center Hardware Engineering

Global Foundation Services Delivering across the company, all over the world, around the clock Plus over 150 more sites and services

Performance per form ance (pər fôr məns) noun 1. the manner in which or the efficiency with which something reacts or fulfills its intended purpose. 2. operation or functioning, usually with regard to effectiveness, as of a machine 3. the ability to perform: efficiency b: the manner in which a mechanism performs <engine performance>

What Matters Throughput, Response Time Cost Effectiveness Performance / $ Performance / Watt Performance / $ / Watt Energy Consumption / Efficiency Reliability Maintainability

Common Performance Metrics SPEC CPU Benchmarks Highly tuned using compiler versions you may not have access to today Geometric Mean of several individual benchmarks Single and multistream Integer and Floating Point Transaction Processing Council TPC-C, TPC-E, TPC-H SPECpower Lots of lesser known benchmarks SPECweb, IOmeter, FSCT, WCAT etc

End User Issues Performance (speed) on old binaries My code compiled with my favorite compiler using my optimization level running on my operating system Industry standard benchmarks are a good first indicator, but not sufficient Configuration dependencies

Read the Fine Print! SPECpower Example: Impressive Result! 2 processors 8 GB memory 1 SSD! Source: www.spec.org

Source: www.tpc.org TPC-C Example Dual Processor Platform with price starting at $ 3,315 TPC-C Throughput: 631,766 tpmc Price/Performance: $1.08 USD per tpmc Total System Cost: $678,231 144 GB memory (18 x 8 GB) 26 10K RPM disk drives 1184 15K RPM disk drives Typical User Configuration: 16 GB memory, mid bin processor, ~8 disk drives for ~$10K

Balanced Performance Typical User Configuration Smaller memory generates more disk I/O Disk I/O limited Low Processor Utilization User Options Add more memory and drives higher cost Deploy just one processor Use lower frequency processor to save $ and power

Non-Linear Price Performance 800 700 600 500 400 300 200 100 0 Price - $ 1.86 GHz 2.0 GHz 2,13 GHz 2,26 GHz 2.4 GHz 2.53 GHz Source: www.intc.com

Power vs Performance Tradeoff 2.4 GHz vs 2.26 GHz Frequency ~5% frequency difference results in 2-3% performance difference on most workloads What about Power? 80 W vs 60W TDP 10-15 Watts savings dependent on workload What about Price? Same! Total Cost of Ownership lower for 60W CPU

Why Power is Important? Energy Consumption: US power rate 10.27 cents per Kilowatt hour) in 2008 according to DOE/eia (http://www.eia.doe.gov/cneaf/electricity/epm/table5_3.html) In a typical data center for every watt in server power there can be another 0.5 to 1 watt consumed for power distribution losses and cooling. PUE = Total Facility Power / IT Equipment Power Building load Demand from grid Power (Switch Gear, UPS, Battery backup, etc) IT load Demand from servers, storage, telco equipment, etc Cooling (Chillers, CRACs, etc)

But There is More! Data centers can cost between $10M and $20M per Megawatt Capital Depreciation Costs are Important Basic 1U Server - 3 year TCO Total equipment cost per server Power cost per server Data Center cost per server Data Center operating cost per server Your Mileage May Vary!

Power vs System Load Typical Web Applications are not CPU or disk intensive Platform imbalance can keep processor utilization low Idle power is typically over 50% of peak

Our Approach Invest in understanding your workloads Measure, Model, Validate, Predict, Measure Focus on Entire System Balance the Platform Focus on Total Cost of Ownership Acquisition Cost Energy Consumption Cost Data Center Capital Cost

2 Socket Catches up to 4 Socket Source: www.tpc.org, www.intc.com

Best Price/Performance Single Processor, 32 GB Memory, 102 Disk Drives Typical Configuration: 1 Quad Core Processor, 16 GB Memory, 8 Disk Drives (10K RPM, 146 GB), List Price: ~$7K Source: www.tpc.org

TPC-E Example Typical Configuration: 2 processors 32 GB Memory 24 x 146 GB 10K RPM Drives List Price: ~$15K

Testing with TPC-E Published results at www.tpc.org Use very small portion of available disk capacity Data on outer tracks of disks seek distance per disk: minimal for random access pattern spread out across numerous disks to get IOPS Not representative for real world usage Our Methodology Fill disk capacity for any server from 20-100% in increments of 20% (Simulate partial capacity utilization) Vary Active customer load from 20-100% (Simulate partial working set) Weighted Harmonic Mean to give a single tpse score for the server Representative of our usage scenario Different customers can utilize capacity to different extent Working sets are not usually the entire customer base

Typical System Performance 80 70 60 50 40 30 20 10 0 tpse CPU Util 8 Disk Drives 24 Disk Drives 57 Disk Drives Disk I/O Performance is a Dominant Factor

Web Capacity Analysis Tool (WCAT) Lightweight HTTP load generation tool Internal Microsoft tool created by Windows Server Performance Team Available for public download as IIS productivity tool (from www.iis.net ) Clear structured tool with good scalability, allows specific setups for different usage scenarios Basic scenarios we use in our web testing: Dynamic.ASPX content (CPU intensive) Static cold content (Disk intensive) Static hot cached content (network intensive) Windows Live mix content

Web Capacity Analysis Tool Test Environment Web Server under the test: contains workload content: ~ 2.5 GB, ~ 4 mln files (aspx, gif, html) WCAT Clients: provide actual load to the Web Server according to the test scenario WCAT Controller: configures client machines, runs test scenarios, creates log/report files with performance counters and WCAT runtime statistics Dual Machine Environment Web Server Database Server WCAT Controller WCAT Client Network Switch Public Network Multiple Machines Isolated Environment Network Switches WCAT Clients Public Network Database Server Web Server WCAT Clients WCAT Controller Network Router

WCAT Performance Faster CPU Increases Throughput

CPU vs Storage Performance Huge increase in CPU Performance Storage Performance not increasing proportionately Need to understand storage requirements in enterprise datacenters Source: Intel

Benchmark Development Most benchmarks are developed by system or component vendors Goal is to showcase their products in best light May not relate well to real world applications Common framework for comparison End Users need to be more active Internal Benchmarks are most appropriate Your Mileage Will Vary!

Storage Workload Characterization How do we do it? Event Tracing for Windows (ETW) Collect disk event traces through Windows Instrumentation Production traces taken for particular time periods to observe workload behavior on storage Analysis Summary characteristics Block sizes Queue depth Randomness of workload Read/Write patterns IOPS, MBPS Temporal analysis Outstanding I/Os Interarrival Time Latency

Throughput (IOPS) Storage Workload Characterization Why do we do it? Understand workload profiles in production Mostly random, with high read:write ratios Design next generation servers Server rightsizing Balancing CPU and Storage performance needs Explore new technologies in the light of Microsoft workloads SSD for storage evaluate performance-power-cost advantage for enterprise profiles Work with OEM partners to optimize server components for our profiles 3000 2500 512MB vs 256MB cache for FILE workload (8K random R/W) 512MB 256MB 2000 1500 1000 500 0 Typical File Server Operating Range 1 2 4 8 16 32 64 128 256 512 Outstanding I/Os (queue depth)

1 1 1 1 1 1.17 1.14 0.98 0.625 1.33 1.28 0.96 1 0.89 1.53 1.58 1.41 1.5 2.18 5.6 Search Performance Scaling Frequency Performance Perf/GHz CPU List Price CPU Power 2.0 GHz (80W) 2.33 GHz (50W) 2.67 GHz (80W) 3.167 GHz (120W) Sweet Spot is often at mid bin frequency, especially when price and power are considered

Business As Usual Microprocessor Pricing History Increased Performance at Constant Price Multicore drove big increase going from single to dual to quad core Easy to ride the technology curve Single -> Dual -> Quad Issue: Performance Scaling with Threads Result: Decreasing CPU Utilization Opportunity to Right-Size Quad -> Dual -> Single

Right-Sizing Example Dual Socket Platform Max CPU utilizations for File Servers 40% 35% 35% 30% 25% 20% 15% 10% 20% 20% 11% 26% 15% 5% 0% CPU utilization Removing the second processor would save power and cost!

Conclusions There is more to performance than speed. Processor performance has outpaced our ability to consume it in many cases. Difficult to exploit CPU performance increase across the board Platform imbalance is an opportunity to right-size to save power and cost. Power is an important performance metric. Industry Standard benchmarks may not reflect your environment. Do your own workload characterization.

2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. Microsoft is a trademark of the Microsoft companies. The Microsoft Financing marks are used by CIT Financial Ltd. under license. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.