Evaluation of the Chelsio T580-CR iSCSI Offload Adapter


October 2016

Evaluation of the Chelsio T580-CR iSCSI Offload Adapter
iSCSI Offload Makes a Difference

Executive Summary

As application processing demands increase and the amount of data continues to grow, getting this data into and out of the processor as efficiently as possible is becoming increasingly important. For application servers using iSCSI storage systems, deploying an iSCSI offload adapter can reduce host CPU utilization while maintaining excellent performance.

Chelsio commissioned Demartek to evaluate its T580-CR 40GbE iSCSI offload adapter with synthetic and real-world workloads, comparing the performance and host CPU utilization with and without the iSCSI offload functions.

For this project, we used synthetic and real-world workload tools. The synthetic workloads test specific combinations of I/O request parameters such as block size, queue depth, etc. The real-world workloads use the real applications that customers have running in their environments. Both types of workloads provide valid and valuable information about the performance of the system or device under test.

Key Findings

> For synthetic workloads, the Chelsio T580-CR iSCSI offload adapter generally provided higher throughput and lower host CPU utilization.
> For our SQL Server 2016 OLTP workload, the Chelsio T580-CR iSCSI offload adapter achieved a 62% higher transaction rate than the native iSCSI software initiator while consuming only 13% higher CPU utilization, on average (see the arithmetic sketch after these findings).
> For our SQL Server 2016 OLTP workload, the average minimum latency for the Chelsio T580-CR iSCSI offload adapter was 40%-50% lower (better) than the native iSCSI software initiator.
> For our Exchange Server Jetstress 2013 workload, the Chelsio T580-CR iSCSI offload adapter achieved 45% higher transactional IOPS than the native iSCSI software initiator while consuming lower CPU utilization.
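To put the SQL Server finding in perspective, the short Python sketch below combines only the 62% and 13% figures quoted above to estimate how much more work the offload adapter delivers per unit of host CPU consumed. The normalized baseline values of 1.0 are illustrative placeholders, not measured data.

```python
# Rough efficiency comparison derived from the SQL Server 2016 OLTP key findings.
# The 1.62x and 1.13x ratios come from the report; the baseline values of 1.0
# are arbitrary normalization placeholders used only to illustrate the arithmetic.

baseline_tps = 1.0          # native iSCSI software initiator (normalized)
baseline_cpu = 1.0          # native initiator CPU utilization (normalized)

offload_tps = baseline_tps * 1.62   # 62% higher transaction rate
offload_cpu = baseline_cpu * 1.13   # 13% higher CPU utilization

efficiency_gain = (offload_tps / offload_cpu) / (baseline_tps / baseline_cpu) - 1
print(f"Transactions per unit of CPU: {efficiency_gain:.0%} higher with offload")
# -> roughly 43% more transactions per unit of host CPU consumed
```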

Chelsio T580-CR iSCSI Offload Adapter

The Chelsio T580-CR dual-port 40GbE adapter is a PCI Express 3.0 x8 adapter that offloads network and storage protocol functions onto the adapter, relieving the host CPU of many low-level protocol functions. The offloaded protocols include:

> TCP/IP
> iWARP RDMA
> iSCSI
> FCoE
> NVMe-oF

By offloading these functions, the host CPU cycles that would otherwise have been used for Ethernet protocol processing can be applied to applications, getting more useful work done. The Chelsio T580-CR uses Chelsio's latest-generation T5 chip and integrates a high-performance packet switch that allows traffic to be quickly switched between ports on the card as needed.

Workload Overview: iSCSI Offload

We wanted to compare the performance and host CPU utilization of various workloads with and without the iSCSI offload functions enabled (a minimal sketch of this measurement approach appears at the end of this section). In the iSCSI protocol there is an initiator and a target. The initiator is generally an application host server running various applications. The target is the recipient of the iSCSI storage protocol commands and contains the storage devices.

Initiator

For the host initiator, we installed one Chelsio T580-CR 40GbE adapter into a dual-processor server running Windows Server 2012 R2. We ran different workloads using the native Microsoft iSCSI software initiator, repeated each workload with the Chelsio iSCSI offload hardware initiator, and compared the results.

Target

For the target server, we installed one Chelsio T580-CR 40GbE adapter into a four-processor server running Red Hat Enterprise Linux (RHEL) 6.6 with LIO 1.0.0.4 drivers. We enabled the iSCSI offload function on the target server for all tests. The target server was also configured with four Samsung SM-1715 NVMe SSDs acting as the target storage.

One of the goals of this project was to ensure that the target server was not the bottleneck. To achieve this, we chose a target server that had more processing power than the initiator and used NVMe storage in the target, ensuring fast performance.
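As a minimal sketch of the measurement approach (not the tooling Demartek actually used), the following Python script samples system-wide host CPU utilization while a storage workload runs; running it once per initiator configuration allows the averages to be compared. The psutil dependency and the sampling window are assumptions for illustration.

```python
# Minimal sketch: sample host CPU utilization while a storage workload runs,
# so the same workload can be compared with the software initiator and with
# the hardware offload initiator. Not the actual measurement tool used here.
import time
import psutil  # assumed available: pip install psutil

def sample_cpu_utilization(duration_s: int = 60, interval_s: float = 1.0) -> float:
    """Return average system-wide CPU utilization (%) over the sample window."""
    samples = []
    end = time.time() + duration_s
    while time.time() < end:
        # cpu_percent() with an interval blocks and measures over that interval
        samples.append(psutil.cpu_percent(interval=interval_s))
    return sum(samples) / len(samples)

if __name__ == "__main__":
    # Start the workload (Iometer, SQL Server OLTP, Jetstress, ...) separately,
    # then run this script once per initiator configuration and compare results.
    print(f"Average CPU utilization: {sample_cpu_utilization():.1f}%")
```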

Synthetic Workloads: Iometer

We ran Iometer with various combinations of block sizes, numbers of workers and queue depths; the sketch below illustrates the kind of parameter sweep involved. The Chelsio hardware iSCSI initiator generally provided higher throughput and lower host CPU utilization.
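The following Python sketch shows how such a synthetic test matrix is typically constructed as a cross-product of I/O parameters. The specific block sizes, worker counts, queue depths and read/write mixes below are assumptions for demonstration, not the exact Iometer access specifications used in this evaluation.

```python
# Illustrative parameter sweep for a synthetic storage workload, similar in
# spirit to an Iometer run. Values are assumptions, not the report's settings.
from itertools import product

block_sizes_kib = [4, 8, 64, 256, 512]   # I/O request sizes
worker_counts   = [1, 4, 8]              # concurrent workers
queue_depths    = [1, 8, 32]             # outstanding I/Os per worker
read_pcts       = [100, 0]               # 100% read and 100% write passes

test_matrix = [
    {"block_kib": bs, "workers": w, "queue_depth": qd, "read_pct": r}
    for bs, w, qd, r in product(block_sizes_kib, worker_counts, queue_depths, read_pcts)
]

print(f"{len(test_matrix)} test combinations")
for spec in test_matrix[:3]:
    print(spec)
```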


SQL Server 2016 OLTP Performance

On average, the Chelsio iSCSI offload adapter achieved significantly higher application performance while consuming only a small amount of additional CPU. Memory utilization was nearly identical for both initiators.

SQL Server 2016 OLTP Latency

Latencies were generally lower (better) for the Chelsio iSCSI hardware initiator than for the native iSCSI software initiator. The median of the Chelsio hardware initiator latencies was approximately 48% lower than the median of the software initiator latencies.
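For clarity on how a "median X% lower" comparison like the one above is computed, here is a minimal Python sketch. The latency samples are hypothetical placeholders; the report's measured data is not reproduced.

```python
# Sketch of the median-latency comparison. Sample values are hypothetical.
from statistics import median

software_initiator_ms = [4.3, 4.0, 4.2, 4.5, 4.1]   # hypothetical samples
hardware_offload_ms   = [2.3, 2.1, 2.2, 2.4, 2.0]   # hypothetical samples

reduction = 1 - median(hardware_offload_ms) / median(software_initiator_ms)
print(f"Median latency is {reduction:.0%} lower with the hardware offload initiator")
```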

SQL Server 2016 OLTP IOPS & Throughput

The Chelsio iSCSI offload adapter delivered significantly higher IOPS and throughput than the native iSCSI software initiator.

Exchange Server Jetstress

The Chelsio iSCSI offload adapter performed better for Exchange Jetstress 2013. Selected output from the Microsoft Exchange Jetstress 2013 Performance Test Result Report is provided below. This data is not intended to be submitted for the Microsoft Exchange Solution Reviewed Program (ESRP). Only the pertinent data related to the performance of the iSCSI offload adapter is shown.

Test parameters:
> Mailboxes: 3000
> IOPS/Mailbox: 1.0
> Mailbox size: 1000 MB

Microsoft Exchange Jetstress 2013 Performance Test Result Report

iSCSI Initiator                                            Default software      Chelsio T580-CR hardware offload
Overall Test Result                                        Pass                  Pass
Machine Name                                               DMRTK-SRVR-E          DMRTK-SRVR-E
Jetstress Version                                          15.00.0775.000        15.00.0775.000
ESE Version                                                15.00.0516.026        15.00.0516.026
Achieved Transactional I/O per Second                      23761.995             34423.554
Target Transactional I/O per Second                        3000                  3000
Initial Database Size (bytes)                              3.14841E+12           3.14655E+12
Final Database Size (bytes)                                3.22038E+12           3.237E+12
Thread Count                                               32                    32
I/O Database Reads Average Latency (msec), All Instances   2.092                 1.931
I/O Log Writes Average Latency (msec), All Instances       0.297                 0.165
% Processor time - Average                                 26.431                21.465
% Processor time - Minimum                                 15.539                19.669
% Processor time - Maximum                                 29.113                23.784
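The 45% figure cited in the key findings follows directly from the achieved transactional I/O rates in the table above; the short Python check below reproduces that arithmetic and the CPU delta, using only values shown in the table.

```python
# Derive the headline Jetstress comparisons from the table values above.
software_iops = 23761.995   # Achieved Transactional I/O per Second, software initiator
offload_iops  = 34423.554   # Achieved Transactional I/O per Second, hardware offload

software_cpu_avg = 26.431   # % Processor time - Average, software initiator
offload_cpu_avg  = 21.465   # % Processor time - Average, hardware offload

iops_gain = offload_iops / software_iops - 1
cpu_delta = offload_cpu_avg - software_cpu_avg

print(f"Transactional IOPS: {iops_gain:.0%} higher with offload")      # ~45%
print(f"Average CPU utilization: {cpu_delta:+.1f} percentage points")  # ~-5.0
```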

A Brief Commentary on Latency

Before flash storage became commonplace in the datacenter, storage I/O latencies of 10 to 20 milliseconds were generally acceptable for many applications. Flash storage has been a game-changer in this area, with sub-millisecond latency now the expectation for all-flash arrays. As with all technology advances, applications and user expectations have changed in response to this capability.

The impact of higher latencies depends greatly on the workload. High-bandwidth streaming or largely sequential workloads might be more or less unaffected, especially where read-ahead buffering grabs more data than the I/O requests actually demand. Data warehousing and video streaming are two examples of these types of workloads. However, if latencies become too high, even these jobs begin to suffer from noticeable lags. For optimal user experiences, lower latency is always better.

Online transactional workloads can generate high numbers of IOPS and consume a respectable amount of bandwidth. Latency becomes especially important in very highly transactional workloads, where database requests are time sensitive and depend heavily on the results of prior transactions. Consider applications that perform real-time trend analysis and/or process vast amounts of data. Stock trading, weather forecasting, geological survey modelling, and biometric analysis are examples of workloads that can be extremely sensitive to latency.

Latency in a storage network is introduced from several sources, as shown below. Total latency includes the latency introduced by each component, and there can be more than one instance of some components.

[Figure: Contributors to Latency]

In an iSCSI environment, the choice between the native iSCSI software initiator and a hardware offload initiator can make a significant difference in overall latency. When using the native iSCSI software initiator, latency is increased in the host software stack (the blue component in the diagram) because of the iSCSI software protocol layer; in this case, the network interface card (NIC) functions as the HBA. When using an iSCSI hardware offload adapter such as the Chelsio T580-CR, the latency within the host component is significantly reduced because the iSCSI software stack is removed from the host, reducing the number of host CPU instructions required to perform iSCSI I/O operations. Because the iSCSI stack runs in the HBA hardware, it is accelerated.

The following table provides some general ranges for latency. Factors contributing to latency within components can include processor speed, ASIC generation and other technology.

Conservative latency ranges (microseconds)
  Host software stack            Up to 1000
  Adapter (10Gbps+)              1 - 50
  Switch (10Gbps+)               <1 - 50
  Storage (all-flash)            Up to 500
  Cables (within a datacenter)   <1
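Using only the conservative ranges from the table above, the Python sketch below sums the per-component contributions to bound total latency for a software-initiator path versus a hardware offload path. Treating the offload path as removing the host software stack contribution entirely is a simplifying assumption made only to show where the savings come from.

```python
# Bound total storage latency by summing per-component ranges from the table
# above (values in microseconds). Assuming the offload path eliminates the
# host software stack contribution is a simplification for illustration only.
component_ranges_us = {
    "host software stack":  (0, 1000),
    "adapter (10Gbps+)":    (1, 50),
    "switch (10Gbps+)":     (0, 50),     # "<1" treated as 0 for the lower bound
    "storage (all-flash)":  (0, 500),
    "cables":               (0, 1),      # "<1" treated as at most 1
}

def total_range(components):
    lows, highs = zip(*(component_ranges_us[c] for c in components))
    return sum(lows), sum(highs)

software_path = list(component_ranges_us)                       # all components
offload_path  = [c for c in software_path if c != "host software stack"]

print("software initiator path:", total_range(software_path), "microseconds")
print("hardware offload path:  ", total_range(offload_path), "microseconds")
```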

Test Environment

Servers
> Initiator server: 2x Intel Xeon E5-2620 v3, 2.4 GHz, 12 total cores, 24 total threads, 64 GB RAM; Windows Server 2012 R2
> Target server: 4x Intel Xeon E5-4669 v4, 2.0 GHz, 88 total cores, 176 total threads, 1536 GB RAM; 4x 1.6 TB Samsung SM-1715 NVMe SSD; RHEL 6.6 with LIO iSCSI Target Offload 1.0.0.4

Adapters
> Chelsio T580-CR 40GbE Unified Wire adapter

Ethernet Switch
> HPE 5900 10GbE/40GbE (JG838A) with global pause enabled on all ports

Summary and Conclusion

For our synthetic and real-world application workload tests, the Chelsio T580-CR iSCSI hardware offload adapter significantly outperformed the native iSCSI software initiator in terms of IOPS, throughput and latency. Although not tested for this project, we would expect the Chelsio T580-CR hardware offload adapter to also perform well in iWARP RDMA and NVMe over Fabrics (NVMe-oF) environments.

> For synthetic workloads, the Chelsio T580-CR iSCSI offload adapter generally provided higher throughput and lower host CPU utilization.
> For our SQL Server 2016 OLTP workload, the Chelsio T580-CR iSCSI offload adapter achieved a 62% higher transaction rate than the native iSCSI software initiator while consuming only 13% higher CPU utilization, on average.
> For our SQL Server 2016 OLTP workload, the average minimum latency for the Chelsio T580-CR iSCSI offload adapter was 40%-50% lower (better) than the native iSCSI software initiator.
> For our Exchange Server Jetstress 2013 workload, the Chelsio T580-CR iSCSI offload adapter achieved 45% higher transactional IOPS than the native iSCSI software initiator while consuming lower CPU utilization.

The most current version of this report is available at www.demartek.com/demartek_chelsio_t580-cr_iscsi_offload_evaluation_2016-10.html on the Demartek website.

Chelsio is a registered trademark of Chelsio Communications. Demartek is a registered trademark of Demartek, LLC. All other trademarks are the property of their respective owners.