QLogic 8200 Series 10GbE Converged Network Adapter Outperforms Alternatives in Dell 12G Servers

The QLogic 8200 Series Converged Network Adapter outperforms the alternative adapter in throughput per CPU utilization for both networking and converged traffic, while using only a fraction of the host resources and leaving plenty of room for data center servers to efficiently scale applications and virtual servers.

KEY FINDINGS

The QLogic 8200 Series 10Gb Converged Network Adapter offers greater efficiency in scaling (including maximum throughput and overall CPU efficiency) than alternative Converged Network Adapters, and better supports the additional demands of a converged, virtualized data center. Our results demonstrate the following:

Bandwidth: The QLogic adapter achieves near line-rate (38Gbps) bidirectional throughput with low CPU utilization (a little more than 10 percent).

Networking: The QLogic adapter offers up to 49 percent higher networking throughput efficiency while requiring up to 31 percent less CPU resources than the alternative adapter.

Teaming: In a link aggregation configuration, QLogic's adapter delivers up to 27 percent (3.9Gbps) higher throughput as workloads increase, as well as up to 49 percent higher throughput efficiency (Mbps/CPU utilization) compared to the alternative adapter.

FCoE: For real-world applications (typically 4/8KB block sizes), QLogic's adapters outperform the alternative adapter by up to 53.9 percent.

EXECUTIVE SUMMARY

Leading the data center charge in I/O consolidation is Fibre Channel over Ethernet (FCoE), which promises to unify data and storage traffic onto a single wire. As FCoE helps to enable data center consolidation, CPU efficiency related to I/O is emerging as a key factor in maximizing network consolidation.
This is true for a couple of reasons. First, consolidation creates denser server environments, which in turn drive higher throughput requirements for business-critical applications. Second, by lowering the CPU requirements to process I/O, virtualization ratios can be maximized to consolidate applications on fewer servers, thereby achieving a lower total cost of ownership (TCO). High throughput with low CPU utilization is the key to scaling within next-generation data centers. A Converged Network Adapter accomplishes this by offloading I/O processing for IP, iSCSI, and FCoE from the host CPU. At the time this paper was first published, the QLogic 8200 Series Converged Network Adapter was the only Converged Network Adapter to offload Fibre Channel over Ethernet (FCoE), iSCSI, and IP (Ethernet) traffic concurrently.

SN33987- Rev. C 2/17
TEST CONFIGURATIONS AND PROCEDURES

The testing discussed in this paper analyzes and compares the QLogic and competitive Converged Network Adapters across a representative selection of Ethernet and FCoE simulations of real-world workloads.

Ethernet Configuration

The industry-leading IxChariot test tool was used to simulate real-world applications and to evaluate each adapter's performance under realistic load conditions. IxChariot was chosen for its ability to accurately assess the performance characteristics of an adapter running within a data center network. IxChariot measurements were taken to determine the total throughput (Mbps) across increasing workloads (threads) and the percentage of CPU required. Throughput was then averaged across each thread count to estimate the CPU efficiency (Mbps/CPU). CPU efficiency provides an excellent estimate of the overhead imposed on the processor by running significant networking traffic and handling network interrupts. During these tests, the workloads were increased by adding threads, to a maximum of 20. As a result, these tests provide significant insight into each adapter's potential networking performance and its remaining headroom to efficiently scale. Appendix A provides additional information regarding the test configuration (such as servers and driver versions). The testing demonstrated the QLogic adapter's performance advantages, which provide significantly greater CPU efficiency and an enhanced ability to scale in a virtualized environment compared to the alternative adapter.

Ethernet Test Procedure

1. A QLogic dual-port 10Gb Ethernet PCIe 2.0 Converged Network Adapter was installed on the test server using the latest released driver.
2. IxChariot was set up with two End Point (EP) agents installed as remote agents, with an Ethernet switch between them. Each remote agent was set up to create and measure network traffic.
3.
An IxChariot console was then attached to each EP to specify what to transmit, when to transmit, and what data to collect. One End Point was designated as the client and the other as the server. For dual-port NIC performance testing, each port on the server was configured and connected to two separate client servers. For teaming/failover testing, the dual-port QLogic adapter within the server was configured for teaming, with port 0 as the active primary port and port 1 as the secondary failover port.
4. The type of data transmission to model was selected; the default high-performance throughput script was chosen.
5. The testing was run at standard MTU size with multiple threads. The dual-port bi-directional NIC scenario utilized both ports simultaneously to measure the total throughput capabilities and the CPU efficiency of a dual-port adapter. The failover scenario used bi-directional traffic across one active port while the second port was in active standby mode.
6. To simulate failover, the cable was pulled from port 0 on the server during data transmission while the test script was running.
7. These steps were repeated for the other Converged Network Adapter.

FCoE Configuration

The IOmeter I/O test tool was used to benchmark the QLogic 8200 Series 10Gb Converged Network Adapter against the alternative. The tool was chosen for its ability to simulate real-world applications and to evaluate adapter and system performance under realistic storage workload conditions. The test accurately assesses the performance characteristics of an adapter running on a converged network. IOmeter measurements were taken for the total number of transactions (IOPS), the total throughput (MBps) across real-world applications (block sizes of 4 and 8KB), and the percentage of CPU required to drive the overall transactions. IOPS were then normalized by the CPU usage over the testing period to calculate the CPU efficiency (IOPS/CPU).
A higher number indicates more efficient use of CPU resources for I/O. Appendix A provides additional information regarding the test configuration (such as servers and driver versions).

Figure 1. Networking Test Configuration
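As a rough illustration of how these efficiency metrics work, the following sketch computes Mbps/CPU and IOPS/CPU from hypothetical measurements (all numbers are invented for illustration, not taken from the tests in this paper):

```python
# Efficiency metrics used throughout this paper: throughput (or IOPS)
# normalized by the CPU utilization required to drive the I/O.
def throughput_efficiency(mbps, cpu_pct):
    """Networking efficiency: Mbps per percent of CPU utilization."""
    return mbps / cpu_pct

def iops_efficiency(iops, cpu_pct):
    """Storage efficiency: IOPS per percent of CPU utilization."""
    return iops / cpu_pct

# Hypothetical example: adapter A moves 18,000 Mbps at 6% CPU,
# adapter B moves 14,000 Mbps at 8% CPU.
eff_a = throughput_efficiency(18000, 6.0)   # 3000 Mbps per CPU %
eff_b = throughput_efficiency(14000, 8.0)   # 1750 Mbps per CPU %
advantage = (eff_a / eff_b - 1) * 100       # ~71% higher efficiency

# Hypothetical storage example: 50,000 IOPS at 10% CPU.
storage_eff = iops_efficiency(50000, 10.0)  # 5000 IOPS per CPU %
```

Note that a higher efficiency can come from either more throughput, less CPU usage, or both, which is why the metric captures scaling headroom better than raw throughput alone.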
FCoE Test Procedure

1. A QLogic dual-port 10Gb Ethernet PCIe 2.0 Converged Network Adapter was installed on the test server using the latest released driver.
2. The adapter was connected to a Cisco Nexus 5020 switch and then to two RamSan storage systems configured with a total of eight LUNs.
3. IOmeter was used to test FCoE workload scalability; it was configured at a queue depth of 32 with a total of 16 workers.
4. To help simulate real-world environments, tests were run for standard storage application block sizes of 4 and 8KB for sequential read (Seq RD), sequential write (Seq WR), and sequential read/write (Seq RW).
5. These steps were then repeated for the alternative Converged Network Adapter.

The test results demonstrate the superiority of the QLogic offload engine in relieving CPUs of protocol processing tasks, so that servers remain highly scalable while providing high-performance data and storage applications the I/O bandwidth they require across virtualized networks.

Ethernet Test Results

The test results demonstrated that the QLogic 8200 Series Adapters scale more efficiently in an enterprise data center with real-world application workloads than the alternative Converged Network Adapters. For fully utilized NIC performance testing, the dual-port bi-directional configuration ran traffic at full line rate across both adapter ports simultaneously. The same was true for a teaming configuration simulating a high-availability environment with failover activated: a dual-port configuration with bi-directional traffic running at full line-rate speeds through increasing workloads.

Maximum Bandwidth with Low CPU

The QLogic 8200 Series Converged Network Adapter is capable of delivering near line-rate performance from a dual-port, 10Gb adapter over a PCI Express Gen2 bus. This equates to more than 36Gbps (given a marginal loss for bus latencies and encoding overhead).
When compared to alternatives, QLogic's dual-port 8200 Series Adapter delivers more than 36Gbps of aggregate bidirectional throughput when sending 1500-byte frames, using approximately 14 percent of the host server's CPU (Figure 3). The total bandwidth and CPU utilization were captured for both adapters. The results show that QLogic holds the advantage over the alternative: QLogic improves server performance by providing line-rate network bandwidth while limiting CPU overhead.

Figure 2. FCoE Test Configuration

TEST RESULTS

The test procedures were specifically designed to test data and storage I/O running over a converged 10Gb Ethernet network. In each test case, specific attention was given to CPU utilization in relation to I/O performance. These steps were taken to recreate the environment in which an enterprise-class Converged Network Adapter must perform; for example, transferring large amounts of data for an application with high bandwidth demand while the servers' CPUs are heavily taxed by enterprise applications and virtual machines (VMs).

Figure 3. Dual-port Bi-directional Throughput and CPU (QLogic: 14% CPU; alternative: 22% CPU)
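The line-rate headroom cited above can be sanity-checked with simple arithmetic; the 36Gbps figure comes from the text, while the rest is basic math for a dual-port 10GbE adapter:

```python
# Aggregate bandwidth ceiling for a dual-port 10GbE adapter running
# bi-directional traffic, before bus latency and encoding overhead.
ports = 2                     # dual-port adapter
rate_gbps = 10                # 10GbE per port, per direction
directions = 2                # bi-directional (send + receive)
theoretical = ports * rate_gbps * directions   # 40 Gbps aggregate

achieved = 36                 # >36 Gbps reported in the text
fraction_of_line_rate = achieved / theoretical # 0.9, i.e. ~90% of ceiling
```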
NIC AND TEAMING PERFORMANCE

NIC Dual-port Bi-directional Case

First, the overall Ethernet performance of the adapters was evaluated. To do this, a real-world test scenario was created using both ports of a dual-port adapter, with bi-directional data sent through each simultaneously. The testing verified that the QLogic adapter again holds a significant advantage in CPU efficiency, averaging more than a 31 percent advantage over the alternative between 2 and 20 threads (Figure 4).

Figure 4. CPU Utilization

This becomes even more evident when looking at the throughput normalized by the percentage of CPU required to drive the transactions. To calculate this number, the total throughput transmitted and received by both ports was divided by the average CPU percentage required to drive the I/O. The results show the QLogic adapter's advantage over the alternative adapter in delivering high throughput rates while minimizing CPU utilization (Figure 5). The QLogic adapter holds an average efficiency advantage of 49 percent over the alternative adapter. CPU efficiency with regard to I/O will emerge as a key factor in maximizing data center efficiencies as well as consolidation efforts. This is true for a couple of reasons. First, consolidation creates denser server environments, which in turn drive higher throughput requirements for business-critical applications. Second, by lowering CPU requirements to process I/O, virtualization ratios can be maximized to fully optimize servers. The QLogic 8200 Series Converged Network Adapters provide the scalability advantages that will be required by future enterprise-class data centers.

NIC Teaming Failover Case

To test the ability of an adapter to meet the high-availability and high-throughput requirements of an enterprise data center, measurements were taken of the total throughput of all data sent and received across an increasing workload.
During this time, a failover state was replicated to demonstrate a high-availability environment. This data was then averaged across the CPU usage consumed over the testing period to determine the adapter's efficiency. Average throughput divided by overall CPU usage provides an excellent estimate of the overhead imposed by the adapter while running the benchmark and handling network interrupts. In addition, it serves to demonstrate each adapter's networking performance capabilities and potential to efficiently scale. During the testing, the QLogic 8200 Series Adapter outperformed the alternative adapter by an average of up to 27 percent, or 3.9Gbps, over increasing workloads from 4 to 20 threads with bi-directional send and receive I/O (Figure 6).

Figure 5. Dual-port Bi-directional CPU Efficiencies

Figure 6. Bi-directional I/O Throughput with Increasing Threads
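The averaging described above can be sketched as follows: take the per-thread-count (Mbps, CPU%) pairs for each adapter, form the efficiency ratio at each workload, and average the ratios. All numbers below are hypothetical, invented for illustration, not the paper's measured data:

```python
# Hypothetical per-workload measurements as (Mbps, CPU %) pairs at
# increasing thread counts. These values are made up for illustration.
thread_counts = [1, 2, 4, 8, 12]
adapter_a = [(9000, 3.0), (17000, 5.0), (18500, 6.0), (18800, 6.5), (18900, 7.0)]
adapter_b = [(8000, 4.0), (15000, 7.0), (17000, 9.0), (17500, 10.0), (17800, 11.0)]

def avg_efficiency_advantage(a, b):
    """Mean of per-workload (Mbps / CPU%) ratios, as a percentage advantage."""
    ratios = [(ma / ca) / (mb / cb) for (ma, ca), (mb, cb) in zip(a, b)]
    return (sum(ratios) / len(ratios) - 1) * 100

adv = avg_efficiency_advantage(adapter_a, adapter_b)  # percent advantage of A over B
```

Averaging the per-workload ratios (rather than comparing only peak throughput) rewards an adapter that stays efficient as the thread count grows, which is the scaling behavior the tests are designed to expose.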
In the bi-directional NIC failover test, the QLogic adapter required substantially less CPU resources than the alternative adapter to drive more data. In fact, the QLogic adapter used on average a little more than five percent of the CPU, 18 percent less than the amount the alternative adapter required (Figure 7).

Figure 7. CPU Usage in a Teamed I/O Environment

Due to their low overhead, QLogic adapters leave the host server with plenty of CPU resources to efficiently scale. This is imperative in next-generation data centers that run converged networks and/or VMs. The advantage becomes even more evident when considering the throughput normalized by the percentage of CPU required to drive the I/O (Figure 8).

FCoE Test Results

To evaluate FCoE storage I/O performance, QLogic created a real-world test scenario to verify an adapter's ability to deliver high throughput rates while minimizing CPU utilization. Using this standard testing philosophy, QLogic tested its 8200 Series Converged Network Adapter against a leading alternative Converged Network Adapter within a Windows Server 2008 environment, with a focus on comparing IOPS relative to CPU utilization. Measurements were taken of throughput (MBps) across real-world block sizes of 4 and 8KB, and the percentage of CPU utilization required to drive the I/O was compared. Test results are reported as MBps divided by the percentage of CPU utilization to obtain the CPU efficiency. A higher number indicates more efficient use of CPU resources. As shown in Figure 9, a Converged Network Adapter from each company was used to obtain performance results for sequential read, sequential write, and sequential read/write I/O operations.
Block sizes of 4 and 8KB, which are typical of Exchange and database workloads, were measured, and the results show that the QLogic 8200 Series Adapter significantly outperforms the alternative adapter. The data confirms that with sequential 8KB read/write block sizes, which simulate application workloads for Oracle and Microsoft Exchange 2007, the QLogic 8200 Converged Network Adapter holds a 54 percent CPU efficiency advantage over the alternative CNA.

Figure 8. Bi-directional I/O CPU Efficiency with Increasing Threads

Figure 9. Dual-port FCoE Throughput Divided by CPU Percentage

Figure 10. FCoE CPU Utilization
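For readers without access to IOmeter, the access pattern used in these tests (fixed block size, multiple concurrent workers issuing sequential reads) can be approximated with a short script. This is a minimal illustrative sketch, not the actual tool or test configuration; the worker and read counts are scaled down, and a scratch file stands in for the RamSan LUNs:

```python
# Sketch of a sequential-read workload: several worker threads each issue
# fixed-size sequential reads against a scratch file and the aggregate
# I/O count and IOPS are reported. Queue depth is only approximated here
# by the number of concurrent workers.
import os, tempfile, threading, time

BLOCK = 8 * 1024          # 8KB blocks, as in the paper's FCoE tests
WORKERS = 4               # the paper used 16 workers; reduced for brevity
READS_PER_WORKER = 256

def run_worker(path, counter, lock):
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        size = os.fstat(fd).st_size
        for _ in range(READS_PER_WORKER):
            os.pread(fd, BLOCK, offset)          # one sequential read
            offset = (offset + BLOCK) % size     # advance, wrap at EOF
            with lock:
                counter[0] += 1
    finally:
        os.close(fd)

def sequential_read_benchmark():
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(os.urandom(BLOCK * 64))          # small scratch file
        path = f.name
    counter, lock = [0], threading.Lock()
    start = time.perf_counter()
    threads = [threading.Thread(target=run_worker, args=(path, counter, lock))
               for _ in range(WORKERS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    os.unlink(path)
    return counter[0], counter[0] / elapsed      # total I/Os, IOPS

total_ios, iops = sequential_read_benchmark()
```

Dividing the measured IOPS by the CPU utilization observed during the run (for example, from a system monitor) yields the IOPS/CPU efficiency metric reported in Figures 8 through 10.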
SUMMARY AND CONCLUSION

Consolidation onto a single unified network with Fibre Channel over Ethernet reduces physical complexity, lowers material costs, and simplifies operations. Ultimately, though, the most important benefit of a unified network is the efficiency gained in resource utilization. This holds as networks are consolidated onto a single protocol supporting all enterprise requirements, with backward compatibility across existing hardware and management applications. However, if the wrong Converged Network Adapter is chosen as the standard, efficiency for the high throughput demands of applications, the density of virtual servers, and overall consolidation efforts will fall short. As leading CIOs and data center managers strive to achieve maximum efficiency through resource utilization, QLogic delivers high-performance I/O solutions that minimize CPU resources, allowing efficient scaling of the entire data center. This is especially important for virtualized environments that support increasingly powerful applications as well as growing amounts of data. QLogic furnishes key infrastructure components with the greatest offload capabilities to provide high-performance data and storage applications the I/O bandwidth they require and to ensure that server resources are available when needed. Data center fabrics based on QLogic technology will increase enterprise IT efficiencies, derived from the ability to expand the number of simultaneous applications or virtualized operating systems that can be run on a given platform.

APPENDIX A

NIC Testing Configuration
Server Configuration: Nehalem-based server, DL380 G6, dual quad-core 2.93GHz, 12GB RAM
Operating System: Windows Server 2008 R2
Converged Network Adapter: QLogic Converged Network Adapter -SR; Driver: 4.2.15; Firmware: 4.7.31
Network Test Tool: IxChariot 7.0 build 114, using the default Chariot high-performance throughput script

FCoE Testing Configuration
Server Configuration: Nehalem-based server, DL380 G6, dual quad-core 2.8GHz, 24GB RAM
Operating System: Windows Server 2008 R2
Converged Network Adapter: QLogic Converged Network Adapter -SR; Driver: 9.1.9.15; Firmware: 4.7.31
Test Tool: IOmeter 2006.07.27
Storage Configuration: RamSan-325, firmware 3.2.6-p6; Cisco Nexus 5020 switch, BIOS version 1.2.0

Corporate Headquarters: Cavium, Inc., 2315 N. First Street, San Jose, CA 95131, 408-943-7100
International Offices: UK, Ireland, Germany, France, India, Japan, China, Hong Kong, Singapore, Taiwan, Israel

© 2017 QLogic Corporation. QLogic Corporation is a wholly owned subsidiary of Cavium, Inc. All rights reserved worldwide. QLogic and the QLogic logo are registered trademarks of QLogic Corporation. Cisco and Cisco Nexus are trademarks, registered trademarks, and/or service marks of Cisco Systems, Inc. IxChariot is a registered trademark of Ixia. PCIe and PCI Express are registered trademarks of PCI-SIG Corporation. RamSan is a registered trademark of Texas Memory Systems. Microsoft Exchange and Windows Server are registered trademarks of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. All other brand and product names are trademarks or registered trademarks of their respective owners. This document is provided for informational purposes only and may contain errors. QLogic reserves the right, without notice, to make changes to this document or in product design or specifications. QLogic disclaims any warranty of any kind, expressed or implied, and does not guarantee that any results or performance described in the document will be achieved by you. All statements regarding QLogic's future direction and intent are subject to change or withdrawal without notice and represent goals and objectives only.