White Paper

Deploy a High-Performance Database Solution: Cisco UCS B420 M4 Blade Server with Fusion iomemory PX600 Using Oracle Database 12c

What You Will Learn

This document demonstrates the benefits of using the Cisco UCS B420 M4 Blade Server with Fusion iomemory PX600 for a high-performance transactional workload on Oracle Database 12c. Swingbench was used to populate the database and generate the I/O workload for this solution. The solution performed nearly 1 million transactions per minute, resulting in 100,000 I/O operations per second (IOPS) with average latency of just 0.4 millisecond.

Solution Benefits

- Optimize your return on investment (ROI).
- Easily deploy and scale the solution.
- Get faster responses at a lower cost.
- Achieve a high number of I/O operations per second (IOPS) and low latency.

Highlights

Industry-Leading Performance and User Scalability

The Cisco UCS B420 M4 Blade Server platform enables customers to improve performance of all critical applications, reduce IT costs through consolidation, manage more data on less hardware, and make better business decisions in real time. Linear scalability in database performance was achieved as the number of users increased from 50 to 400 concurrent users.

Improved Database Productivity with Less Maintenance and Tuning

Overall productivity is increased due to the elimination of inherent issues that arise with spinning-disk solutions. The solution decouples database and storage administration from database storage troubleshooting to optimize Oracle applications, leading to an improved customer experience.

Significant Reduction in Costs

Reduced storage infrastructure requirements result in a significant cost reduction.

Overview

Designed for demanding virtualization and database workloads, the Cisco UCS B420 M4 Blade Server (Figure 1) combines a large memory footprint with 4-socket scalability using the Intel Xeon processor E5-4600 v3 product family.
The blade server supports 2133-MHz DDR4 memory and uses Cisco UCS virtual interface card (VIC) technology to achieve up to 160 Gbps of aggregate I/O bandwidth, all in a dense, full-width blade form factor. The Cisco UCS B420 M4 maintains memory performance even as capacity grows, and the large power envelope of the Cisco UCS 5108 Blade Server Chassis means the Cisco UCS B420 M4 can handle up to 3 terabytes (TB) of memory without compromising CPU speed or core count. Up to four Cisco UCS B420 M4 servers can be installed in the Cisco UCS 5108 Blade Server Chassis.

© 2015 Cisco and/or its affiliates. All rights reserved. This document is Cisco Public. Page 1 of 11
Figure 1. Cisco UCS B420 M4 Blade Server Front View

Figure 2. Cisco UCS B420 M4 Blade Server Inside View

The Cisco UCS B420 M4 provides:

- Four Intel Xeon processor E5-4600 v3 series CPUs for up to 72 cores per server
- 48 DIMM slots providing up to 3 TB of 2133-MHz DDR4 memory
- Three mezzanine connectors enabling bandwidth of 160 Gbps
- Four SAS, SATA, or SSD hot-pluggable drive bays
- RAID 0, 1, 5, and 10, with optional 2-GB flash memory-backed write cache
- Up to four Cisco UCS B420 M4 Blade Servers per Cisco UCS 5108 Blade Server Chassis

The Cisco UCS B420 M4 server is well suited for demanding IT workloads, including:

- Large virtual server and virtual desktop workloads
- Memory-intensive database installations
- Cloud infrastructure
- Enterprise resource planning (ERP) and customer relationship management (CRM) applications
- Development and in-house applications

Fusion iomemory PX600

Fusion-io (a SanDisk company) builds server-resident PCI Express (PCIe) flash storage for applications requiring a high IOPS rate with low latency. For enterprise applications such as Oracle, Fusion iomemory PX600 delivers a significant improvement in performance by reducing latency with a persistent, reliable, high-performance, and high-capacity storage tier. By significantly improving performance, iomemory helps enable customers to reduce infrastructure, power, and cooling costs compared to a traditional hard disk storage system, for lower total cost of ownership (TCO).

The iomemory solution offers a persistent storage option, enabling the card in a server to load an entire database of less than 1300 GB into the card's flash memory, or just the performance-demanding structures of a larger database. Offloading storage requirements from the SAN to the card, closer to the server, can significantly increase performance.

The iomemory PX600 provides:

- 1300 GB of multilevel cell (MLC) flash capacity
- 2.7 GBps of bandwidth (1-MB read operations)
- 1.7 GBps of bandwidth (1-MB write operations)
- 235,000 IOPS (512-byte random read operations)
- 370,000 IOPS (512-byte random write operations)
- 15 microseconds of write latency and 92 microseconds of read latency

Hardware supported:

- All Cisco UCS B-Series M4 blade servers

Software supported:

- Cisco UCS Manager 2.1
- Oracle 12c

Solution Configuration

Installation and configuration details for the solution are beyond the scope of this document. Here are the high-level steps for the solution:

1. Install Oracle Linux 6.5.

2. Download matching iomemory firmware and driver RPM packages from Cisco.com. Download additional utilities and RPM packages from the SanDisk support website (Fusion-io support site).
Here is the list of installed packages:

    [root@oracle1 ~]# rpm -qa | grep fio
    fio-util-4.1.1.297-1.0.el6.x86_64
    fio-common-4.1.1.297-1.0.el6.x86_64
    fio-sysvinit-4.1.1.297-1.0.el6.x86_64
    fio-2.1.10-1.el6.rf.x86_64
    fio-preinstall-4.1.1.297-1.0.el6.x86_64
    [root@oracle1 ~]# rpm -qa iomemory*
    iomemory-vsl4-3.8.13-16.2.1.el6uek.x86_64-4.1.1.297-1.0.el6.x86_64
3. After the packages are installed, verify that iomemory cards are detected in /dev (for example, /dev/fct0). Use the fio-format utility to format and attach iomemory cards.

4. Create OS partitions on both iomemory cards. The database solution tested here used four data partitions (270 GB) and two log partitions (30 GB) on each card.

5. Configure the oracleasm RPM packages. Use asmtool to create and configure ASM volumes. Here is a sample command:

    /usr/sbin/asmtool -C -l /dev/oracleasm -n data1 -s /dev/fioa1 -a force=yes

Please refer to this document for additional configuration details: http://www.cisco.com/c/en/us/solutions/collateral/data-center-virtualization/unified-computing/whitepaper_c11-732623.html

Fusion iomemory PX600 Tests: Speeds and Feeds

System Tests

It is common practice to evaluate system performance before deploying any database application. System baseline performance testing is performed using common I/O calibration tools such as Oracle Orion and Linux FIO. These tools can generate I/O patterns that mimic the type of I/O operations performed by Oracle databases. The testing here used both Orion and Linux FIO to measure I/O performance before actually installing Oracle.

Figure 3 shows I/O tests at different read and write percentages and the corresponding IOPS. Figure 4 shows the random-read IOPS and throughput tests for various block sizes exercised using the iomemory PX600.

Figure 3. Random IOPS for Various Read-Write Ratios and Latency Tests
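The read-write ratio tests summarized in Figure 3 can be approximated with the mixed random workload options in Linux FIO. The following is a minimal sketch, not the configuration used in this testing: the queue depth, job count, and runtime are illustrative assumptions, and build_fio_command is a hypothetical helper that only assembles the command line without running it.

```python
from typing import List


def build_fio_command(device: str, read_pct: int, block_size: str = "8k",
                      iodepth: int = 32, numjobs: int = 4,
                      runtime_s: int = 60) -> List[str]:
    """Assemble an fio command line for a mixed random read/write test.

    All parameter defaults are illustrative assumptions, not values taken
    from the tests reported in this document.
    """
    if not 0 <= read_pct <= 100:
        raise ValueError("read_pct must be between 0 and 100")
    return [
        "fio",
        f"--name=randrw_{read_pct}r",
        f"--filename={device}",
        "--rw=randrw",               # mixed random reads and writes
        f"--rwmixread={read_pct}",   # percentage of the mix that is reads
        f"--bs={block_size}",
        "--ioengine=libaio",         # Linux native asynchronous I/O
        "--direct=1",                # bypass the page cache
        f"--iodepth={iodepth}",
        f"--numjobs={numjobs}",
        f"--runtime={runtime_s}",
        "--time_based",
        "--group_reporting",
    ]


# WARNING: any mix that includes writes destroys data on a raw device.
# /dev/fioa1 is the device name from the asmtool example; in practice,
# point this at a scratch partition before ASM volumes are created.
for read_pct in (100, 70, 50, 0):
    print(" ".join(build_fio_command("/dev/fioa1", read_pct)))
```

Sweeping the read percentage from 100 to 0 in this way yields one command per read-write ratio, which is the general shape of the test series that Figure 3 reports.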
Figure 4. IOPS and Bandwidth Tests

These were the main performance results:

- 300,000 IOPS for pure random read operations (4-KB block size)
- 275,000 IOPS for pure random read operations (8-KB block size)
- 213,000 IOPS for mixed read and write operations (8-KB block size)
- 174,000 IOPS for pure random write operations (8-KB block size)
- 5.1 GBps throughput (1-MB read operations)
- 0.5 ms average latency (mixed read and write operations)

User Scalability Tests

For the database user scalability tests, an Oracle 12c single-instance database was loaded with 1.7 TB of order-entry transactions using the Swingbench testing tool. The database was configured with 96 GB of Oracle System Global Area (SGA; about 5 percent of the data set held in memory), and the Swingbench transactional load test was run with users scaling from 50 to 400. Each database test run consisted of capturing and analyzing test reports from the Swingbench testing tool, Oracle Automatic Workload Repository (AWR) reports, and Oracle Enterprise Manager reports, plus system performance metrics from the server and OS perspective.

As the database scaled from 50 to 400 concurrent users, the testing showed nearly linear scalability in transactions per minute and I/O throughput with little or no change in latency (Figure 5). As the number of users scaled beyond 400, the testing revealed typical data concurrency-related events, which slightly reduced transactions per minute (TPM) and nominally increased latency. These changes were attributed to the greater number of users working on a relatively smaller data set at a very high speed. From captured statistics, the testing also verified that neither hardware saturation nor bottlenecks occurred.

For small and medium-size businesses, departmental databases, and quality assurance groups running either smaller databases or medium-size databases (less than 1700 GB), an entire Oracle database is shown to perform excellently on the flash memory in an iomemory PX600 in a Cisco UCS B420 M4 (Figure 5).
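A quick calculation shows why this configuration stresses storage rather than memory. The sketch below computes how much of the 1.7-TB data set fits in the 96-GB SGA; it assumes decimal units (1 TB = 1000 GB), which the document does not specify.

```python
SGA_GB = 96          # Oracle SGA size from the test configuration
DATASET_TB = 1.7     # Swingbench order-entry data set size
GB_PER_TB = 1000     # assumption: decimal units

dataset_gb = DATASET_TB * GB_PER_TB
sga_fraction = SGA_GB / dataset_gb          # share of data that can be cached
dataset_to_sga_ratio = dataset_gb / SGA_GB  # how much larger the data set is

print(f"SGA covers {sga_fraction:.1%} of the data set")
print(f"Data set is {dataset_to_sga_ratio:.1f}x larger than the SGA")
```

Under this assumption the SGA holds between 5 and 6 percent of the data, in line with the figure quoted above, so the large majority of random reads in the workload must be served from the flash storage tier rather than from memory.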
Figure 5. Database User Scalability Test Results

Oracle AWR and Enterprise Manager Reports

Database performance was assessed using the Top-10 events AWR report (Figure 6) and the Enterprise Manager performance report (Figure 7). Note these important results in the performance reports:

Figure 6. Top-10 Wait Events from Oracle AWR Report with 400 Users
Figure 7. Oracle Enterprise Manager IOPS and Latency Charts

- Database CPU utilization was 61.8 percent, indicating effective use of CPU cycles for performing database transactions.
- 234 million single-block database file sequential read requests (8-KB random read operations) were serviced with less than 0.4-millisecond latency. These operations are mostly transaction-related index fetches, showing the random nature of the workload I/O profile.
- 58 million log file sync events (sequential write operations) were managed with 0.36-millisecond latency.
- Nearly 100,000 IOPS were sustained consistently for the entire test duration.
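The IOPS and latency figures above also imply how many I/O requests were in flight at any moment. The sketch below applies Little's law (average concurrency equals arrival rate multiplied by average time in the system), a standard queueing result that this document does not itself invoke. Treating the 0.4-millisecond read latency as the average for all I/O is a simplifying assumption, and outstanding_ios is a hypothetical helper name.

```python
def outstanding_ios(iops: float, latency_ms: float) -> float:
    """Average number of in-flight I/Os by Little's law: L = lambda * W."""
    return iops * (latency_ms / 1000.0)


# Inputs taken from the reported results: about 100,000 sustained IOPS
# at roughly 0.4-millisecond latency.
in_flight = outstanding_ios(iops=100_000, latency_ms=0.4)
print(f"Average outstanding I/Os: {in_flight:.0f}")
```

Under these assumptions, roughly 40 requests are outstanding on average, a small number relative to the 400 concurrent users in the peak test.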
Oracle Work Versus the Wait Distribution Cycle

An important characteristic of an optimized solution is delivery of high-performance results with effective utilization of system resources (CPU cycles) and elimination of inefficiencies in the form of wait events, latency, and so on. Figure 8 shows the distribution of database time during database testing.

Figure 8. Distribution of Database Time Spent in Various Activities

The figure shows that most database cycles were spent in database CPU activities (61.8 percent). This amount is the actual work that the database performed in processing transactions. The next set of events that consumed database cycles consists of activities that need to be serviced quickly to help ensure increased database transaction throughput. User I/O consumed 34.6 percent of the database cycles, spent on data requests that had to be fetched from storage into the Oracle engine to process transactions. Commit operations consumed 7.6 percent of the database cycles. This activity is triggered when a user session commits a transaction and the contents of the log buffer have to be written to the redo log file to confirm to the user that the transaction is committed and is fully secured. The rest of the activities (system I/O, configuration, and other processes) contribute less than 1 percent of the total activity.

In a traditional SAN environment, a significant chunk of database time is spent in I/O processing, waiting for data from the storage system. This wait time results in overall reduced database CPU utilization, and hence a lower number of transactions. The use of iomemory in this configuration testing helps ensure increased database CPU utilization, with a significant reduction in the time needed to service data requests, resulting in a highly optimized database solution.
System Performance Metrics

Computing, storage, memory, and network performance metrics constitute the system performance metrics. Figure 9 shows the storage IOPS compared to the latency for various user scalability tests. The chart shows that as database test users increased from 50 to 400, IOPS scaled linearly while still delivering very low latency. This performance behavior is critical to delivering a high-performance database solution, to help ensure quick response time even with database user spikes up to the designed maximum. To maintain quick response time, storage systems should service data requests, measured in IOPS, while maintaining consistently low latency. These two performance features define high-performance storage systems. The figure shows that the peak of 400 users delivering 68,000 read IOPS and 30,000 write IOPS yields close to 100,000 IOPS at 0.4-millisecond latency.

Figure 9. IOPS Versus Latency for User Scalability Tests

To deliver high performance for any enterprise workload, CPU cycles must be efficiently utilized in the user space by reducing system-level overhead and I/O wait cycles. Figure 10 shows the CPU performance graph for the peak of 400 users. Note the following points based on the CPU performance graph:

- Even with the peak of 400 users, less than 10 percent of the CPU cycle time was spent on I/O wait and system space cycles.
- The user CPU utilization is about 31 percent.
- The rest of the time (55 percent) is idle, leaving CPU cycles available for any additional (second database) workload.
Figure 10. CPU Utilization

Conclusion

The solution stack must be designed with balanced server, storage, and networking subsystems to deliver reliable, scalable high performance for enterprise database applications. The test results and performance metrics illustrated in this document show that a solution stack using Cisco UCS B420 M4 with Fusion iomemory PX600 achieves these performance and scalability benefits.

The performance metrics also show that the solution leaves a significant amount of CPU cycles available to accommodate additional databases, signifying a high-performance consolidation solution for small to medium-size databases. This high-performance consolidation solution is simple and easy to deploy at a lower cost than other popular SAN storage systems. These features also help reduce maintenance (frequent mechanical disk failures), power, cooling, and rack space costs significantly.

You can also augment existing investments in traditional SANs by offloading specific hot database objects with high response time requirements to the flash memory in Fusion iomemory for performance improvements, leaving the remaining cold data to be served from traditional SAN storage.
Printed in USA C22-734918-00 06/15