JVM Performance Study Comparing Java HotSpot to Azul Zing Using Red Hat JBoss Data Grid

Legal Notices

JBoss, Red Hat and their respective logos are trademarks or registered trademarks of Red Hat, Inc. Azul Systems, Zing and the Azul logo are trademarks or registered trademarks of Azul Systems, Inc. Linux is a registered trademark of Linus Torvalds. CentOS is the property of the CentOS project. Oracle, Java, and HotSpot are trademarks or registered trademarks of Oracle Corporation and/or its affiliates. Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Solarflare is a trademark or registered trademark of Solarflare Communications, Inc. GitHub is a registered trademark of GitHub, Inc. Other marks are the property of their respective owners and are used here only for identification purposes.

Table of Contents

Legal Notices
1 Executive Summary
2 Background
3 Testing Environment
4 Testing Methodology
5 Results
6 Conclusion
7 References
8 Appendix

1 Executive Summary

While new In-memory Computing (IMC) techniques hold the promise of better scalability and performance, consistent, low latency performance is not a given for all Java applications. Careful attention to product choices, runtime components and deployment topologies is essential to maximizing the value of these new IMC solutions, including In-memory Data Grids (IMDGs).

This benchmark study compares the response time performance of two different JVMs, namely Azul Zing and Java HotSpot, while running Red Hat JBoss Data Grid at different node sizes. All benchmark results were derived using the open source RadarGun framework and the jHiccup measurement tool. The testing methodology was designed to measure response time consistency and throughput of the data grid nodes at different JVM heap sizes. The data grid was configured for distributed mode with one data node and one backup node replicated across two physical servers (4 nodes total). The jHiccup measurement tool was used to capture and graph the response time distribution of the two JVMs during all benchmark runs.

The results of the benchmark clearly show that response time distributions and runtime consistency vary widely between the two JVMs. For Java HotSpot, which employs a stop-the-world young generation collector, response time variance was more than 250x across individual runs (i.e. it was not measurable using standard deviation). In contrast, the Azul Zing runtime showed highly predictable response time consistency across the entire distribution. At 5 9s (i.e. 99.999%), Zing had a maximum response time of only 23 milliseconds when the grid was deployed using a 200 GB node. In addition, Java HotSpot, which reported response time outliers exceeding multiple seconds at 100 GB, was not able to complete the benchmark run at 200 GB because of system saturation due to consecutive garbage collections.

The profound difference in the response time profiles of the two JVMs suggests that for Java deployments that require low latency SLAs, large individual instances (e.g. 20+ GB heaps), or higher sustained throughput, only the Zing JVM can keep up with object allocation rates without stalling the application. For data grid deployments, this could mean 1) lower node response times (<50 msec), 2) greater resistance to node failures due to heavy load, and 3) larger node capacity with support for Java heaps up to 340 GB.

This paper describes the testing environment, testing methodology and resulting response time profiles of the two JVMs while running JBoss Data Grid. Companies deploying or running Java applications, including IMDGs, can use this information to make more informed JVM choices for their specific business use case, or they can leverage this benchmark to design their own testing scenarios.

2 Background

Red Hat JBoss Data Grid, based on the Infinispan open source community project, is a distributed in-memory grid supporting local, replicated, invalidation and distributed modes. While JBoss Data Grid can be configured in a number of different topologies, it is typically deployed as a data tier to provide faster read/write behavior and data replication while offloading the database or persistent data store. As an in-memory, schema-less key/value store, JBoss Data Grid provides a simple and flexible means of storing different objects without a fixed data model.

Because JBoss Data Grid is a Java application, it requires a Java Virtual Machine (JVM) for runtime deployment. Since not all JVMs are the same and they employ different garbage collection algorithms (e.g. Concurrent Mark Sweep), the benchmark was configured to capture JVM response time variances for different Java heap sizes, live set sizes, object allocation rates, mutation rates and other application-specific characteristics.

For this performance study, the RadarGun benchmarking framework coupled with the jHiccup measurement tool was used to simulate load and measure response times of the JVMs. Originally designed to compare the performance of different data grid products, RadarGun provides a reproducible way to simulate a specific transaction load and measure the performance of the individual data grid nodes. To ensure accurate runtime measurements, even during long garbage collection (GC) pauses, jHiccup was added to the benchmark. Designed to measure only JVM responsiveness, jHiccup shows the best possible response time the application or data grid could have experienced during the benchmark run. jHiccup is not an end-to-end performance measurement tool and does not capture the additional overhead of the application or data grid transaction logic.

3 Testing Environment

The test environment consisted of two nearly identical Iron Systems servers: Node Servers A and B. Each server had 4 Intel Xeon processors with 512 GB of memory, running Red Hat Enterprise Linux 6 and JBoss Data Grid version 6.2. The two machines were directly interconnected using Solarflare Communications Solarstorm SFC9020 10GbE network cards. The exact configurations are listed below:

Machine Configuration - Node Server A
Manufacturer: Iron Systems
Processors (x 4): Intel Xeon CPU E7-4820 @ 2.00GHz
Memory (x 32): 16GB RDIMM, 1066MHz, Low Volt, Dual Rank
Networking: 1 x Solarflare Communications SFC9020
OS: Red Hat Enterprise Linux Server release 6.2

Machine Configuration - Node Server B
Manufacturer: Iron Systems
Processors (x 4): Intel Xeon CPU E5-4620 0 @ 2.20GHz
Memory (x 32): 16GB RDIMM, 1333MHz, Low Volt, Dual Rank
Networking: 1 x Solarflare Communications SFC9020
OS: Red Hat Enterprise Linux Server release 6.2

The data grid was configured on the two servers in distributed mode with 1 backup or copy node replicated on the other system (i.e. 4 nodes in total). The RadarGun benchmark was configured with a master process on Node Server A, which communicated directly with all 4 nodes via an in-process client that in turn issued requests to individual data grid nodes. jHiccup version 1.3.7 was used to launch the RadarGun benchmark, capturing JVM response times across all 4 grid nodes and then graphing the response time distribution up to the 99.999th percentile.

Both servers were configured to support both JVMs. The Java HotSpot JVM was configured to use the CMS collector with the following flags:

-XX:NewRatio=3 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:MaxPermSize=512m -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseLargePages

The Azul Zing runtime did not employ any additional runtime flags and used the default C4 collector. For both Zing and Java HotSpot, the RHEL operating system was configured to use large pages:

-XX:+UseLargePages

Java Configuration

Azul Zing:
java version "1.7.0-zing_5.8.0.0"
Zing Runtime Environment for Java Applications (build 1.7.0-zing_5.8.0.0-b4)
Zing 64-Bit Tiered VM (build 1.7.0-zing_5.8.0.0-b15-product-azlinuxM-X86_64, mixed mode)

Java HotSpot:
java version "1.7.0_05"
Java SE Runtime Environment (build 1.7.0_05-b06)
Java HotSpot 64-Bit Server VM (build 23.1-b03, mixed mode)
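To make the runtime configurations concrete, the listings below sketch what a single grid node launch might have looked like under each JVM. These are illustrative only: the heap size, classpath and main class are placeholders that are not specified in this paper, while the GC flags are exactly those listed above.

Java HotSpot node (20 GB heap example):
java -Xms20g -Xmx20g -XX:NewRatio=3 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:MaxPermSize=512m -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseLargePages -cp <data-grid-classpath> <grid-node-main-class>

Azul Zing node (20 GB heap example, default C4 collector):
java -Xms20g -Xmx20g -cp <data-grid-classpath> <grid-node-main-class>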

4 Testing Methodology

The benchmark was designed to measure JVM response times and throughput of JBoss Data Grid at multiple JVM heap sizes. Performance runs at 20, 40, 60, 100 and 200 GB heaps were repeated on both the Azul Zing and Java HotSpot JVMs. JBoss Data Grid was configured in distributed mode with just a single copy of the data and a backup replicated across two physical servers.

In order to simulate real world conditions, the individual grid nodes and their corresponding JVM memory heaps were primed by the RadarGun benchmark for each test, before jHiccup measurements were taken. Initial data loads were approximately 40% of the maximum heap size for each run. In order to approximate typical Java HotSpot response times and maximum outliers during this short duration benchmark (i.e. 20-30 minutes), the default NewRatio=3 was used to avoid full garbage collection pauses during the measurement period [1]. This ratio resulted in 25% of the heap allocated to the young generation and 75% dedicated to the old generation (for example, at 60 GB this works out to roughly a 15 GB young generation and a 45 GB old generation).

In addition to priming the data nodes, the benchmark utilized multiple object sizes (i.e. 15% 1K, 35% 2K and 50% 4K) to better simulate real world scenarios. For the first 3 Java heap sizes (20, 40, and 60 GB), the transactional mix of the RadarGun benchmark was set to a ratio of 67% read, 32% write, 1% remove. However, in order to accelerate the priming of the data nodes to 40% for the 100 and 200 GB runs, the transactional mix was set to 39% read, 60% write, and 1% remove.

Because the RadarGun benchmark was primarily designed to measure the throughput of individual grid nodes for a given load, the open source jHiccup tool was added to capture and graph the responsiveness of the JVMs during the benchmark runs. The tool captured the JVM response times for all 4 nodes and then combined the distributions into a single graph or Hiccup Chart. The graphs in Figure 1 below are the individual Hiccup Charts for the Java HotSpot test at 60 GB.

[1] The test was configured to simulate a well-tuned Java HotSpot JVM, that is, to delay or prevent a full garbage collection (with resulting application pauses) for as long as possible. This is how an application would normally be configured in production. Without this additional tuning, Java HotSpot maximum response times would likely have been orders of magnitude larger than they were in this test.
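As a point of reference, jHiccup is normally used by prepending its wrapper to the java command being measured, so a measured node launch would have had roughly the shape shown below. This is a hedged sketch based on jHiccup's general documented usage rather than the actual commands from this study; the heap size, classpath and main class are placeholders, and exact wrapper options vary between jHiccup versions.

jHiccup java -Xms60g -Xmx60g <gc-flags-from-section-3> -cp <data-grid-classpath> <grid-node-main-class>

jHiccup records its measurements to a hiccup log for each JVM, and those logs are what the Hiccup Charts in this paper are plotted from.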

Figure 1: Individual Hiccup Charts for Java HotSpot using 60 GB heaps

The benchmark ran the same transaction loads on each JVM at different Java heap sizes (20, 40, 60, 100 and 200 GB). Both JVM response times and RadarGun throughput metrics were captured for each run. Below is the RadarGun configuration for the 20 GB heap test.

RadarGun Benchmark Configuration - 20GB heap test
Benchmark Duration: 15 minute runs
Message Payload Sizes: 15% 1024 bytes, 35% 2048 bytes, 50% 4096 bytes
Mode of Data Grid Operation: Distributed
Heap Priming: 35-45%
Transactions: 50,000
Transaction mix: 67% read, 32% write, 1% remove
Number of threads: 30
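For readers who want to build a comparable setup, the overall shape of a RadarGun run follows the project's five-minute tutorial (see References): a master process is started with the benchmark definition, and each grid node is started as a RadarGun slave that the master drives through the configured stages. The commands below are only a rough sketch; the script names come from the RadarGun tutorial, the configuration file and host names are placeholders, and exact options differ between RadarGun versions.

On Node Server A (master plus two local grid nodes):
./bin/master.sh -c <benchmark-configuration.xml>
./bin/slave.sh
./bin/slave.sh

On Node Server B (the remaining two grid nodes):
./bin/slave.sh -m <node-server-a-host>:<master-port>
./bin/slave.sh -m <node-server-a-host>:<master-port>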

5 Results

By overlaying the different Hiccup Charts for all the Azul Zing and Java HotSpot runs, we can clearly see that response time consistency is vastly different for the two JVMs. While Java HotSpot performed well approximately 90% of the time (or for 9 out of every 10 transactions), it started to encounter response time delays at the 95th percentile, directly associated with young generation collection pauses in its CMS (Concurrent Mark Sweep) configuration. Pauses for Java HotSpot at 20 GB approach 500 milliseconds (or ½ second) at the 99.9th percentile. At 40 GB Java HotSpot approaches 1 second pauses, and at 60 GB the JVM stalls approach 2 seconds. On even larger heaps Java HotSpot outliers start to increase dramatically (i.e. 5 seconds at 100 GB), and the JVM could not complete the benchmark at 200 GB because of severe thrashing.

Figure 2: Aggregated Hiccup Chart Showing Pauses in Milliseconds

As the graph in Figure 2 shows, Azul Zing exhibits highly consistent response time behavior across the entire benchmark spectrum. Whether at 20 GB or 200 GB, Zing response times were less than 30 milliseconds, with only a few outliers in the tails of the distributions, as shown in Figure 3:

Figure 3: Hiccup Chart for Azul Zing using a 200 GB heap

When comparing the performance and consistency of the two JVMs, Azul Zing showed 22x better maximum outliers than Java HotSpot at 20 GB and more than a 250x advantage at 100 GB. While the response time profiles of the two JVMs were very different, the throughput of the data grid as measured by RadarGun was much closer, with only a slight advantage to Zing (about 1-3% across the nodes) over the short duration of this benchmark (i.e. 20-30 minutes).

6 Conclusion

This benchmark study demonstrates that application performance and runtime consistency can be greatly affected by the choice of JVM. Not all JVMs are the same; they employ different GC algorithms that can impact application response time profiles. For applications that have explicit or even implicit SLAs, careful attention to application characteristics such as heap size, live set size, object allocation rates, and mutation rates is an important factor when choosing a JVM that can meet deployment requirements. When metrics such as Sustained Throughput (i.e. a throughput value at which a specific response time SLA is never exceeded) and time-to-production are important, Azul Zing provides better runtime metrics with less JVM tuning and heap resizing needed to avoid potentially devastating pauses.

For In-Memory Computing (IMC) solutions such as in-memory data grids (IMDGs), which benefit from larger Java heaps, Azul Zing provides a viable alternative with greater runtime consistency. When IMDGs are deployed with strict SLAs below 50-500 milliseconds, only Azul Zing will guarantee success. For IMDG configurations where node timeouts need to be very low (e.g. under 50 milliseconds), only the Zing runtime can guarantee response times that will not cause nodes to go offline.

7 References

Red Hat JBoss Data Grid
http://www.redhat.com/products/jbossenterprisemiddleware/data-grid/

Azul Systems C4 Continuously Concurrent Compacting Collector (ISMM paper)
http://www.azulsystems.com/products/zing/c4-java-garbage-collector-wp

Understanding Java Garbage Collection White Paper
http://www.azulsystems.com/dev/resources/wp/understanding_java_gc

RadarGun Framework Tutorial on GitHub
https://github.com/radargun/radargun/wiki/five-minute-tutorial

jHiccup open source performance measurement tool
http://www.azulsystems.com/jhiccup

Azul Inspector
http://www.azulsystems.com/dev_resources/azul_inspector

Azul Systems Zing FAQ
http://www.azulsystems.com/products/zing/faq

Contact

Azul Systems
Phone: +1.650.230.6500
Email: info@azulsystems.com
www.azulsystems.com

Red Hat
Phone: +1.888.REDHAT1
Web form: www.redhat.com/contact/sales.html
www.redhat.com

8 Appendix

8.1 Test Durations and Settings

RadarGun Benchmark Configuration - 20GB heap test
Benchmark Duration: 15 minute runs
Message Payload Sizes: 15% 1024 bytes, 35% 2048 bytes, 50% 4096 bytes
Mode of Data Grid Operation: Distributed
Heap Priming: 35-45%
Transactions: 50,000
Transaction mix: 67% read, 32% write, 1% remove
Number of threads: 30

RadarGun Benchmark Configuration - 40GB heap test
Benchmark Duration: 15 minute runs
Message Payload Sizes: 15% 1024 bytes, 35% 2048 bytes, 50% 4096 bytes
Mode of Data Grid Operation: Distributed
Heap Priming: 35-45%
Transactions: 50,000
Transaction mix: 67% read, 32% write, 1% remove
Number of threads: 60

RadarGun Benchmark Configuration - 60GB heap test
Benchmark Duration: 35 minute runs
Message Payload Sizes: 15% 1024 bytes, 35% 2048 bytes, 50% 4096 bytes
Mode of Data Grid Operation: Distributed
Heap Priming: 35-45%
Transactions: 50,000
Transaction mix: 67% read, 32% write, 1% remove
Number of threads: 80

RadarGun Benchmark Configuration - 100GB heap test
Benchmark Duration: 60 minute runs
Message Payload Sizes: 15% 1024 bytes, 35% 2048 bytes, 50% 4096 bytes
Mode of Data Grid Operation: Distributed
Heap Priming: 35-45%
Transactions: 100,000
Transaction mix: 39% read, 60% write, 1% remove
Number of threads: 100

RadarGun Benchmark Configuration - 200GB heap test
Benchmark Duration: 120 minute runs
Message Payload Sizes: 15% 1024 bytes, 35% 2048 bytes, 50% 4096 bytes
Mode of Data Grid Operation: Distributed
Heap Priming: 35-45%
Transactions: 200,000
Transaction mix: 39% read, 60% write, 1% remove
Number of threads: 100

8.2 Hiccup Charts for All Runs

Note that the 4 graphs in each section represent the 4 nodes, each with its own set of data.

Zing 200 GB

Zing 100 GB

Zing 60 GB

Zing 40 GB

Zing 20 GB

Java HotSpot 100 GB

Java HotSpot 60 GB

Java HotSpot 40 GB

Java HotSpot 20 GB