How to keep capacity predictions on target and cut CPU usage by 5x


Lessons from capacity planning a Java enterprise application
Kansas City, Sep 27 2016
Stefano Doni
stefano.doni@moviri.com, @stef3a, linkedin.com/in/stefanodoni

A Business-centric Capacity Modelling framework
[Chart: IT Resource Utilization (e.g. CPU Utilization %) vs. Business Volume (e.g. #users), showing the Current Working Area, the Residual Capacity Available, the IT Saturation Threshold and the Maximum Business Capacity]
In its current configuration, this system can manage up to 14k users before reaching saturation.
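The idea behind this framework can be sketched in a few lines: fit a straight line through (business volume, utilization) samples and extrapolate to the saturation threshold. This is a minimal sketch, assuming a linear relationship; the function names, the sample values and the 75% threshold are all hypothetical and are not taken from the slides.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def max_business_capacity(volumes, utilization, saturation_threshold):
    """Business volume at which the fitted line reaches the threshold."""
    intercept, slope = fit_line(volumes, utilization)
    if slope <= 0:
        raise ValueError("utilization does not grow with business volume")
    return (saturation_threshold - intercept) / slope


# Hypothetical samples: concurrent users vs. CPU utilization (%).
users = [2000, 4000, 6000, 8000]
cpu_util = [15.0, 25.0, 35.0, 45.0]

# Hypothetical saturation threshold of 75% CPU.
capacity = max_business_capacity(users, cpu_util, saturation_threshold=75.0)
print(f"Estimated maximum business capacity: {capacity:.0f} users")
```

The residual capacity is then simply the gap between this estimate and the current business volume.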

What's the Problem with Java Applications?
Application CRASH! HW resources were healthy. So where is the bottleneck?
[Chart: CPU Utilization %]
The Bottleneck Was Java Heap Memory!

Java Memory Bottlenecks: a devastating impact
[Chart: IT Resource Utilization (e.g. CPU Utilization %) vs. Business Volume (e.g. #users), showing the Java Memory Bottleneck, the Actual Business Capacity and the Estimated Business Capacity]
Capacity is hugely overestimated! Java bottlenecks must be considered in the model!
KEY TAKEAWAY: Traditional Capacity Planning techniques can severely overestimate the Business Capacity of Java Applications.

Keeping your Capacity Predictions on Target even with Java Applications!

Java 101: Heap memory
[Diagram: Server Memory Layout, showing Free Memory, Java Heap Memory and Operating System]
Important Facts
1. The size of Java Heap Memory is fixed
2. When memory is exhausted, the Garbage Collection process kicks in and stops your application!
...and why should I care? Well, when the application stops, your Customers cannot shop. You're going to lose at least $3000 every second!
KEY TAKEAWAY: Exhaustion of Java Heap memory is one of the most common bottlenecks causing outages in Java applications.

Is The Widely Used Java Heap Utilization A Good Metric for Capacity Planning?
[Chart: Heap Utilization (all app. servers) plotted against Live Sessions]

First challenge: finding the right metric
Java Heap Utilization measures how much Heap memory is being used and is provided by most Java monitoring solutions.
[Diagram: Java Heap Memory, with Heap Size split into Free and Used]
[Chart: Heap Memory Utilization % vs. # Users]
Heap Utilization is flat, irrespective of the workload increase.

Why is Heap Utilization poor and How to come up with a Better Metric?
[Chart: Heap Utilization over Time, with labels for Garbage Collection Events, Garbage and Live Data]
Live Data Size is the amount of memory consumed by the set of long-lived objects required to run the application.
How about using the Live Data Size for capacity planning models?

The ultimate Java Memory KPI: Live data
[Diagram: Java Heap Memory, with Heap Size split into Free and Used, and Used split into Garbage and Live Data (e.g. Application memory footprint)]
KEY TAKEAWAY: Java Heap Utilization is a combination of live data and garbage. Live Data represents the real memory footprint of the application and is the correct KPI to use for capacity planning.
Mastering Java Applications Capacity - December 2015 #movinar

How to measure Live Data
Most Java monitoring tools won't make Live Data available; however, let's take a look at a Garbage Collection log file.
[Figure: Example of garbage collection log (Oracle JVM w/ Concurrent Mark-Sweep)]
KEY TAKEAWAY: Live Data can be derived from Garbage Collection logs.
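The log excerpt from this slide did not survive transcription, so the following is only a sketch of one common approximation: take the heap occupancy reported right after a full collection as the live data size. It assumes simplified log lines carrying a single "before->after(total)" triple in KB; real HotSpot logs add per-generation figures and vary by JVM version and collector, so the pattern would need adapting. The sample lines and function name are made up for illustration.

```python
import re

# Matches one "before->after(total)" heap occupancy triple, in KB.
HEAP_RE = re.compile(r"(\d+)K->(\d+)K\((\d+)K\)")


def live_data_samples(log_lines):
    """Return (occupancy_after_kb, heap_total_kb) for each full collection.

    Assumption: each line carries exactly one occupancy triple, and the
    occupancy left after a full collection approximates the live data.
    """
    samples = []
    for line in log_lines:
        if "Full GC" not in line:
            continue  # young collections leave garbage in the old generation
        match = HEAP_RE.search(line)
        if match:
            _before, after, total = (int(g) for g in match.groups())
            samples.append((after, total))
    return samples


# Hypothetical, simplified log lines (not copied from the slides).
SAMPLE_LOG = [
    "10.500: [GC 786432K->524288K(1048576K), 0.0450 secs]",
    "42.000: [Full GC 943718K->314572K(1048576K), 1.2000 secs]",
    "95.250: [Full GC 996147K->335544K(1048576K), 1.3500 secs]",
]

samples = live_data_samples(SAMPLE_LOG)
for after_kb, total_kb in samples:
    print(f"live data ~ {after_kb / 1024:.0f} MB "
          f"({100.0 * after_kb / total_kb:.1f}% of heap)")
```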

The Result of the Data Collection: Live Data Size looks Promising!
[Chart: Heap Utilization and Live Data Size plotted against Live Sessions]

The Final Test: Is Live Data Size Correlated with Live Sessions? YES!
[Chart: Live Data Size vs. Live Sessions, R-squared = 91%]
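The correlation check can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming a simple linear regression of live data size on live sessions, for which R-squared equals the squared Pearson correlation; the sample values below are invented and are not the data behind the 91% on the slide.

```python
def r_squared(xs, ys):
    """R-squared of a simple linear regression of ys on xs.

    For a one-variable linear fit this equals the squared Pearson
    correlation coefficient.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)


# Hypothetical samples: live sessions vs. live data size (MB).
live_sessions = [100, 200, 300, 400, 500]
live_data_mb = [310, 395, 520, 590, 705]

r2 = r_squared(live_sessions, live_data_mb)
print(f"R-squared = {r2:.2%}")
```

A value close to 100% suggests that live sessions explain most of the variation in live data size, which is what makes the metric usable in a capacity model.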

Balance between cost and performance
Wasted Capacity: Conservative thresholds might lead to inefficient use of available capacity.
Performance Issues: Aggressive thresholds might lead to excessive GC.
[Charts: Heap Utilization over Time with Garbage Collections, for Live Data Threshold @ 80% and Live Data Threshold @ 20%]
KEY TAKEAWAY: A suggested threshold to start from is 50% of Heap (Old Gen) size.

Putting together Java-aware and Business-centric
From Java-aware Capacity Models To Business-centric Capacity Planning
[Chart: Live Data Utilization (bytes) vs. Business Volume (e.g. #users), showing the New Estimated Business Capacity]
Current Infrastructure: Current Users 3500, # App. Server Instances 45
Required Infrastructure To Support Business Initiative: Target Users 4500, Estimated # App. Server Instances 60
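A sizing step of this kind can be sketched as follows, assuming live data per instance grows linearly with the users it serves and must stay below a threshold fraction of the Old Gen size. Every parameter value here (Old Gen size, base footprint, MB per user) is hypothetical and unrelated to the 45 and 60 instances quoted on the slide.

```python
import math


def max_users_per_instance(old_gen_mb, threshold, base_mb, mb_per_user):
    """Users one instance can serve before live data crosses the threshold."""
    usable_mb = old_gen_mb * threshold - base_mb
    if usable_mb <= 0:
        raise ValueError("base footprint already exceeds the threshold")
    return math.floor(usable_mb / mb_per_user)


def instances_required(target_users, users_per_instance):
    """Number of app server instances needed for the target user count."""
    return math.ceil(target_users / users_per_instance)


# Hypothetical model parameters.
per_instance = max_users_per_instance(
    old_gen_mb=2048,   # Old Gen size of each JVM
    threshold=0.5,     # keep live data below 50% of Old Gen
    base_mb=224,       # live data with zero users (fitted intercept)
    mb_per_user=10.0,  # live data added per user (fitted slope)
)
needed = instances_required(target_users=4500, users_per_instance=per_instance)
print(f"{per_instance} users per instance, {needed} instances required")
```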

Detecting Poor Memory Usage Patterns and Anticipating Memory Leaks
The model-based approach

A new Memory usage pattern emerged after a new Application release
What is causing this?
[Chart: Live Data Size vs. Live Sessions]

Another Live Data Size Benefit: Anticipating Mem. Leaks
[Charts: Live Data Size vs. Live Sessions, highlighting High Mem Usage @ Low Load]
Based on this evidence, Devs investigated the app and found the actual memory leak. They later asked us to include this analysis as part of the release cycle.
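One way to automate this kind of check is to compare each observation against the baseline model fitted on the previous release and flag points where live data sits well above what the load would explain. This is a sketch under assumptions: the baseline coefficients, the 25% tolerance and the samples are all hypothetical, and the slides do not describe the exact rule that was used.

```python
def flag_anomalies(samples, intercept, slope, tolerance):
    """Flag samples whose live data exceeds the model by more than tolerance.

    samples:   iterable of (live_sessions, live_data_mb)
    intercept: baseline live data at zero sessions (MB)
    slope:     baseline live data per session (MB)
    tolerance: allowed relative excess, e.g. 0.25 for 25%
    """
    flagged = []
    for sessions, live_mb in samples:
        expected = intercept + slope * sessions
        if live_mb > expected * (1 + tolerance):
            flagged.append((sessions, live_mb, expected))
    return flagged


# Hypothetical baseline fitted on the previous release.
BASE_INTERCEPT_MB = 200.0
BASE_SLOPE_MB = 1.0

# Hypothetical observations after the new release.
observations = [(100, 310), (400, 590), (50, 640), (20, 700)]

flagged = flag_anomalies(observations, BASE_INTERCEPT_MB, BASE_SLOPE_MB, 0.25)
for sessions, live_mb, expected in flagged:
    print(f"{sessions} sessions: {live_mb} MB live data, "
          f"model expects ~{expected:.0f} MB")
```

High live data at low load, as in the last two samples, is the pattern the slide associates with a memory leak.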

Efficiency: Are your CPUs used for the Business, or by the Garbage Collector? Stop the guessing and start measuring!

All of a Sudden, Something Really Weird Happened
[Charts: CPU Utilization and Server Call Rate over time]
CPU Utilization cut by 5x while doing the same amount of work!
No variation in business volumes, no new application release, no changes in physical infrastructure.
The Change: +2 GB Java Heap!

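GC CPU Utilization is not available in many Java monitoring tools. How can you measure it?
[Figure: Example of GC log fragment on Oracle JVM (-XX:+PrintGCDetails)]
$$\text{GarbageCollectorCPU}\,\% = \frac{\sum_{\text{Interval}} \left(\text{CPU}_{\text{user}} + \text{CPU}_{\text{sys}}\right)}{\text{Interval} \times \text{CPUNumber}}$$
The sum runs over the GC events logged in the Interval (e.g. 300 secs, i.e. 5 min), and the ratio is expressed as a percentage.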
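The calculation on this slide (user plus sys CPU seconds spent in GC, divided by the interval length times the number of CPUs) can be sketched against the `[Times: user=... sys=..., real=... secs]` entries that HotSpot prints with `-XX:+PrintGCDetails`. The sample lines, the 4-CPU host and the function name are assumptions for illustration; only the CPU time that the log reports in `Times` entries is counted.

```python
import re

# Matches the per-event CPU accounting printed with -XX:+PrintGCDetails.
TIMES_RE = re.compile(
    r"\[Times: user=(\d+\.\d+) sys=(\d+\.\d+), real=(\d+\.\d+) secs\]"
)


def gc_cpu_percent(log_lines, interval_secs, cpu_number):
    """GC CPU utilization (%) over an interval on a host with cpu_number CPUs."""
    cpu_secs = 0.0
    for line in log_lines:
        for user, sys_time, _real in TIMES_RE.findall(line):
            cpu_secs += float(user) + float(sys_time)
    return 100.0 * cpu_secs / (interval_secs * cpu_number)


# Hypothetical log fragment covering one 5-minute interval.
SAMPLE_LOG = [
    "[GC ... 0.8500 secs] [Times: user=12.50 sys=0.50, real=3.40 secs]",
    "[Full GC ... 12.1000 secs] [Times: user=46.00 sys=1.00, real=12.10 secs]",
]

gc_cpu = gc_cpu_percent(SAMPLE_LOG, interval_secs=300, cpu_number=4)
print(f"Garbage Collector CPU: {gc_cpu:.1f}%")
```

Plotting this value next to total CPU utilization for each interval shows how much of the CPU goes to the collector rather than to application code.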

After data collection: GC was the first consumer of CPU!
[Chart: Total CPU Utilization and Garbage Collector CPU Util % over time]
Almost all of the CPU cycles used by GC!
After cluster expansion: Total CPU cut in half, GC CPU cut by 5x!
The Garbage Collector might be the first consumer of your CPUs, well ahead of the actual application code. Stop the guessing, start measuring it!

Scalability in 2015: Java's Achilles' Heel? How to keep it under control!

Unexplained CPU Utilization Patterns During Memory Stressful Conditions
[Chart: CPU Utilization and Server Call Rate over time]
High CPU Utilization during the night, even though workload is zero after 9PM.
What drives CPU Utilization during the night?

Let's Find It Out! Linux top During The Anomaly
[Figure: Example of Linux top output, thread view (press H once in top)]
One software thread consuming all of its CPU cycles? This is the background thread used by the GC!
[Figure: Example of Java Thread Dump (jstack <PID>)]
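The link between the two tools is the thread ID: top's thread view shows it in decimal, while a HotSpot thread dump shows the same native ID in hexadecimal as `nid=0x...`. The lookup can be sketched as below; the dump excerpt is a made-up illustration in the usual jstack layout, not the output shown on the slide, and the function name is hypothetical.

```python
import re

# Thread header lines in a HotSpot dump start with the quoted thread name
# and carry the native thread ID as nid=0x<hex>.
NID_RE = re.compile(r'^"(?P<name>[^"]+)".*\bnid=(?P<nid>0x[0-9a-f]+)')


def thread_name_for_tid(thread_dump, tid):
    """Return the name of the Java thread whose native ID equals tid."""
    wanted = hex(tid)  # e.g. 28458 -> '0x6f2a'
    for line in thread_dump.splitlines():
        match = NID_RE.match(line.strip())
        if match and match.group("nid") == wanted:
            return match.group("name")
    return None


# Hypothetical thread dump excerpt.
SAMPLE_DUMP = "\n".join([
    '"http-nio-8080-exec-1" #31 daemon prio=5 os_prio=0 '
    'tid=0x00007f4d5c1b3000 nid=0x6f40 waiting on condition',
    '"Concurrent Mark-Sweep GC Thread" os_prio=0 '
    'tid=0x00007f4d5c0a2000 nid=0x6f2a runnable',
])

# 28458 stands in for the busy thread ID read from top's thread view.
busy_thread = thread_name_for_tid(SAMPLE_DUMP, 28458)
print(busy_thread)
```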

Can Java Garbage Collector Be A Scalability Bottleneck?
Java Concurrent Mark and Sweep Garbage Collector (CMS) is concurrent and parallel
- Concurrent = it performs work without stopping the application threads
- Parallel = it is multi-threaded, scales with number of CPUs
But we discovered that:
1. Just one CMS Background thread is configured by default with up to 4 CPUs
   - Can be increased via a specific option, but watch out for excessive GC CPU Utilization
2. CMS might «fail» and be forced to single-threaded operation
3. Even best-in-class GCs still need to stop the application - Amdahl's law applies!
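The Amdahl's law remark can be made concrete with a small calculation: if only a fraction p of the work runs in parallel, the speedup on n workers is 1 / ((1 - p) + p / n), so any serial portion (such as a stop-the-world or single-threaded GC phase) caps the gain. The 90% parallel fraction below is a hypothetical figure chosen for illustration.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Speedup predicted by Amdahl's law for a given parallel fraction."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Hypothetical workload: 90% of the work parallelizes, 10% is serial.
PARALLEL_FRACTION = 0.9

for cpus in (1, 4, 16, 64):
    print(f"{cpus:>3} CPUs -> speedup "
          f"{amdahl_speedup(PARALLEL_FRACTION, cpus):.2f}x")

# Upper bound as the number of CPUs grows without limit.
limit = 1.0 / (1.0 - PARALLEL_FRACTION)
print(f"limit -> {limit:.1f}x")
```

Adding CPUs helps less and less as the serial share dominates, which is why a collector forced into single-threaded operation limits scalability regardless of core count.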

Conclusions
So What Have We Learned?

Key Takeaways
What have we discovered?
- Traditional capacity models might severely overestimate the business capacity of Java applications
- The major consumer of your infrastructure resources might be the garbage collector
- Java memory management can have an impact on your application scalability
- Common monitoring tools might not provide all the metrics you need: the key metrics to look for might not be reported by your typical toolset
Our contribution to close the gap
- An enhanced Capacity model that takes into account Java memory and supports what-if analyses, using innovative KPIs
- The need to get visibility into real garbage collection CPU utilization and how to gather it
- How to control the problem by keeping track of single-threaded problems
Be sure to enable detailed GC logging on all your Java enterprise apps and integrate the KPIs in your CM solution!

Java Memory Stress Translates to poor Application Performance
[Chart: GC pause time (seconds)]
Application stopped for 66 seconds
KEY TAKEAWAY: Excessive GC stress might cause poor User Experience or even service failures, so you need to monitor it!

Questions?

Contacts
Headquarters: Via Schiaffino 11C, 20158 Milan, Italy. T +39-024951-7001
USA East: 283 Franklin Street, Boston, MA 02110. T +1-617-936-0212
USA West: 425 Broadway Street, Redwood City, CA 94063. T +1-650-226-4274
@moviri moviricorp moviri +moviri