OS-caused Long JVM Pauses - Deep Dive and Solutions

Similar documents
OS-Caused Large JVM Pauses: Investigations and Solutions

MAXIMIZING PERFORMANCE OF CO-LOCATED APPLICATIONS: EXPERIENCES AND STRATEGIES WHEN CO-LOCATING MULTIPLE JAVA APPLICATIONS ON A SINGLE PLATFORM

Low Latency Java in the Real World

Java Without the Jitter

The C4 Collector. Or: the Application memory wall will remain until compaction is solved. Gil Tene Balaji Iyengar Michael Wolf

JVM Performance Study Comparing Java HotSpot to Azul Zing Using Red Hat JBoss Data Grid

JAVA PERFORMANCE. PR SW2 S18 Dr. Prähofer DI Leopoldseder

JVM Performance Study Comparing Oracle HotSpot and Azul Zing Using Apache Cassandra

Low latency & Mechanical Sympathy: Issues and solutions

Typical Issues with Middleware

Kodewerk. Java Performance Services. The War on Latency. Reducing Dead Time Kirk Pepperdine Principle Kodewerk Ltd.

The G1 GC in JDK 9. Erik Duveblad Senior Member of Technical Staf Oracle JVM GC Team October, 2017

Java & Coherence Simon Cook - Sales Consultant, FMW for Financial Services

Runtime Application Self-Protection (RASP) Performance Metrics

Java Performance Tuning and Optimization Student Guide

Insight Case Studies. Tuning the Beloved DB-Engines. Presented By Nithya Koka and Michael Arnold

Open-Channel SSDs Offer the Flexibility Required by Hyperscale Infrastructure Matias Bjørling CNEX Labs

Using Transparent Compression to Improve SSD-based I/O Caches

Virtual Memory Primitives for User Programs

Fast Big Data Analytics with Spark on Tachyon

Java Application Performance Tuning for AMD EPYC Processors

Towards High Performance Processing in Modern Java-based Control Systems. Marek Misiowiec Wojciech Buczak, Mark Buttner CERN ICalepcs 2011

Oracle WebCenter Portal Performance Tuning

Zing Vision. Answering your toughest production Java performance questions

Optimising Multicore JVMs. Khaled Alnowaiser

TRASH DAY: COORDINATING GARBAGE COLLECTION IN DISTRIBUTED SYSTEMS

Webcenter Application Performance Tuning guide

How to keep capacity predictions on target and cut CPU usage by 5x

The Google File System

Diagnostics in Testing and Performance Engineering

Enabling Java in Latency Sensitive Environments

Pause-Less GC for Improving Java Responsiveness. Charlie Gracie IBM Senior Software charliegracie

Oracle JRockit. Performance Tuning Guide Release R28 E

New Java performance developments: compilation and garbage collection

The Z Garbage Collector Scalable Low-Latency GC in JDK 11

Attila Szegedi, Software

Dynamic Vertical Memory Scalability for OpenJDK Cloud Applications

Lesson 2 Dissecting Memory Problems

COMPUTER ARCHITECTURE. Virtualization and Memory Hierarchy

Java Performance Tuning

Section 9: Cache, Clock Algorithm, Banker s Algorithm and Demand Paging

Amazon Aurora Deep Dive

JDK 9/10/11 and Garbage Collection

White Paper. Major Performance Tuning Considerations for Weblogic Server

Caching and reliability

A Quest for Predictable Latency Adventures in Java Concurrency. Martin Thompson

A new Mono GC. Paolo Molaro October 25, 2006

PERFORMANCE ANALYSIS AND OPTIMIZATION OF SKIP LISTS FOR MODERN MULTI-CORE ARCHITECTURES

Myths and Realities: The Performance Impact of Garbage Collection

Cascade Mapping: Optimizing Memory Efficiency for Flash-based Key-value Caching

CA485 Ray Walshe Google File System

G1 Garbage Collector Details and Tuning. Simone Bordet

Monitoring Linux Performance for the SQL Server Admin. Anthony Nocentino, Enterprise Architect, Centino Systems

Optimizing Flash-based Key-value Cache Systems

Paperspace. Architecture Overview. 20 Jay St. Suite 312 Brooklyn, NY Technical Whitepaper

Tuning Intelligent Data Lake Performance

Amazon Aurora Deep Dive

JVM Troubleshooting MOOC: Troubleshooting Memory Issues in Java Applications

Application Fault Tolerance Using Continuous Checkpoint/Restart

Sustainable Memory Use Allocation & (Implicit) Deallocation (mostly in Java)

ZBD: Using Transparent Compression at the Block Level to Increase Storage Space Efficiency

WHITEPAPER. Improve PostgreSQL Performance with Memblaze PBlaze SSD

Java Performance: The Definitive Guide

Contiguous memory allocation in Linux user-space

The Z Garbage Collector Low Latency GC for OpenJDK

A Case Study: Performance Evaluation of a DRAM-Based Solid State Disk

HPMMAP: Lightweight Memory Management for Commodity Operating Systems. University of Pittsburgh

JOURNALING FILE SYSTEMS. CS124 Operating Systems Winter , Lecture 26

Fundamentals of GC Tuning. Charlie Hunt JVM & Performance Junkie

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2015 Lecture 23

The Z Garbage Collector An Introduction

Virtual Memory. Reading: Silberschatz chapter 10 Reading: Stallings. chapter 8 EEL 358

What s New in VMware vsphere 4.1 Performance. VMware vsphere 4.1

Top Ten Enterprise Java performance problems. Vincent Partington Xebia

Sybase Adaptive Server Enterprise on Linux

SSD Telco/MSO Case Studies

Virtual Memory Outline

Hierarchical PLABs, CLABs, TLABs in Hotspot

The Btrfs Filesystem. Chris Mason

About Terracotta Ehcache. Version 10.1

Monitoring Linux Performance for the SQL Server Admin. Anthony E. Nocentino, Enterprise Architect, Centino Systems

ibench: Quantifying Interference in Datacenter Applications

STORAGE LATENCY x. RAMAC 350 (600 ms) NAND SSD (60 us)

Enabling Java-based VoIP backend platforms through JVM performance tuning

Virtualization and memory hierarchy

Ambry: LinkedIn s Scalable Geo- Distributed Object Store

Real Time: Understanding the Trade-offs Between Determinism and Throughput

Don t stack your Log on my Log

SAP ENTERPRISE PORTAL. Scalability Study - Windows

Scaling the OpenJDK. Claes Redestad Java SE Performance Team Oracle. Copyright 2017, Oracle and/or its afliates. All rights reserved.

Conoscere e ottimizzare l'i/o su Linux. Andrea Righi -

OPERATING SYSTEM. Chapter 9: Virtual Memory

Toward SLO Complying SSDs Through OPS Isolation

CS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring Lecture 15: Caching: Demand Paged Virtual Memory

A Scalable Event Dispatching Library for Linux Network Servers

Ultra-Low Latency Down to Microseconds SSDs Make It. Possible

Strata: A Cross Media File System. Youngjin Kwon, Henrique Fingler, Tyler Hunt, Simon Peter, Emmett Witchel, Thomas Anderson

JVM Memory Model and GC

HashKV: Enabling Efficient Updates in KV Storage via Hashing

Algorithms for Dynamic Memory Management (236780) Lecture 4. Lecturer: Erez Petrank

Transcription:

OS-caused Long JVM Pauses - Deep Dive and Solutions Zhenyun Zhuang LinkedIn Corp., Mountain View, California, USA https://www.linkedin.com/in/zhenyun Zhenyun@gmail.com

2016-4-21

Outline q Introduction q Background q Scenario 1: startup state q Scenario 2: steady state with memory pressure q Scenario 3: steady state with heavy IO q Lessons learned 3

q Java + Linux Introduction Java is popular in production deployments Linux features interact with JVM operations Unique challenges caused by concurrent applications q Long JVM pauses caused by Linux OS Production issues, in three scenarios Root causes Solutions q References Ensuring High-performance of Mission-critical Java Applications in Multitenant Cloud Platforms, IEEE Cloud 2014 Eliminating Large JVM GC Pauses Caused by Background IO Traffic, LinkedIn Engineering Blog, 2016 (Too many tweets bringing down a twitter server! :) 4

Background q JVM and Heap Oracle HotSpot JVM q Garbage collection Generations Garbage collectors q Linux OS Paging (Regular page, Huge page) Swapping (Anonymous memory) Page cache writeback (Batched, Periodic) 5

q Three scenarios Scenarios Startup state Steady state with memory pressure Steady state with heavy IO q Workload Java application keeps allocating/de-allocating objects Background applications taking memories or issuing disk IO q Performance metrics Application throughput (K allocations/sec) Java GC pauses 6

Scenario 1: Startup State (App. Symptoms) q When Java applications start q Life is good in the beginning q Then Java throughput drops sharply q Java GC pauses spike during the same period 7

Scenario 1: Startup State (Investigations) Java heap is gradually allocated Without enough memory, direct page scanning can happen Heap is swapped out and in It causes large GC 8

Solutions q Pre-allocating JVM heap spaces JVM -XX:AlwaysPreTouch q Protecting JVM heap spaces from being swapped out Swappoff command Swappiness =0 for kernel version before 2.6.32-303 =1 for kernel version from 2.6.32-303 Cgroup 9

10 Evaluations (Pre-allocating Heap)

Evaluations (Protecting Heap) 18 24 11

Scenario 2: Steady State (App. Symptoms) q During steady state of a Java application, system memory stresses due to other applications q Java throughput drops sharply and performs badly q Java GC pauses spike 12

Scenario 2: Steady State (Level-1 Investigations) q During GC pauses, swapping activities persist q Swapping in JVM pages causes GC pauses q However, swapping is not enough Excessive GC pauses (i.e., 55 seconds) High sys-cpu usage (swapping is not sys-cpu intensive) [Times: user=0.12 sys=54.67, real=54.83 secs] 13

Scenario 2: Steady State (Level-2 Investigations) Transparent Huge Pages (THP) 2MB Splitting Collapsing Regular Pages 4KB 4KB 4KB 4KB 4KB 4KB q THP (Transparent Huge Pages) Improved TLB cache-hits q Bi-directional operations THPs are allocated first, but split during memory pressure Regular pages are collapsed to make THPs CPU heavy, and thrashing! 14

Solutions q Dynamically adjusting THP Enable THP when no memory pressure Disable THP during memory pressure period Fine tuning of THP parameters 15

Evaluations (Dynamic THP) q Without memory pressure Dynamic THP delivers similar performance as THP is on Mechanism THP Off THP On Dynamic THP Throughput (K allocations/sec) 12 15 15 q With memory pressure Dynamic THP has some performance overhead Performance is less than THP-off But better than THP-on Mechanism THP Off THP On Dynamic THP Throughput (K allocations/sec) 13 11 12 16

Scenario 3: Steady State (Heavy IO) q Production issue Online products Applications have light workload Both CMS and G1 garbage collectors q Preliminary investigations Examined many layers/metrics The only suspect: disk IO occasionally is heavy But all application IO are asynchronous 17

Reproducing the problem q Workload Simplified to avoid complex business logic https://github.com/zhenyun/javagcworkload q Background IO Saturating HDD 18

Case I: Without background IO No single longer-than-200ms pause 19

Case II: With background IO Huge pause! 20

21 Investigations

Time lines q At time 35.04 (line 2), a young GC starts and takes 0.12 seconds to complete. q The young GC finishes at time 35.16 and JVM tries to output the young GC statistics to gc log file by issuing a write() system call (line 4). q The write() call finishes at time 36.64 after being blocked for 1.47 seconds (line 5) q When write() call returns to JVM, JVM records at time 36.64 this STW pause of 1.59 seconds (i.e., 0.12 + 1.47) (line 3). 22

23 Interaction between JVM and OS

Non-blocking IO can be blocked q Stable page write For file-backed writing, OS writes to page cache first OS has write-back mechanism to persist dirty pages If a page is under write-back, the page is locked q Journal committing Journals are generated for journaling file system When appending GC log files needs new blocks, journals need to be committed Commitment might need to wait 24

Background IO activities q OS activity such as swapping Data writing to underlying disks q Administration and housekeeping software System-level software such as CFEngine also perform disk IO q Other co-located applications Co-located applications that share the disk drives, then other applications contend on IO q IO of the same JVM instance The particular JVM instance may use disk IO in ways other than GC logging 25

Solutions q Enhancing JVM Another thread Exposing JVM flags q Reducing IO activities OS, other apps, same app q Latency sensitive applications Separate disk High performing disks such as SSD Tmpfs 26

Evaluation q SSD as the disk 27

The good, the bad, and the ugly q The good: low real time Low user time and low sys time [user=0.18 sys=0.01, real=0.04 secs] q The bad: non-low (but not high) real time High user time and low sys time [user=8.00 sys=0.02, real=0.50 secs] q The ugly: high real time High sys time [user=0.02 sys=1.20, real=1.20 secs] Low sys time, low user time [Example? ] 28

Lessons Learned (I) q Be cautious about Linux s (and other OS) new features Constantly incorporating new features to optimize performance Some features incur performance tradeoff They may backfire in certain scenarios 29

Lessons Learned (II) q Root causes can come from seemingly insignificant information Linux emits significant amount of performance information Most of us most of the time mostly only examine a small subset of them Don t ignore others understand the interactions of sub-components 30

Lessons Learned (III) q Pay attention to multi-layer interaction Application protocol, JVM, OS, storage/networking Most people are familiar with a few layers Optimizations done at one layer may adversely affect other layers Many performance problems are caused by the cross-layer interactions 31

Zhenyun@gmail.com