Dynamic Vertical Memory Scalability for OpenJDK Cloud Applications

Similar documents
A Trace-based Java JIT Compiler Retrofitted from a Method-based Compiler

A Side-channel Attack on HotSpot Heap Management. Xiaofeng Wu, Kun Suo, Yong Zhao, Jia Rao The University of Texas at Arlington

Adaptive Multi-Level Compilation in a Trace-based Java JIT Compiler

String Deduplication for Java-based Middleware in Virtualized Environments

Towards Parallel, Scalable VM Services

Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters. Hiroshi Inoue and Toshio Nakatani IBM Research - Tokyo

Runtime Application Self-Protection (RASP) Performance Metrics

Trace-based JIT Compilation

Java Without the Jitter

Comparison of Garbage Collectors in Java Programming Language

Cross-Layer Memory Management for Managed Language Applications

Virtualizing JBoss Enterprise Middleware with Azul

OS-caused Long JVM Pauses - Deep Dive and Solutions

NG2C: Pretenuring Garbage Collection with Dynamic Generations for HotSpot Big Data Applications

Adaptive Optimization using Hardware Performance Monitors. Master Thesis by Mathias Payer

Boosting the Priority of Garbage: Scheduling Collection on Heterogeneous Multicore Processors

Optimising Multicore JVMs. Khaled Alnowaiser

JVM Performance Study Comparing Java HotSpot to Azul Zing Using Red Hat JBoss Data Grid

How to keep capacity predictions on target and cut CPU usage by 5x

Chronicler: Lightweight Recording to Reproduce Field Failures

The Z Garbage Collector Low Latency GC for OpenJDK

CGO:U:Auto-tuning the HotSpot JVM

Continuous Object Access Profiling and Optimizations to Overcome the Memory Wall and Bloat

Cross-Layer Memory Management for Managed Language Applications

Java Performance Tuning

How s the Parallel Computing Revolution Going? Towards Parallel, Scalable VM Services

Towards Garbage Collection Modeling

Preserving I/O Prioritization in Virtualized OSes

IBM Research - Tokyo 数理 計算科学特論 C プログラミング言語処理系の最先端実装技術. Trace Compilation IBM Corporation

IBM Daeja ViewONE Virtual Performance and Scalability

Cross-Layer Memory Management to Reduce DRAM Power Consumption

Shenandoah: Theory and Practice. Christine Flood Roman Kennke Principal Software Engineers Red Hat

JDK 9/10/11 and Garbage Collection

Serverless Architecture Hochskalierbare Anwendungen ohne Server. Sascha Möllering, Solutions Architect

New Java performance developments: compilation and garbage collection

Pause-Less GC for Improving Java Responsiveness. Charlie Gracie IBM Senior Software charliegracie

Java On Steroids: Sun s High-Performance Java Implementation. History

Yak: A High-Performance Big-Data-Friendly Garbage Collector. Khanh Nguyen, Lu Fang, Guoqing Xu, Brian Demsky, Sanazsadat Alamian Shan Lu

Exploiting FIFO Scheduler to Improve Parallel Garbage Collection Performance

JAVA PERFORMANCE. PR SW2 S18 Dr. Prähofer DI Leopoldseder

Application Management Webinar. Daniela Field

Preemptive, Low Latency Datacenter Scheduling via Lightweight Virtualization

Phosphor: Illuminating Dynamic. Data Flow in Commodity JVMs

Enhancing the Usage of the Shared Class Cache

The G1 GC in JDK 9. Erik Duveblad Senior Member of Technical Staf Oracle JVM GC Team October, 2017

Assessing the Scalability of Garbage Collectors on Many Cores

An Analysis and Empirical Study of Container Networks

Dynamic Selection of Application-Specific Garbage Collectors

An Experimental Study of Rapidly Alternating Bottleneck in n-tier Applications

IBM Education Assistance for z/os V2R2

Performance & Scalability Testing in Virtual Environment Hemant Gaidhani, Senior Technical Marketing Manager, VMware

The Garbage-First Garbage Collector

Reference Object Processing in On-The-Fly Garbage Collection

RxNetty vs Tomcat Performance Results

Java Performance Tuning and Optimization Student Guide

Myths and Realities: The Performance Impact of Garbage Collection

The Z Garbage Collector An Introduction

Exploiting the Behavior of Generational Garbage Collector

JVM Memory Model and GC

HPMMAP: Lightweight Memory Management for Commodity Operating Systems. University of Pittsburgh

Real Time: Understanding the Trade-offs Between Determinism and Throughput

WHITEPAPER AMAZON ELB: Your Master Key to a Secure, Cost-Efficient and Scalable Cloud.

Garbage Collection. Hwansoo Han

JVM Performance Study Comparing Oracle HotSpot and Azul Zing Using Apache Cassandra

Data Centers and Cloud Computing

Data Centers and Cloud Computing. Slides courtesy of Tim Wood

Enabling Java-based VoIP backend platforms through JVM performance tuning

Towards High Performance Processing in Modern Java-based Control Systems. Marek Misiowiec Wojciech Buczak, Mark Buttner CERN ICalepcs 2011

CIT 668: System Architecture. Amazon Web Services

Data Centers and Cloud Computing. Data Centers

On The Limits of Modeling Generational Garbage Collector Performance

MEMORY MANAGEMENT HEAP, STACK AND GARBAGE COLLECTION

The Z Garbage Collector Scalable Low-Latency GC in JDK 11

The benefits and costs of writing a POSIX kernel in a high-level language

Scaling Indexer Clustering

Running class Timing on Java HotSpot VM, 1

SPECjAppServer2002 Statistics. Methodology. Agenda. Tuning Philosophy. More Hardware Tuning. Hardware Tuning.

TRASH DAY: COORDINATING GARBAGE COLLECTION IN DISTRIBUTED SYSTEMS

Hierarchical PLABs, CLABs, TLABs in Hotspot

Looking ahead with IBM i. 10+ year roadmap

Fiji VM Safety Critical Java

Optimizing Efficiency of Deep Learning Workloads through GPU Virtualization

Benchmarking/Profiling (In)sanity

Cross-layer Optimization for Virtual Machine Resource Management

Oracle WebLogic Server 12c on AWS. December 2018

Java Garbage Collector Performance Measurements

Cloud Programming. Programming Environment Oct 29, 2015 Osamu Tatebe

[This is not an article, chapter, of conference paper!]

The C4 Collector. Or: the Application memory wall will remain until compaction is solved. Gil Tene Balaji Iyengar Michael Wolf

Next-Generation Cloud Platform

ZENworks 2017 Update 1 ZENworks Diagnostics and Probe Guide. July 2017

H.-S. Oh, B.-J. Kim, H.-K. Choi, S.-M. Moon. School of Electrical Engineering and Computer Science Seoul National University, Korea

@joerg_schad Nightmares of a Container Orchestration System

Efficient and Viable Handling of Large Object Traces

AWS Lambda. 1.1 What is AWS Lambda?

Lesson 2 Dissecting Memory Problems

Coping with Immutable Data in a JVM for Embedded Real-Time Systems. Christoph Erhardt, Simon Kuhnle, Isabella Stilkerich, Wolfgang Schröder-Preikschat

ECE 598-MS: Advanced Memory and Storage Systems Lecture 7: Unified Address Translation with FlashMap

Compute Node Linux (CNL) The Evolution of a Compute OS

Introducing Scalileo A Java Based Scaling Framework

Transcription:

Dynamic Vertical Memory Scalability for OpenJDK Cloud Applications Rodrigo Bruno, Paulo Ferreira: INESC-ID / Instituto Superior Técnico, University of Lisbon Ruslan Synytsky, Tetiana Fydorenchyk: Jelastic Jia Rao: The University of Texas at Arlington Hang Huang, Song Wu: Huazhong University of Science and Technology ISMM 18 June @ Philadelphia, USA

Unused Resources in the Cloud Jelastic Cloud Usage (real data) Real data from Jelastic cloud provider between 2014 and 2017 More than 25 TBs of unused RAM in 2017 Most cloud providers charge for reserved resources Users are paying for resources that are not used! Cloud users are forced to overprovision memory requirements not known dynamic workloads 1

Unused Resources in the Cloud Jelastic Cloud Usage (real data) Real data from Jelastic cloud provider between 2014 and 2017 More than 25 TBs of unused RAM in 2017 Unused but paid! Most cloud providers charge for reserved resources Users are paying for resources that are not used! Cloud users are forced to overprovision memory requirements not known dynamic workloads 1

Pay-as-you-Go vs Pay-as-you-Use Pay-as-you-Go Pay-as-you-use Pay for statically-sized instances Pay for used resources 2

Pay-as-you-Use for JVM Applications Proof-of-concept experiment, 1 instance, one task processed after startup and then idle Reserved Used Committed 3

Pay-as-you-Use for JVM Applications Proof-of-concept experiment, 1 instance, one task processed after startup and then idle Reserved Used Committed Problem 1: The JVM does not release RAM even if it is not being used (commited)! Problem 2: Applications cannot scale beyond Max Heap limit! 3

Pay-as-you-Use for JVM Applications Proof-of-concept experiment, 1 instance, one task processed after startup and then idle Dynamic Vertical Scalability is a requirement for taking advantage of Pay-as-you-Use Reserved Used Committed Problem 1: The JVM does not release RAM even if it is not being used (commited)! Problem 2: Applications cannot scale beyond Max Heap limit! 3

Goal: vertical memory scalability for JVM apps Improve the way the JVM fits in the virtualization stack (system-vms and containers). 4

Goal: vertical memory scalability for JVM apps Improve the way the JVM fits in the virtualization stack (system-vms and containers). Goal 1: give memory back to the host engine when it is not being used 4

Goal: vertical memory scalability for JVM apps Improve the way the JVM fits in the virtualization stack (system-vms and containers). Goal 1: give memory back to the host engine when it is not being used Goal 2: allow the JVM to grow its memory beyond the limit defined at launch time 4

Goal: vertical memory scalability for JVM apps Improve the way the JVM fits in the virtualization stack (system-vms and containers). Goal 1: give memory back to the host engine when it is not being used Goal 2: allow the JVM to grow its memory beyond the limit defined at launch time Goal 3: negligible negative throughput or memory footprint impact 4

Goal: vertical memory scalability for JVM apps Improve the way the JVM fits in the virtualization stack (system-vms and containers). Goal 1: give memory back to the host engine when it is not being used Goal 2: allow the JVM to grow its memory beyond the limit defined at launch time Goal 3: negligible negative throughput or memory footprint impact Goal 4: negligible pause-time for scaling memory 4

Goal: vertical memory scalability for JVM apps Improve the way the JVM fits in the virtualization stack (system-vms and containers). Goal 1: give memory back to the host engine when it is not being used Goal 2: allow the JVM to grow its memory beyond the limit defined at launch time Goal 3: negligible negative throughput or memory footprint impact Goal 4: negligible pause-time for scaling memory Goal 5: no changes to the host engine/os 4

Can t we use JVM tuning and/or cloud management tools? No... 5

Can t we use JVM tuning and/or cloud management tools? No... Reason 1: it is not possible to force the JVM to release memory from the outside (even a Full GC won t do it for some collectors such as PS). 5

Can t we use JVM tuning and/or cloud management tools? No... Reason 1: it is not possible to force the JVM to release memory from the outside (even a Full GC won t do it for some collectors such as PS). Reason 2: Horizontal scaling does not work if suddenly you need more memory than what you have in a single instance. It also requires more infrastructure and sophisticated algorithms to manage multiple instances; 5

Can t we use JVM tuning and/or cloud management tools? No... Reason 1: it is not possible to force the JVM to release memory from the outside (even a Full GC won t do it for some collectors such as PS). Reason 2: Horizontal scaling does not work if suddenly you need more memory than what you have in a single instance. It also requires more infrastructure and sophisticated algorithms to manage multiple instances; Reason 3: Setting a very high memory limit for the JVM solves the lack of memory problem but worsens reason 1; 5

Can t we use JVM tuning and/or cloud management tools? No... Reason 1: it is not possible to force the JVM to release memory from the outside (even a Full GC won t do it for some collectors such as PS). Reason 2: Horizontal scaling does not work if suddenly you need more memory than what you have in a single instance. It also requires more infrastructure and sophisticated algorithms to manage multiple instances; Reason 3: Setting a very high memory limit for the JVM solves the lack of memory problem but worsens reason 1; Reason 4: Rebooting the JVM to adjust the memory limit takes a long time leading to service unavailability, which is prohibitive for many applications. 5

Dynamic Vertical Memory Scaling 2-step solution: 6

Dynamic Vertical Memory Scaling 2-step solution: Step 1: 1. dynamically increase or decrease the JVM memory limit (i.e. amount of memory available to the application) 2. allow the cloud user to change this limit (this can also be done programmatically) 6

Dynamic Vertical Memory Scaling 2-step solution: Step 1: 1. dynamically increase or decrease the JVM memory limit (i.e. amount of memory available to the application) 2. allow the cloud user to change this limit (this can also be done programmatically) Step 2: 1. JVM heap sizing strategy that sizes the heap according to the application s used memory 2. Even if no GC is triggered, the heap size should be checked 6

Step 1: Current Max Heap Size We introduce a new JVM variable: CurrentMaxHeapSize can be set at launch time or at runtime, no need to guess the heap size beforehand once set, the heap cannot grow beyond its value Max heap size can be set to a conservatively high value (only affects reserved memory not committed memory) 7

Step 2: Periodic Heap Resizing Checks if... unused heap memory is large (line 6) last GC was a long ago (line 8) do heap resize MaxOverCommittedMem and MinTimeBetweenGCs can be set at launch time or at runtime We do not implement a new heap sizing algorithm, the JVM already has advanced ergonomic policies we just determine when to run it 8

Execution Memory Usage log 9

Implementation Solution implemented in the OpenJDK 9 HotSpot JVM CurrentMaxHeapSize, MaxOverCommittedMem, and MinTimeBetweenGCs are runtime variables that can be set at JVM launch time or at runtime; Periodic heap sizing checks are integrated in the VM control thread loop (executed nearly every second); JVM allocation path and heap growing respects CurrentMaxHeapSize Two collectors supported: Garbage First, most advanced GC, the new by-default Parallel Scavenge, widely used parallel collector We reuse the ergonomics code already added into the GC to implement the heap sizing operation 10

Evaluation Compare: G1 vs VG1 (vertical G1); PS vs VPS (vertical PS) Benchmarks: DaCapo 9.12 and Tomcat web server (real workload) Host node: Intel(R) Core(TM) i7-5820k CPU @ 3.30GHz, 32GBs DDR4 of RAM, Linux 4.9 Host engine: Docker 17.12 Each JVM runs in an isolated container 11

DaCapo 9.12 Benchmarks Benchmark Description Iterations CMaxMem MaxOCMem MinTimeGCs avrora AVR microcontrollers 5 32 MB 16 MB 10 sec fop XSL-FO to PDF 200 512 MB 32 MB 10 sec h2 JDBCbench-like in-memory 5 1024 MB 256 MB 10 sec jython interprets the pybench 5 128 MB 32 MB 10 sec luindex lucene indexing 100 256 MB 32 MB 10 sec pmd searches code problems 10 256 MB 32 MB 10 sec sunflow ray tracing 5 128 MB 16 MB 10 sec tradebeans daytrader benchmark 5 512 MB 128 MB 10 sec xalan XML to HTML 5 64 MB 16 MB 10 sec 12

Memory Scalability - JVM Heap Size (MB) Avg Heap Size (MB) Lower is Better 13

Memory Scalability - JVM Heap Size (MB) Avg Heap Size (MB) High allocation rate lead to higher improvements Lower is Better 13

Memory Scalability - Container Mem Usage (MB) Avg Container Size (MB) Lower is Better 14

Memory Scalability - Container Mem Usage (MB) Avg Container Size (MB) High allocation rate lead to higher improvements Lower is Better 14

Execution Time (ms) Avg Exec Time (ms) Lower is Better 15

Execution Time (ms) Avg Exec Time (ms) Minor Throughput Overheads for most benchmarks Lower is Better 15

Throughput vs Memory Tradeoff 16

Throughput vs Memory Tradeoff Different benchmarks have different memory throughput tradeoffs! 16

High Max Heap Limit Memory Overhead (h2 benchmark) Container Mem Usage (MB) Max Heap Limit Multiplier (1x = 1GB) Lower is Better 17

High Max Heap Limit Memory Overhead (h2 benchmark) Container Mem Usage (MB) ~20MB Max Heap Limit Multiplier (1x = 1GB) Lower is Better 17

High Max Heap Limit Memory Overhead (h2 benchmark) Container Mem Usage (MB) ~20MB High Xmx is compensated by periodic heap resizing! Max Heap Limit Multiplier (1x = 1GB) Lower is Better 17

Real-world Scenario Experiment Tomcat web server with 4-16GBs (based on real Jelastic clients workloads) utilized mostly during the day; at night (8 hours) the server is mostly idle user sessions (which occupy most of the memory) timeout after 10 min monthly cost estimation using Amazon EC2 (Ohio datacenter) assuming one could change the instance resources on the fly 18

Real-world Scenario Experiment (mem utilization) Container Mem Usage (MB) Lower is Better Time (hours) 19

Real-world Scenario Experiment (mem utilization) Container Mem Usage (MB) Resources can be saved during idle time! Lower is Better Time (hours) 19

Real-world Scenario Experiment (cost) Approach During Day During Night Total Savings 4GB-JVM 11.53$ 34.00$ 23.01$ 4GB-VJVM 1.44$ 24.44$ 29.40% 8GB-JVM 23.01$ 69.04$ 46.03$ 8GB-VJVM 1.44$ 47.47$ 31.00% 16GB-JVM 46.03$ 138.00$ 92.06$ 16GB-VJVM 1.44$ 93.50$ 32.60% 32GB-JVM 92.06$ 276.00$ 184.12$ 32GB-VJVM 1.44$ 185.00$ 33.00% 20

Conclusion Vertical Memory Scalability is an enabler for the Pay-as-you-Use model It can be implemented in the JVM with negligible throughput cost very promising footprint reductions Implementation can be easily ported to other GCs JEPs: http://openjdk.java.net/jeps/8204089 http://openjdk.java.net/jeps/8204088 21

Conclusion Vertical Memory Scalability is an enabler for the Pay-as-you-Use model It can be implemented in the JVM with negligible throughput cost very promising footprint reductions Implementation can be easily ported to other GCs Code is working in production at Jelastic JEPs: http://openjdk.java.net/jeps/8204089 http://openjdk.java.net/jeps/8204088 Thank you for your time! Questions? Rodrigo Bruno email: rodrigo.bruno@tecnico.ulisboa.pt webpage: www.gsd.inesc-id.pt/~rbruno github: github.com/rodrigo-bruno 21