SELF-AWARE APPLICATIONS AUTOMATIC PRODUCTION DIAGNOSIS DINA GOLDSHTEIN


Agenda
  Motivation
  Hierarchy of self-monitoring
  CPU profiling
  GC monitoring
  Heap analysis
  Deadlock detection
  Not today: profiling tools, 3rd party monitoring, dashboards

Motivation
  Why monitor?
    Obviously a must for servers and/or anything at scale
    But also a must for simple shrink-wrap software
  Why our own?
    Customized for specific business needs
    Diagnostics flow from ground up
    That's what all the cool kids do :-)

Master Plan - Use Hierarchy!
  Lightweight for continuous and frequent monitoring of all basics
    Numerical resource consumption: CPU, memory, disk
    Performance counters, Win32 APIs
  Medium for less frequent events
    Rare exceptions, deadlocks
    ETW, ClrMD
  Invasive for deep-dive and concrete diagnostics
    Memory leaks, bulk call-stack data (e.g. CPU profiling)
    CLR Profiling API, CLR Debugging API, hooks

LET'S GET TO BUSINESS

(Self) CPU-Profiling
  Monitor CPU using performance counters
    Are we above a certain threshold for a certain amount of time?
  Turn on ETW and collect stacks (live using LiveStacks)
  Find hot paths, produce flame graphs
  Suggest recommendations
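The "above a certain threshold for a certain amount of time" check can be sketched independently of how the CPU value is sampled. The following is a minimal illustration in Python, not the talk's .NET demo code; the class name `SustainedThreshold`, the 80% threshold, and the 10-second hold time are all assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SustainedThreshold:
    """Reports a breach only after a metric stays above `threshold` for `hold_seconds`."""
    threshold: float      # e.g. 80.0 for 80% CPU (hypothetical value)
    hold_seconds: float   # how long the breach must last (hypothetical value)
    _breach_start: Optional[float] = None

    def update(self, timestamp: float, value: float) -> bool:
        """Feed one sample; return True once the breach has lasted long enough."""
        if value <= self.threshold:
            self._breach_start = None   # any dip below the threshold resets the timer
            return False
        if self._breach_start is None:
            self._breach_start = timestamp
        return timestamp - self._breach_start >= self.hold_seconds


# Usage: samples are (seconds, cpu_percent) pairs from any lightweight source.
detector = SustainedThreshold(threshold=80.0, hold_seconds=10.0)
for t, cpu in [(0, 50.0), (5, 90.0), (10, 95.0), (15, 92.0), (20, 40.0)]:
    if detector.update(t, cpu):
        print(f"t={t}s: sustained high CPU, time to start collecting stacks")
```

Requiring the breach to persist keeps short, harmless spikes from triggering the more expensive stack collection step.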

What Can Be Done?
  AuthenticationController takes 95% CPU, maybe we're being DDoS'ed
  Image processing component takes 100% CPU, need to auto-scale the app
  Encoding this 30-second video takes 3 minutes at 100% CPU, tell the user she can send us a bug report
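A statement such as "AuthenticationController takes 95% CPU" can be derived from sampled call stacks by counting how many samples contain a frame from that component. The sketch below is a hedged Python illustration of that counting step only; the function names, the frame-name format, and the 90% reporting limit are assumptions for this example, and the stacks would in practice come from whatever stack collector is in use.

```python
from typing import Dict, Iterable, List, Sequence


def component_shares(stacks: Sequence[List[str]],
                     components: Iterable[str]) -> Dict[str, float]:
    """Fraction of samples whose stack contains a frame belonging to each component.

    A frame "belongs" to a component if its name starts with the component name
    (a simplifying assumption for this sketch).
    """
    total = len(stacks)
    shares: Dict[str, float] = {}
    for component in components:
        hits = sum(
            1 for stack in stacks
            if any(frame.startswith(component) for frame in stack)
        )
        shares[component] = hits / total if total else 0.0
    return shares


def recommend(shares: Dict[str, float], limit: float = 0.9) -> List[str]:
    """Turn dominant components into human-readable findings."""
    return [
        f"{name} is on-stack in {share:.0%} of CPU samples"
        for name, share in shares.items()
        if share >= limit
    ]


# Usage with made-up samples: 19 of 20 stacks pass through the authentication code.
samples = [["Main", "AuthenticationController.Login", "Hash.Compute"]] * 19
samples += [["Main", "ImageProcessor.Resize"]]
for finding in recommend(component_shares(samples, ["AuthenticationController",
                                                    "ImageProcessor"])):
    print(finding)
```

The share is inclusive (a component counts whenever it appears anywhere on the stack), which matches how a flame graph attributes width to a frame.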

DEMO MONITOR FOR CPU SPIKES

This Is From Real Life

(Self) GC-Monitoring
  Monitor GC performance using performance counters
  Register for ETW's GC events such as GCAllocationTick
    Types of objects allocated and their stacks(!)
    Number of GCs of each kind and size of reclaimed memory
    Duration of GC pauses
  Attach ClrMD to get heap breakdown
    Generations, segments, reserved/committed, number of objects
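Once allocation events are flowing, finding an allocation spike is mostly an aggregation problem: group by type and sum the bytes. The following Python sketch shows only that aggregation; the `(type_name, allocated_bytes)` event shape and the function names are hypothetical stand-ins for whatever the real event consumer delivers, not the ETW or ClrMD API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical event shape for this sketch: (type_name, allocated_bytes).
AllocationEvent = Tuple[str, int]


def top_allocators(events: Iterable[AllocationEvent],
                   top: int = 3) -> List[Tuple[str, int]]:
    """Sum allocated bytes per type and return the largest contributors first."""
    totals: Dict[str, int] = defaultdict(int)
    for type_name, allocated_bytes in events:
        totals[type_name] += allocated_bytes
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top]


def allocation_rate(events: Iterable[AllocationEvent],
                    window_seconds: float) -> float:
    """Average bytes allocated per second over the observation window."""
    return sum(size for _, size in events) / window_seconds


# Usage with made-up events observed during a 5-second window.
window = [("System.String", 100_000), ("System.Byte[]", 300_000),
          ("System.String", 150_000)]
print(top_allocators(window, top=2))
print(allocation_rate(window, window_seconds=5.0))
```

Keeping the call stack alongside each event (omitted here) is what lets the report point at the allocating code path rather than just the type.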

DEMO MONITOR FOR ALLOCATION SPIKES

(Self) Heap Analysis
  Monitor memory usage using performance counters
    Has memory increased above a certain threshold for a certain time?
  Attach ClrMD to get heap statistics
  Compare snapshots
  Report which objects are not freed
  Combine with ETW GCAllocationTick to get rate of allocation
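The "compare snapshots" step can be illustrated as a per-type diff of two heap summaries. This is a minimal Python sketch under the assumption that each snapshot has already been reduced to a mapping from type name to object count and total bytes; `TypeStats`, `growing_types`, and the sample numbers are invented for the example.

```python
from typing import Dict, List, NamedTuple, Tuple


class TypeStats(NamedTuple):
    count: int        # number of live objects of this type
    total_bytes: int  # combined size of those objects


Snapshot = Dict[str, TypeStats]


def growing_types(before: Snapshot, after: Snapshot,
                  min_growth_bytes: int = 0) -> List[Tuple[str, int, int]]:
    """Return (type_name, count_delta, bytes_delta) for types that grew, largest first."""
    growth = []
    for type_name, stats in after.items():
        previous = before.get(type_name, TypeStats(0, 0))  # new types start from zero
        delta_bytes = stats.total_bytes - previous.total_bytes
        if delta_bytes > min_growth_bytes:
            growth.append((type_name, stats.count - previous.count, delta_bytes))
    growth.sort(key=lambda row: row[2], reverse=True)
    return growth


# Usage with made-up snapshots taken some minutes apart.
first = {"Order": TypeStats(10, 1_000), "Cache": TypeStats(1, 64)}
second = {"Order": TypeStats(510, 51_000), "Cache": TypeStats(1, 64),
          "Session": TypeStats(5, 400)}
for name, d_count, d_bytes in growing_types(first, second):
    print(f"{name}: +{d_count} objects, +{d_bytes} bytes")
```

A single diff only shows growth, not a leak; comparing several snapshots and reporting types that grow in every interval gives a more trustworthy signal.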

DEMO FIND A LEAK

(Self) Deadlock Detection
  Monitor for potential deadlock
    Low CPU
    Request timeouts
    Increased thread count
  Attach ClrMD to create wait chains and detect deadlocks
  Report, try to break, pray for a miracle
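Detecting a deadlock from wait chains amounts to finding a cycle in the "thread waits for lock, lock is owned by thread" relation. The Python sketch below shows that cycle search in isolation; it assumes each thread waits on at most one lock and each lock has at most one owner, and the dictionaries `waiting_on` and `owned_by` are hypothetical inputs that a heap inspection step would have to produce.

```python
from typing import Dict, List, Optional


def find_deadlock(waiting_on: Dict[int, str],
                  owned_by: Dict[str, int]) -> Optional[List[int]]:
    """Follow wait chains and return the thread IDs forming a cycle, or None.

    waiting_on: thread ID -> lock the thread is blocked on
    owned_by:   lock -> thread ID currently holding it
    """
    for start in waiting_on:
        chain: List[int] = []
        thread: Optional[int] = start
        # Walk thread -> lock -> owner until the chain ends or revisits a thread.
        while thread is not None and thread not in chain:
            chain.append(thread)
            lock = waiting_on.get(thread)
            thread = owned_by.get(lock) if lock is not None else None
        if thread is not None:
            # The chain looped back: everything from the first visit onward is the cycle.
            return chain[chain.index(thread):]
    return None


# Usage: thread 1 holds B and wants A, thread 2 holds A and wants B.
print(find_deadlock({1: "A", 2: "B"}, {"A": 2, "B": 1}))
```

Threads that merely wait on a member of the cycle are excluded from the result, which helps a report distinguish the culprits from the victims.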

DEMO DETECT A DEADLOCK

Food for Thought
  Many more scenarios are possible
    Monitor heap fragmentation and compact large objects if needed
    Native memory leak analysis using ETW
  Side notes:
    ClrMD is also very suitable for automating crash dump analysis
    You can automate opening tickets in the bug tracker and consolidate the same issue across different users, versions, etc.

Not Everything Is Perfect
  The pros are obvious (visibility, easy scaling)
  But there are some cons as well
    Adds complexity (reduce risk by using separate process)
    Adds overhead
    Requires additional development

Summary
  Self-monitoring is important for all kinds of software
  Best to create a hierarchy of monitoring (and overhead and complexity)
  Lots of scenarios: CPU, GC, memory, deadlocks
  Demos: https://github.com/dinazil/self-aware-applications

THANK YOU DINA GOLDSHTEIN @DINAGOZIL