PERFVIEW..NET runtime performance and ETW event analysis tool

Similar documents
DNWSH - Version: 2.3..NET Performance and Debugging Workshop

ProdDiagNode - Version: 1. Production Diagnostics for Node Applications

Glacier: A Garbage Collection Simulation System

SELF-AWARE APPLICATIONS AUTOMATIC PRODUCTION DIAGNOSIS DINA GOLDSHTEIN

IBM Tivoli Composite Application Manager for Microsoft Applications: Microsoft.NET Framework Agent Fix Pack 13.

Zing Vision. Answering your toughest production Java performance questions

OS-caused Long JVM Pauses - Deep Dive and Solutions

J2EE Development Best Practices: Improving Code Quality

profiling Node.js applications

Lecture Topics. Announcements. Today: Threads (Stallings, chapter , 4.6) Next: Concurrency (Stallings, chapter , 5.

Workload Characterization and Optimization of TPC-H Queries on Apache Spark

Efficient and Large Scale Program Flow Tracing in Linux. Alexander Shishkin, Intel

6.828: OS/Language Co-design. Adam Belay

Root Cause Analysis for SAP HANA. June, 2015

Optimizing Your Android Applications

NGN Progress Report. Table of Contents

Java performance - not so scary after all

Batch Jobs Performance Testing

Garbage Collection. Steven R. Bagley

NUMA in High-Level Languages. Patrick Siegler Non-Uniform Memory Architectures Hasso-Plattner-Institut

The benefits and costs of writing a POSIX kernel in a high-level language

Inter-Process Communication and Synchronization of Processes, Threads and Tasks: Lesson-1: PROCESS

Java Concurrency in practice

Fiji VM Safety Critical Java

Java Performance Tuning and Optimization Student Guide

ClearSpeed Visual Profiler

Jackson Marusarz Software Technical Consulting Engineer

Need to Node: Profiling Node.js Applications

JAVA PERFORMANCE. PR SW2 S18 Dr. Prähofer DI Leopoldseder

Go Deep: Fixing Architectural Overheads of the Go Scheduler

Barrelfish Project ETH Zurich. Tracing and Visualisation

Using the Singularity Research Development Kit

HPMMAP: Lightweight Memory Management for Commodity Operating Systems. University of Pittsburgh

New programming language introduced by Microsoft contained in its.net technology Uses many of the best features of C++, Java, Visual Basic, and other

Module Overview. CLR Initialization

ORACLE ENTERPRISE MANAGER 10g ORACLE DIAGNOSTICS PACK FOR NON-ORACLE MIDDLEWARE

Kodewerk. Java Performance Services. The War on Latency. Reducing Dead Time Kirk Pepperdine Principle Kodewerk Ltd.

SABLEJIT: A Retargetable Just-In-Time Compiler for a Portable Virtual Machine p. 1

Hands-on Lab Session 9909 Introduction to Application Performance Management: Monitoring. Timothy Burris, Cloud Adoption & Technical Enablement

Software Exception Flow Analysis for WPF C# Asynchronous Programs Using Microsoft Visual Studio

Tracealyzer for VxWorks Overview and Getting Started Guide

1/16/17. Computer Programs and Processes. PHYSICAL: damage from either tampering, misuse, theft of equipment, or any combination of these

A new Mono GC. Paolo Molaro October 25, 2006

Method-Level Phase Behavior in Java Workloads

High-Level Language VMs

CUDA Development Using NVIDIA Nsight, Eclipse Edition. David Goodwin

CS 471 Operating Systems. Yue Cheng. George Mason University Fall 2017

Practical Lessons in Memory Analysis

MEMORY MANAGEMENT HEAP, STACK AND GARBAGE COLLECTION

Processes and Threads

Memory & Thread Debugger

Threads. Still Chapter 2 (Based on Silberchatz s text and Nachos Roadmap.) 3/9/2003 B.Ramamurthy 1

Product Guide. McAfee Performance Optimizer 2.2.0

Threads. CSE 410, Spring 2004 Computer Systems.

Lab 3-2: Exploring the Heap

CS 326: Operating Systems. Process Execution. Lecture 5

Using Intel VTune Amplifier XE and Inspector XE in.net environment

StackVsHeap SPL/2010 SPL/20

User Space Tracing in Small Footprint Devices. (How Low can You Go?)

Memory Allocation. Static Allocation. Dynamic Allocation. Dynamic Storage Allocation. CS 414: Operating Systems Spring 2008

Diagnostics in Testing and Performance Engineering

.NET Memory. Dump Analysis. Version 2.0. Dmitry Vostokov Software Diagnostics Services

The Challenges of XDP Hardware Offload

Motivation for Dynamic Memory. Dynamic Memory Allocation. Stack Organization. Stack Discussion. Questions answered in this lecture:

A Trace-based Java JIT Compiler Retrofitted from a Method-based Compiler

NightStar. NightView Source Level Debugger. Real-Time Linux Debugging and Analysis Tools BROCHURE

Operating Systems. IV. Memory Management

LinuxCon North America 2016 Investigating System Performance for DevOps Using Kernel Tracing

Run-Time Environments/Garbage Collection

Don t Dump Thread Dumps. Ram Lakshmanan

Running Application Specific Kernel Code by a Just-In-Time Compiler. Ake Koomsin Yasushi Shinjo Department of Computer Science University of Tsukuba

Performance Sentry VM Provider Objects April 11, 2012

Java Garbage Collection. Carol McDonald Java Architect Sun Microsystems, Inc.

Continuous Object Access Profiling and Optimizations to Overcome the Memory Wall and Bloat

Chapter 4: Multithreaded

Software Speculative Multithreading for Java

CS533 Concepts of Operating Systems. Jonathan Walpole

Runtime Application Self-Protection (RASP) Performance Metrics

McAfee Performance Optimizer 2.1.0

Debugging and profiling in R

CS 261 Fall Mike Lam, Professor. Virtual Memory

Oracle Developer Studio Performance Analyzer

MODULE 1 JAVA PLATFORMS. Identifying Java Technology Product Groups

Lesson 2 Dissecting Memory Problems

WINDOWS EVENT TRACE LOGS. SANS DFIR Summit 2018 Nicole Ibrahim G-C Partners, LLC

CSE P 501 Compilers. Java Implementation JVMs, JITs &c Hal Perkins Winter /11/ Hal Perkins & UW CSE V-1

Java Performance: The Definitive Guide

How to keep capacity predictions on target and cut CPU usage by 5x

Short-term Memory for Self-collecting Mutators. Martin Aigner, Andreas Haas, Christoph Kirsch, Ana Sokolova Universität Salzburg

Operating Systems. read the warning about the size of the project make sure you get the 6 th edition (or later) of the book

Magpie: Distributed request tracking for realistic performance modelling

POSIX Threads: a first step toward parallel programming. George Bosilca

C#.Net. Course Contents. Course contents VT BizTalk. No exam, but laborations

Intel VTune Amplifier XE

Score-P A Joint Performance Measurement Run-Time Infrastructure for Periscope, Scalasca, TAU, and Vampir

Copyright Notice SmartBear Software. All rights reserved.

Runtime. The optimized program is ready to run What sorts of facilities are available at runtime

Performance of Non-Moving Garbage Collectors. Hans-J. Boehm HP Labs

Memory management, part 3: outline

Monitoring Agent for Tomcat 6.4 Fix Pack 4. Reference IBM

Transcription:

PERFVIEW.NET runtime performance and ETW event analysis tool

OVERVIEW Formerly from Vance Morrison (.NET performance architect) Open-source Performance-analysis tool Can be used to investigate CPU and Memory-related issues Lightweight, no installation required Scriptable, configurable Automated reports, integration to build process Customizable via custom extensions Collecting data from custom performance counters

EVENT TRACING FOR WINDOWS Mechanism for tracing and logging events Built into the kernel of the OS (since Windows 2000) Allows for tracing and profiling in production environment Fast and efficient CLR publishes many events Garbage collection Lock contention Thread Start/Stop/Wait Exceptions JIT

WHAT CAN PERFVIEW DO FOR YOU? CPU investigation Which methods are using the CPU time Managed memory investigation Allocated objects Memory leaks Garbage collections Boxing Multi-threading Lock contention Blocked threads

WHERE TO GET PERFVIEW https://github.com/microsoft/perfview

LIVE DEMO

EXAMPLE: GC COLLECT EVENTS How often did carbage collection run? How long does a collection take? Which generations were collected? Why did collection run? Use following options during ETW data collection GC Only turns of all other providers, approx. 200MB/h of data GC Collect Only only garbage collections, approx. 200MB/day of data

EXAMPLE: HEAP ALLOCATIONS Which objects are being allocated? What is their size? Who allocates these objects? How often does boxing occur? Which objects are being boxed? Normal ETW data collection collects coarse data only (1 sample/100kb) Detailed data can be collected using options.net Alloc event fired every allocation, high performance impact (2x).NET SampAlloc smart sampling, suitable for production use (10-20%)

EXAMPLE: MEMORY LEAKS Memory leaks can be present in managed code too! Possible sources Delegates, events Static fields of classes Unused long-lived objects Which objects are currently allocated on the heap? Who owns a reference to them? Dump managed heap and examine collected data

EXAMPLE: LOCK CONTENTION Which locks are the most contended? Use Any Stacks view and look for Events: Microsoft-Windows-DotNETRuntime/Contention/Start Microsoft-Windows-DotNETRuntime/Contention/Stop

EXAMPLE: THREAD BLOCKING Threads may be blocked due to Waiting for I/O Waiting for Lock Use the Thread Times option when collecting data Inspect BLOCKED_TIME and CPU_TIME node

EXAMPLE: EXCEPTIONS Which exception were thrown? Events Microsoft-Windows-DotNETRuntime/Exception/Start Microsoft-Windows-DotNETRuntime/Exception/Stop Microsoft-Windows-DotNETRuntime/ExceptionCatch/Start Microsoft-Windows-DotNETRuntime/ExceptionCatch/Stop Use specialized exception stack viewer

CUSTOM EVENTS Simply derive from System.Diagnostics.Tracing.EventSource Then add *MyEvents@StacksEnabled=true to additional providers

CUSTOM REPORTS PerfView can generate solution by using command PerfView.exe CreateExtensionProject MyProjectName After creating custom command, invoke it using File->User command Or press Alt-U

AUTOMATING THE PROCESS PerfView can be called from command line Integration to the build process Automated data collection./perfview.exe /AcceptEULA /LogFile=customreport.txt usercommand PrimeReports.PostBuildPerfAnalysis "./Programs/EtlDemo/bin/Debug/EtlDemo.exe"

PRODUCTION MONITORING We can run PerfView in production environment to diagnose problems that occur only during production workload PerfView collect "/StopOnPerfCounter=Processor:% Processor Time:_Total>90 Circular buffer (default 500MB ~ few minutes of data) Collection stops when condition is reached (additional 10s of data are collected)

PRODUCTION MONITORING Other examples PerfView "/MonitorPerfCounter=Memory:Available MBytes:@10" collect Logs available memory every 10 seconds, PerfView/PerformanceCounterUpdate event PerfView collect "/StopOnRequestOverMSec:2000" /CollectMultiple:3 Stops on ASP.NET request that takes longer than 2000ms, repeats 3 times PerfView collect "/StopOnException:ApplicationException" /Process:MyService /ThreadTime Stops when exception of given type is thrown in the given process

PERFVIEW AND LINUX PerfView itself is just a stack viewer, it analyses set of samples containing Timestamp optional, defaults to 0 Metric optional, defaults to 1 Stack logically just a set of strings Since V1.9, PerfView can read textual representation of Linux perf tool output Recognized extension.data.txt and.trace.zip Simply collect data using the PerfCollect script and then transfer to windows machine for anlaysis