ADVANCED trouble-shooting of real-time systems. Bernd Hufmann, Ericsson

Similar documents
Adventures In Real-Time Performance Tuning, Part 2

Testing real-time Linux: What to test and how.

Measuring the impacts of the Preempt-RT patch

Operating Systems (2INC0) 2018/19. Introduction (01) Dr. Tanir Ozcelebi. Courtesy of Prof. Dr. Johan Lukkien. System Architecture and Networking Group

Reservation-Based Scheduling for IRQ Threads

System Wide Tracing User Need

Real-Time Technology in Linux

Linux - Not real-time!

Distributed traces modeling and critical path analysis

NuttX Realtime Programming

Simplifying the Development and Debug of 8572-Based SMP Embedded Systems. Wind River Workbench Development Tools

How Linux RT_PREEMPT Works

Inferring Temporal Behaviours Through Kernel Tracing

Tracing embedded heterogeneous systems

Development of Real-Time Systems with Embedded Linux. Brandon Shibley Senior Solutions Architect Toradex Inc.

The Real Time Thing. What the hack is real time and what to do with it. 22C3 30. December Erwin Erkinger e.at

Real Time Linux patches: history and usage

Case Study: Challenges and Benefits in Integrating Real Time patch in PowerPc Based Media System

AUTOBEST: A United AUTOSAR-OS And ARINC 653 Kernel. Alexander Züpke, Marc Bommert, Daniel Lohmann

LinuxCon North America 2016 Investigating System Performance for DevOps Using Kernel Tracing

Evaluation of uclinux and PREEMPT_RT for Machine Control System

QUESTION BANK UNIT I

A Methodology for Interrupt Analysis in Virtual Platforms

Notes based on prof. Morris's lecture on scheduling (6.824, fall'02).

Native POSIX Thread Library (NPTL) CSE 506 Don Porter

Evaluation of Real-time Performance in Embedded Linux. Hiraku Toyooka, Hitachi. LinuxCon Europe Hitachi, Ltd All rights reserved.

Asynchronous Events on Linux

hrtimers and beyond transformation of the Linux time(r) system OLS 2006

CPU Scheduling. Operating Systems (Fall/Winter 2018) Yajin Zhou ( Zhejiang University

Timers 1 / 46. Jiffies. Potent and Evil Magic

LINUX INTERNALS & NETWORKING Weekend Workshop

Interrupt response times on Arduino and Raspberry Pi. Tomaž Šolc

Recap. Run to completion in order of arrival Pros: simple, low overhead, good for batch jobs Cons: short jobs can stuck behind the long ones

Performance Optimization for an ARM Cortex-A53 System Using Software Workloads and Cycle Accurate Models. Jason Andrews

Software Debugging and Monitoring for Heterogeneous Many-Core Telecom Systems Overview of Research Program

8th Slide Set Operating Systems

The control of I/O devices is a major concern for OS designers

TDDD07 Real-time Systems Lecture 10: Wrapping up & Real-time operating systems

Real-Time Operating Systems Design and Implementation. LS 12, TU Dortmund

Optimizing Cache Coherent Subsystem Architecture for Heterogeneous Multicore SoCs

CS3733: Operating Systems

Operating Systems, Fall

Embedded Systems Programming

LECTURE 3:CPU SCHEDULING

Linux Foundation Collaboration Summit 2010

April 4-7, 2016 Silicon Valley

Benchmarking Real-Time and Embedded CORBA ORBs

Decoupling Cores, Kernels and Operating Systems

Yocto Project components

CPU Scheduling. Daniel Mosse. (Most slides are from Sherif Khattab and Silberschatz, Galvin and Gagne 2013)

Tracing embedded heterogeneous systems P R O G R E S S R E P O R T M E E T I N G, M A Y

I/O Systems and Storage Devices

Streaming mode snapshot mode Faster Troubleshooting Higher Quality Better Performance Control System Tuning Other Benefits

EECS 571 Principles of Real-Time Embedded Systems. Lecture Note #10: More on Scheduling and Introduction of Real-Time OS

Tasks. Task Implementation and management

CS 326: Operating Systems. CPU Scheduling. Lecture 6

( D ) 4. Which is not able to solve the race condition? (A) Test and Set Lock (B) Semaphore (C) Monitor (D) Shared memory

Implementation and Evaluation of the Synchronization Protocol Immediate Priority Ceiling in PREEMPT-RT Linux

Lecture Topics. Announcements. Today: Threads (Stallings, chapter , 4.6) Next: Concurrency (Stallings, chapter , 5.

Four Components of a Computer System. Operating System Concepts 7 th Edition, Jan 12, 2005

Process Description and Control

Implementing Scheduling Algorithms. Real-Time and Embedded Systems (M) Lecture 9

LOCKDEP, AN INSIDE OUT PERSPECTIVE. Nahim El

OVERVIEW. Last Week: But if frequency of high priority task increases temporarily, system may encounter overload: Today: Slide 1. Slide 3.

Real Time: Understanding the Trade-offs Between Determinism and Throughput

OPERATING SYSTEMS CS3502 Spring Processor Scheduling. Chapter 5

SIDDHARTH GROUP OF INSTITUTIONS :: PUTTUR Siddharth Nagar, Narayanavanam Road QUESTION BANK (DESCRIPTIVE) UNIT I OPERATING SYSTEMS

Introduction to Embedded Systems

Subject: Operating System (BTCOC403) Class: S.Y.B.Tech. (Computer Engineering)

System and Application Analysis with LTTng

Process Description and Control. Chapter 3

Oracle Developer Studio Performance Analyzer

Microkernel/OS and Real-Time Scheduling

Module 20: Multi-core Computing Multi-processor Scheduling Lecture 39: Multi-processor Scheduling. The Lecture Contains: User Control.

CPU Scheduling. CSE 2431: Introduction to Operating Systems Reading: Chapter 6, [OSC] (except Sections )

CS350: Final Exam Review

MaRTE OS: Overview and Linux Version

Embedded Systems: OS. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Performance analysis basics

2 nd Half. Memory management Disk management Network and Security Virtual machine

Operating System Design Issues. I/O Management

Real Time Operating Systems and Middleware

CUDA Development Using NVIDIA Nsight, Eclipse Edition. David Goodwin

Abstract. Testing Parameters. Introduction. Hardware Platform. Native System

Lecture notes Lectures 1 through 5 (up through lecture 5 slide 63) Book Chapters 1-4

Task Scheduling of Real- Time Media Processing with Hardware-Assisted Virtualization Heikki Holopainen

CS370 Operating Systems

Embedded Linux Conference 2010

Visualizing the Runtime World of Embedded Software Benefits and Examples. Dr. Johan Kraft, CEO/Founder Percepio AB

CSE325 Principles of Operating Systems. Processes. David P. Duggan February 1, 2011

Scheduling Mar. 19, 2018

Deterministic Futexes Revisited

Virtual Memory COMPSCI 386

e-ale-rt-apps Building Real-Time Applications for Linux Version c CC-BY SA4

How to get realistic C-states latency and residency? Vincent Guittot

CPSC 341 OS & Networks. Introduction. Dr. Yingwu Zhu

Interrupt/Timer/DMA 1

Systems Programming and Computer Architecture ( ) Timothy Roscoe

Chapter 2 Computer-System Structure

Preempt-RT Raspberry Linux. VMware Tiejun Chen

Transcription:

ADVANCED trouble-shooting of real-time systems Bernd Hufmann, Ericsson

AGENDA 1 Introduction 2 3 Timing Analysis 4 References 5 Q&A Trace Compass Overview ADVANCED trouble-shooting of critical real-time systems Ericsson AB 2017 2017-02-21 Page 2

Trace Compass Overview Troubleshooting tool Framework to build trace visualization and analysis tools Scalable: handle traces exceeding memory Extensible for any trace or log format: Binary, text, XML etc. Reusable views and widgets Available as standalone product or set of plug-ins ADVANCED trouble-shooting of critical real-time systems Ericsson AB 2017 2017-02-21 Page 3

Trace Compass Overview ADVANCED trouble-shooting of critical real-time systems Ericsson AB 2017 2017-02-21 Page 4

Data Flow ADVANCED trouble-shooting of critical real-time systems Ericsson AB 2017 2017-02-21 Page 5

Data Flow ADVANCED trouble-shooting of critical real-time systems Ericsson AB 2017 2017-02-21 Page 6

COMMON Features Events Table ADVANCED trouble-shooting of critical real-time systems Ericsson AB 2017 2017-02-21 Page 7

COMMON Features Searching Filtering Highlighting ADVANCED trouble-shooting of critical real-time systems Ericsson AB 2017 2017-02-21 Page 8

COMMON Features Trace annotation (bookmarks) and markers ADVANCED trouble-shooting of critical real-time systems Ericsson AB 2017 2017-02-21 Page 9

StateFUL Analyses ADVANCED trouble-shooting of critical real-time systems Ericsson AB 2017 2017-02-21 Page 10

XML Analysis & views Pattern analysis Find a sequence of data within a trace Customize Trace Compass without adding code Generate state systems Do timing analysis Define specialized views ADVANCED trouble-shooting of critical real-time systems Ericsson AB 2017 2017-02-21 Page 11

Call stack View Extensible view to display of call stacks over time LTTng-UST and finstrument-functions of GCC ADVANCED trouble-shooting of critical real-time systems Ericsson AB 2017 2017-02-21 Page 12

Trace Correlation Trace Compass can open multiple traces together to view it as one This is called an Experiment Useful for Traces coming from multiple nodes Traces from applications written in different languages Different layers (network, etc.) Traces can be synchronized by time Manually Automatic algorithm (extensible) ADVANCED trouble-shooting of critical real-time systems Ericsson AB 2017 2017-02-21 Page 13

Built-in Trace Types Linux Tracing Toolkit - LTTng (UST, Kernel) Text & XML Logs (custom parsers) Common Trace Format CTF application, kernel, HW, bare metal, etc. Packet Capture Best Trace Format - BTF GDB Trace Points ADVANCED trouble-shooting of critical real-time systems Ericsson AB 2017 2017-02-21 Page 14

TIMING ANALYSIS Real-time systems We have two metrics to analyse what is the data and when did it come Timing is as important as data Measure time between a start and end state Simple: Start and end event Often: State Machine to determine start and end Represent execution times, latencies, latency chains etc. ADVANCED trouble-shooting of critical real-time systems Ericsson AB 2017 2017-02-21 Page 15

TIMING ANALYSIS Locate timing problems Missed deadlines Potential missed deadline (find problem before it occurs) Analyze timing problems Find root cause and solution Solve difficult to debug sporadic problems ADVANCED trouble-shooting of critical real-time systems Ericsson AB 2017 2017-02-21 Page 16

EXAMPLE Soft IRQ Latency Latency 1 Latency 2 softirq_raise softirq_raise Softirq_handler_ent softirq_handler_entry ry softirq_handler_exit Parameter: CPU ID, IRQ # Total Latency ADVANCED trouble-shooting of critical real-time systems Ericsson AB 2017 2017-02-21 Page 17

Generalization Time between start and end Time for each transition Percentage sub-duration vs total ADVANCED trouble-shooting of critical real-time systems Ericsson AB 2017 2017-02-21 Page 18

Your Timing Analysis Define a state machine for timing analysis Implementation in Java as Trace Compass extension Data-driven pattern matching (in XML) Defining timing analyses on-the-fly Store in a built-in segment store Visualize data in various supplied views ADVANCED trouble-shooting of critical real-time systems Ericsson AB 2017 2017-02-21 Page 19

Visualization Table Get raw data Explore data Sorting, highlighting, filtering Scatter Chart Latency vs Time Have a big picture of the current range ADVANCED trouble-shooting of critical real-time systems Ericsson AB 2017 2017-02-21 Page 20

Visualization Distribution Chart Find outliers and modes easily Statistics Min, max, average etc. Find worst offenders Find worst possible offender combination ADVANCED trouble-shooting of critical real-time systems Ericsson AB 2017 2017-02-21 Page 21

TIMING ANALYSIS Locate timing problems Missed deadlines Potential missed deadline (find problem before it occurs) Analyze timing problems Find root cause and solution Solve difficult to debug sporadic problems ADVANCED trouble-shooting of critical real-time systems Ericsson AB 2017 2017-02-21 Page 22

Example Root Causes System overload System misconfiguration (e.g. wrong priorities of tasks) Priority inversion Lower priority task is blocking higher priority task (indirectly) Blocked threads, starvation, deadlock Slow code ADVANCED trouble-shooting of critical real-time systems Ericsson AB 2017 2017-02-21 Page 23

Resources View Displays resources states (color-coded) over time CPUs, IRQs, SoftIRQs ADVANCED trouble-shooting of critical real-time systems Ericsson AB 2017 2017-02-21 Page 24

Critical Path Displays of system wait chains for given process ADVANCED trouble-shooting of critical real-time systems Ericsson AB 2017 2017-02-21 Page 25

PRIORITY VIEW Group processes per CPU and priority Quickly find priority inversion or misconfigured task priorities Note: View not mainlined yet Prototype! ADVANCED trouble-shooting of critical real-time systems Ericsson AB 2017 2017-02-21 Page 26

FUTEX analysis Find contention at the Kernel level using LTTng Realized as XML pattern analysis Count of simultaneous waits Show all in timing analysis views Uaddr vs Thread Gantt chart ADVANCED trouble-shooting of critical real-time systems Ericsson AB 2017 2017-02-21 Page 27

OS Tracing Overview Overloaded resources CPU, Memory and IO Usage Counter-intuitive example, CPU usage too low: Kernel memory usage is rising Find the offending process IO usage is high Maybe it s swaps Too many seeks? Low IO, low CPU, low memory usage and low bandwidth ADVANCED trouble-shooting of critical real-time systems Ericsson AB 2017 2017-02-21 Page 28

FLAME GRAPH View Aggregation of function durations per call stack Highlights most time consuming execution path Find functions for performance optimization ADVANCED trouble-shooting of critical real-time systems Ericsson AB 2017 2017-02-21 Page 29

Future Development User-configurable periodic markers Custom charts Enhanced call graph analysis and views Call stack views using data-driven analysis Pin & clone of views Time based import of traces/experiments Scalable segment store Enhanced searching, filtering and highlighting in Gantt charts Data-driven analysis and view enhancements Cropping of traces Priority view ADVANCED trouble-shooting of critical real-time systems Ericsson AB 2017 2017-02-21 Page 30

REFERENCES Project pages http://tracecompass.org http://projects.eclipse.org/projects/tools.tracecompass Documentation Trace Compass User Guide Trace Compass Developer Guide ADVANCED trouble-shooting of critical real-time systems Ericsson AB 2017 2017-02-21 Page 31

REFERENCES Linux Tracing Toolkit (LTTng) http://lttng.org/ Diagnostic and Monitoring Working Group http://diamon.org/ Common Trace Format (CTF) http://diamon.org/ctf/ Trace Research Project http://hsdm.dorsal.polymtl.ca/ ADVANCED trouble-shooting of critical real-time systems Ericsson AB 2017 2017-02-21 Page 32

CONTACTS Bernd.Hufmann@ericsson.com Mailing list IRC tracecompass-dev@eclipse.org oftc.net #tracecompass Mattermost https://mattermost-test.eclipse.org/eclipse/channels/trace-compass ADVANCED trouble-shooting of critical real-time systems Ericsson AB 2017 2017-02-21 Page 33

ADVANCED trouble-shooting of critical real-time systems Ericsson AB 2017 2017-02-21 Page 34

Custom Parsers Custom Text and XML Parsers Line based parser with regex XML based extracting data from XML elements and their attributes ADVANCED trouble-shooting of critical real-time systems Ericsson AB 2017 2017-02-21 Page 37

EXAMPLE High Resolution Timer cyclictest application of rt-tests Latency between timer expiry till task starts Event: 1 2 3 4 5 1 Δ1 Δ2 Δ3 Δ4 task Latency = Δ1+ Δ2 + Δ3 + Δ4 ADVANCED trouble-shooting of critical real-time systems Ericsson AB 2017 2017-02-21 Page 39 Event 1: Timer expires Event 2: Interrupt begins executing Event 3: Interrupt handler marks the task to react Event 4: Linux scheduler switches to the task Event 5: Application task begins executing