XDS560 Trace. Advanced Use Cases for Profiling. Daniel Rinkes Texas Instruments

Similar documents
XDS560 Trace. Technology Showcase. Daniel Rinkes Texas Instruments

April 4, 2001: Debugging Your C24x DSP Design Using Code Composer Studio Real-Time Monitor

SMT107 User Manual User Manual (QCF42); Version 3.0, 8/11/00; Sundance Multiprocessor Technology Ltd. 1999

file://c:\documents and Settings\degrysep\Local Settings\Temp\~hh607E.htm

Announcement. Exercise #2 will be out today. Due date is next Monday

XDS200 ISO Operating Guide

Introduction to Parallel Performance Engineering

First-In-First-Out (FIFO) Algorithm

Page Replacement Algorithms

Computer Organization: A Programmer's Perspective

Using ARM ETB with TI CCS. CCS 3.3 with SR9 on TMS320DM6446

Flash: an efficient and portable web server

CSCI-GA Operating Systems I/O. Hubertus Franke

ARM Cortex-M and RTOSs Are Meant for Each Other

CS370 Operating Systems

Olimex Field Update Kit

XDS560v2 LC Traveler JTAG Emulator Technical Reference

Four Components of a Computer System. Operating System Concepts 7 th Edition, Jan 12, 2005

XDS560V2 Installation Guide

CPSC 341 OS & Networks. Introduction. Dr. Yingwu Zhu

DSP/BIOS Kernel Scalable, Real-Time Kernel TM. for TMS320 DSPs. Product Bulletin

Performance Profiling

CS510 Operating System Foundations. Jonathan Walpole

Module 12: I/O Systems

Buffer Overflow Defenses

ADVANCED trouble-shooting of real-time systems. Bernd Hufmann, Ericsson

Chapter 2: Operating-System Structures. Operating System Concepts 8 th Edition

CS370 Operating Systems

CIS Operating Systems Memory Management Cache and Demand Paging. Professor Qiang Zeng Spring 2018

I/O Systems. Jo, Heeseung

I/O Systems. Amir H. Payberah. Amirkabir University of Technology (Tehran Polytechnic)

(In columns, of course.)

PSIM Tutorial. How to Use SCI for Real-Time Monitoring in F2833x Target. February Powersim Inc.

Operating System. Chapter 4. Threads. Lynn Choi School of Electrical Engineering

4 DEBUGGING. In This Chapter. Figure 2-0. Table 2-0. Listing 2-0.

Comp 204: Computer Systems and Their Implementation. Lecture 18: Devices

CIS Operating Systems Memory Management Cache. Professor Qiang Zeng Fall 2017

EB-51 Low-Cost Emulator

Lesson 2: Using the Performance Console

G Programming Languages - Fall 2012

PCAN-Explorer 6. Tel: Professional Windows Software to Communicate with CAN and CAN FD Busses. Software >> PC Software

NightStar. NightView Source Level Debugger. Real-Time Linux Debugging and Analysis Tools BROCHURE


Devices. Today. Comp 104: Operating Systems Concepts. Operating System An Abstract View 05/01/2017. Devices. Devices

Operating System Control Structures

TDS510USB-C2K Emulator Installation Guide

RTOS Debugger for ChibiOS/RT

Processes. Sanzheng Qiao. December, Department of Computing and Software

Chapter 8: Virtual Memory. Operating System Concepts

Performance analysis basics

Device-Functionality Progression

Chapter 12: I/O Systems. I/O Hardware

Lecture 15: I/O Devices & Drivers

OPERATING SYSTEMS CS136

RTOS Debugger for ThreadX

UAD2 + Universal Access Device2 plus

Announcements. Reading. Project #1 due in 1 week at 5:00 pm Scheduling Chapter 6 (6 th ed) or Chapter 5 (8 th ed) CMSC 412 S14 (lect 5)

CS330: Operating System and Lab. (Spring 2006) I/O Systems

Operating System Services

Major Requirements of an OS

PC-based data acquisition I

Chapter 3: Processes. Operating System Concepts 8 th Edition,

7/20/2008. What Operating Systems Do Computer-System Organization

Process Description and Control. Chapter 3

Chapter 1: Introduction. Operating System Concepts 8 th Edition,

Processes and Threads

Lab 3-3: Scenario - Fixing a Memory Leak

The control of I/O devices is a major concern for OS designers

I/O Systems. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

Renesas 78K/78K0R/RL78 Family In-Circuit Emulation

Contents. Cortex M On-Chip Emulation. Technical Notes V

DRIVER STATION v1.0 UTILITY LOADER Created: 22DEC2008 FIRST DRIVER STATION UTILITY LOADER RE-IMAGE INSTRUCTIONS

Processes. Process Scheduling, Process Synchronization, and Deadlock will be discussed further in Chapters 5, 6, and 7, respectively.

Chapter 1: Introduction

Process and Its Image An operating system executes a variety of programs: A program that browses the Web A program that serves Web requests

ELEC 377 Operating Systems. Week 1 Class 2

I/O. Fall Tore Larsen. Including slides from Pål Halvorsen, Tore Larsen, Kai Li, and Andrew S. Tanenbaum)

I/O. Fall Tore Larsen. Including slides from Pål Halvorsen, Tore Larsen, Kai Li, and Andrew S. Tanenbaum)

CS 134. Operating Systems. April 8, 2013 Lecture 20. Input/Output. Instructor: Neil Rhodes. Monday, April 7, 14

Chapter 13: I/O Systems

To provide a grand tour of the major operating systems components To provide coverage of basic computer system organization

CS 333 Introduction to Operating Systems. Class 14 Page Replacement. Jonathan Walpole Computer Science Portland State University

Chapter 13: I/O Systems

Silberschatz and Galvin Chapter 12

CS 111. Operating Systems Peter Reiher

Chapter 13: I/O Systems

RTOS Debugger for RTX-ARM

KeyStone Training. Keystone Architecture Debug

CSE325 Principles of Operating Systems. Processes. David P. Duggan February 1, 2011

Blackhawk USB560v2 System Trace Emulator. Installation Guide

Lecture 2: September 9

Last 2 Classes: Introduction to Operating Systems & C++ tutorial. Today: OS and Computer Architecture

RM4 - Cortex-M7 implementation

ENGR 3950U / CSCI 3020U Midterm Exam SOLUTIONS, Fall 2012 SOLUTIONS

Jan 20, 2005 Lecture 2: Multiprogramming OS

Chapter 13: I/O Systems

Virtual Memory. Today.! Virtual memory! Page replacement algorithms! Modeling page replacement algorithms

Process Description and Control. Major Requirements of an Operating System

( ZIH ) Center for Information Services and High Performance Computing. Event Tracing and Visualization for Cell Broadband Engine Systems

Major Requirements of an Operating System Process Description and Control

Transcription:

XDS560 Trace Advanced Use Cases for Profiling Daniel Rinkes Texas Instruments

Agenda AET / XDS560Trace Overview Interrupt Profiling Statistical Profiling Thread Aware Profiling Thread Aware Dynamic Call Graph

Agenda AET / XDS560Trace Overview Interrupt Profiling Statistical Profiling Thread Aware Profiling Thread Aware Dynamic Call Graph

What is AET / XDS560 Trace XDS560 Trace Allows Cycle Accurate Logging of CPU Address Bus and Data Bus Activity Non-Intrusively, in Real Time Advanced Event Triggering (AET) is logic that allows us to smartly turn trace on in interesting locations and off in noninteresting locations so as to preserve the trace buffer for interesting data

XDS560 Trace Architecture DATA ADDRESS BUSSES CPU DATA BUSSES PROGRAM BUS Trace & AET Jobs Comparators Compressor Cycle counter DSP XDS560T POD Additional JTAG Emulation Pins Current Buffer Size: 224K Future: 64 MB XDS560T POD RECORDING UNIT To/from host PC

Required Software and Hardware 60 Pin emulation header Target must support Trace (Full-Gem) Blackhawk USB 560 XDS560T Trace Pod/Cable CCS 3.30 or higher

64x+ Device Support For Trace SUPPORT TRACE DO NOT SUPPORT TRACE C6455 C6488 DM647/DM648 DM6443/DM6446 Any LC Device

Agenda AET / XDS560Trace Overview Interrupt Profiling Statistical Profiling Thread Aware Profiling Thread Aware Dynamic Call Graph

Interrupt Profiling Overview Graphically display interrupt cycle accurate interrupt servicing frequency Capture Program Address and Timestamp whenever the PC is within the Interrupt Vector Table Generate a cycle accurate picture of when each interrupt starts executing

Trace Log PC Cycles 0x00897C60 102456 0x00897C64 102457 0x00897C68 102458 0x00897C6C 102459 0x00897C70 102460 0x00897C74 102461 0x00897C78 102462 0x00897C7C 102463 Cycle Count 102456 102463 102458 102459 102461 102462 102457 102460

Raw Trace Data Data Captured Processed Data

Results

Results - 2

Agenda AET / XDS560Trace Overview Interrupt Profiling Statistical Profiling Thread Aware Profiling Thread Aware Dynamic Call Graph

Statistical Profiling GOAL Get a quick overall view of which functions in an application consume the most cycles Sampling every Program Counter in an application quickly consumes Trace Buffer Bandwidth, preventing analysis of the entire application We can eliminate this problem by only capturing a statistical sample of application execution

Fxn1 Fxn2 Fxn3 Fxn4 Fxn5 Fxn6 Statistical Profiling Overview The Program Address is sampled at regular intervals Statistical Analysis is performed on the captured samples As in any statistical analysis, the determinations made on the statistical sample can be related to the general population 4 4 2 5 3 3 2 2 2 3 3 6 6 4 4

Statistical Profiling - 2 AET contains all of the hardware needed to capture trace samples at a specified interval Interval should be carefully chose so as not to coincide with a periodic function Application instrumentation can switch AET off in locations that are not of interest

Statistical Profiling - Results Comma Separated Value Format Sorted from most intensive functions to least

Agenda AET / XDS560Trace Overview Interrupt Profiling Statistical Profiling Thread Aware Profiling Thread Aware Dynamic Call Graph

Thread Aware Profiling GOAL Generate a cycle accurate execution graph of a Thread/Task based application Time (cycles)

Solution Instrument the task/thread switch function to write the task/thread ID to a well known location (global variable) Operating systems typically provide hooks to insert functions in this location Trace all of the writes to that location, and get a timestamp with each.

Results

Agenda AET / XDS560Trace Overview Interrupt Profiling Statistical Profiling Thread Aware Profiling Thread Aware Dynamic Call Graph

65 Thread Aware Dynamic Call Graph 200 GOAL Display a Thread Based representation of actual function execution in an application 132 200 68

Capturing the Data Thread/Task Hook function writes address of task Function Entry/Exit points instrumented with Mark 0/ Mark 1 instructions* Mark 0 inlined at each function entry point Mark 1 inlined at each function exit point Trace captures each of these locations with timestamp * CGT 6.0.1 enables function hooks

Graphical Displays become impractical as the number of functions increase Presentation At right is a sample call graph displayed by the Guess graphing package

Modeled after Unix GPROF Each Thread separated into it s own subsection Each function section contains only immediate callers and callees GPROF Like Format

A Closer Look Lines Primary for a Functions Line - describes Callees Callers the function which the entry is about and gives the overall statistics for this function

Future Display Tree View Thread: 0x828194 _DEC_tcp2DeintUnpunctsoft3 _DEC_tcp2QuantizeSoft Callers Callees _DEC_tcp2PreProc _varianceestim Calls = 388/388 _COM_spoolTsk Exclusive Cycles = 14491050 _COM_spoolPost _DEC_tcp2EdmaIsr

Questions?

Backup Material

Comparison vs. Traditional The Results of the statistical profile tend to approximate those found using traditional profiling. Statistical vs Traditonal Profiling 40.00% % of Time 30.00% 20.00% 10.00% Traditional Statistical 0.00% Fxn1 Fxn2 Fxn3 Fxn4 Fxn5 Fxn6 Traditional 2.27% 25.00 29.55 27.27 4.55% 11.36 Statistical 0.00% 25.00 33.33 16.67 8.33% 16.67 Function

Advanced Event Triggering Overview

Use Case Examples Simple Read/Write Data Location Read/Write Data Range Specific Value Read/Write to/from Specific Location Medium Read/Write to location A, but only after executing instruction B Complex Read/Write to location A, but only after executing instruction B at least 20 times CPU Halt Interrupt Start/Advance Counter Stop/Reload Counter Start Trace Store Trace Sample End Trace External Trigger (EMU 0/1) State Transition

AET Target LIbrary What is it? A Target Library to allow programming of AET resources from the target application Advantages over CCS AET Plug-in? Context based control of where AET is enabled/disabled Ability to reallocate AET resources on the fly Some functionality not inherently contained in the plug-in Disadvantages Some Application cycles consumed Some Application footprint used

Use Case Consider: Thread based application crashing because of a suspected overflow of one of the task stacks. Would like to use AET to monitor the top of the task stacks and generate an interrupt when we get too close to the top Problems All of the threads are dynamically created at run time, so there s no way to know where the top of the dynamically allocated stack will be located at load time. There are numerous threads, and AET resources are consumed by ONE Watchpoint In Range job

Solution Use the AET Target Library All Operating Systems (DSP/Bios, OSE, etc) have a thread switch function that occurs when switching tasks. The user can place code in this hook that gets executed every time a thread switches context. In this function, program AET to watch the top of the stack of the NEXT thread. Every subsequent time that the thread is switched, delete the old AET job, and program a new one.

Solution - 2 What does this accomplish? Reuse of resources By removing the prior AET job, you are reclaiming the resources (trigger builder and comparators) that were used. Additionally, AET is only watching the CURRENT stack at a given time. Thus we re able to manage with the 4 data comparators that are available. Dynamic Task Location By programming AET at run time, we can know where the dynamically allocated tasks are placing their stack. This method also works perfectly well for statically allocated tasks.