Verification of Quantitative Properties of Embedded Systems: Execution Time Analysis. Sanjit A. Seshia UC Berkeley EECS 249 Fall 2009

Similar documents
Introduction to Embedded Systems

Introduction to Embedded Systems

Time Measurement. CS 201 Gerson Robboy Portland State University. Topics. Time scales Interval counting Cycle counters K-best measurement scheme

The course that gives CMU its Zip! Time Measurement Oct. 24, 2002

Computer Time Scales Time Measurement Oct. 24, Measurement Challenge. Time on a Computer System. The course that gives CMU its Zip!

Time Measurement Nov 4, 2009"

Quantitative Verification and Synthesis of Systems

Introduction to Computer Systems: Semester 1 Computer Architecture

Computer Organization - Overview

Introduction to Computer Systems

Introduction to Computer Systems

Performance Evaluation. December 2, 1999

Estrutura do tema Avaliação de Desempenho (IA32)

Voting Machines and Automotive Software: Explorations with SMT at Scale

Introduction to Computer Architecture. Meet your Colleagues. Course Theme CISC Michela Taufer September 4, 2008

Measurement tools and techniques

FMCAD 2011 (Austin, Texas) Jonathan Kotker, Dorsa Sadigh, Sanjit Seshia University of California, Berkeley

Computer Systems A Programmer s Perspective 1 (Beta Draft)

Static WCET Analysis: Methods and Tools

Introduction to Embedded Systems

Analyzing Data-Dependent Timing and Timing Repeatability with GameTime

Course Overview. CSCI 224 / ECE 317: Computer Architecture. Instructors: Prof. Jason Fritts. Slides adapted from Bryant & O Hallaron s slides

ait: WORST-CASE EXECUTION TIME PREDICTION BY STATIC PROGRAM ANALYSIS

Embedded Systems Lecture 11: Worst-Case Execution Time. Björn Franke University of Edinburgh

Efficient Microarchitecture Modeling and Path Analysis for Real-Time Software. Yau-Tsun Steven Li Sharad Malik Andrew Wolfe

Hardware-Software Codesign. 9. Worst Case Execution Time Analysis

CS 6354: Static Scheduling / Branch Prediction. 12 September 2016

Outline EEL 5764 Graduate Computer Architecture. Chapter 3 Limits to ILP and Simultaneous Multithreading. Overcoming Limits - What do we need??

Modeling Hardware Timing 1 Caches and Pipelines

History-based Schemes and Implicit Path Enumeration

Martin Kruliš, v

Single-Path Programming on a Chip-Multiprocessor System

Sireesha R Basavaraju Embedded Systems Group, Technical University of Kaiserslautern

High Performance Computer Architecture Prof. Ajit Pal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

Sciduction: Combining Induction, Deduction and Structure for Verification and Synthesis

CMSC 411 Computer Systems Architecture Lecture 13 Instruction Level Parallelism 6 (Limits to ILP & Threading)

Lecture 7 Instruction Level Parallelism (5) EEC 171 Parallel Architectures John Owens UC Davis

Timing Analysis of Interrupt-Driven Programs under Context Bounds

Performance analysis basics

Combining Induction, Deduction and Structure for Synthesis

Timing Analysis of Embedded Software for Families of Microarchitectures

Timing Analysis Enhancement for Synchronous Program

Predictable multithreading of embedded applications using PRET-C

Assembly Language: Overview!

Timing analysis and timing predictability

UNIT- 5. Chapter 12 Processor Structure and Function

Processing Unit CS206T

EECS 219C: Formal Methods Binary Decision Diagrams (BDDs) Sanjit A. Seshia EECS, UC Berkeley

Parts A and B both refer to the C-code and 6-instruction processor equivalent assembly shown below:

Parallelism. Execution Cycle. Dual Bus Simple CPU. Pipelining COMP375 1

Corrections made in this version not in first posting:

HW/SW Codesign. WCET Analysis

Chapter 4. The Processor

Control Hazards - branching causes problems since the pipeline can be filled with the wrong instructions.

CISC 662 Graduate Computer Architecture Lecture 13 - Limits of ILP

Performance Tuning VTune Performance Analyzer

Status of the Bound-T WCET Tool

Chapter 14 - Processor Structure and Function

WCET-Aware C Compiler: WCC

Design and Analysis of Time-Critical Systems Introduction

Techniques described here for one can sometimes be used for the other.

ECE473 Computer Architecture and Organization. Pipeline: Control Hazard

Real-Time Component Software. slide credits: H. Kopetz, P. Puschner

DEPARTMENT OF ELECTRONICS & COMMUNICATION ENGINEERING QUESTION BANK

In examining performance Interested in several things Exact times if computable Bounded times if exact not computable Can be measured

Last time: forwarding/stalls. CS 6354: Branch Prediction (con t) / Multiple Issue. Why bimodal: loops. Last time: scheduling to avoid stalls

CPU Structure and Function

Introduction to Embedded Systems

FIFO Cache Analysis for WCET Estimation: A Quantitative Approach

Lecture Notes for Chapter 2: Getting Started

CS241 Computer Organization Spring

Lecture 21. Software Pipelining & Prefetching. I. Software Pipelining II. Software Prefetching (of Arrays) III. Prefetching via Software Pipelining

LECTURE 3: THE PROCESSOR

Design and Analysis of Real-Time Systems Microarchitectural Analysis

Architectural Time-predictability Factor (ATF) to Measure Architectural Time Predictability

Lecture-13 (ROB and Multi-threading) CS422-Spring

Dissecting Execution Traces to Understand Long Timing Effects

Timing Anomalies Reloaded

6.1 Motivation. Fixed Priorities. 6.2 Context Switch. Real-time is about predictability, i.e. guarantees. Real-Time Systems

ProfileMe: Hardware-Support for Instruction-Level Profiling on Out-of-Order Processors

Superscalar Processors

Programmazione Avanzata

Accelerating Dynamic Binary Translation with GPUs

Compiler Optimization

COMPUTER ORGANIZATION AND DESIGN

Design and Analysis of Real-Time Systems Predictability and Predictable Microarchitectures

ASSEMBLY LANGUAGE MACHINE ORGANIZATION

CPS104 Computer Organization Lecture 1. CPS104: Computer Organization. Meat of the Course. Robert Wagner

Full Datapath. Chapter 4 The Processor 2

Autotuning. John Cavazos. University of Delaware UNIVERSITY OF DELAWARE COMPUTER & INFORMATION SCIENCES DEPARTMENT

Microkernel/OS and Real-Time Scheduling

Multi-core Architectures. Dr. Yingwu Zhu

Lecture 12: Instruction Execution and Pipelining. William Gropp

Scope-based Method Cache Analysis

CPS104 Computer Organization Lecture 1

Evaluating Static Worst-Case Execution-Time Analysis for a Commercial Real-Time Operating System

F28HS Hardware-Software Interface: Systems Programming

INSTRUCTION LEVEL PARALLELISM

Exploitation of instruction level parallelism

Most of the slides in this lecture are either from or adapted from slides provided by the authors of the textbook Computer Systems: A Programmer s

Transcription:

Verification of Quantitative Properties of Embedded Systems: Execution Time Analysis Sanjit A. Seshia UC Berkeley EECS 249 Fall 2009

Source Material in this lecture is drawn from the following sources: The Worst-Case Execution Time Problem Overview of Methods and Survey of Tools, R. Wilhelm et al., ACM Transactions on Embedded Computing Systems, 2007. Chapter 9 of Computer Systems: A Programmer's Perspective, R. E. Bryant and D. R. O Hallaron, Prentice-Hall, 2002. Performance Analysis of Real-Time Embedded Software, Y-T. Li and S. Malik, Kluwer Academic Pub., 1999. Game-Theoretic Timing Analysis, S. A. Seshia and A. Rakhlin, ICCAD 2008 Extended version is Technical Report EECS-2009-130 EECS 249, UC Berkeley: 2

Worst-Case Execution Time (WCET) of a Task The longest time taken by a software task to execute Function of input data and environment conditions BCET = Best-Case Execution Time (shortest time taken by the task to execute) EECS 249, UC Berkeley: 3

Worst-Case Execution Time (WCET) & BCET Figure from R.Wilhelm et al., ACM Trans. Embed. Comput. Sys, 2007. EECS 249, UC Berkeley: 4

The WCET Problem Given the code for a software task the platform (OS + hardware) that it will run on Determine the WCET of the task. Why is this problem important? The WCET is central in the design of RT Systems: Needed for Correctness (does the task finish in time?) and Performance (find optimal schedule for tasks) Can the WCET always be found? In general, no, because the problem is undecidable. EECS 249, UC Berkeley: 5

Typical WCET Problem Setting Task executes within an infinite loop while(1) { read_sensors(); compute(); write_to_actuators(); } This code typically has: loops with finite bounds no recursion Additional assumptions: runs uninterrupted single-threaded EECS 249, UC Berkeley: 6

Outline of the Lecture How to measure execution time Current Approaches to Execution Time Analysis Limitations The GameTime approach Demo of some tools EECS 249, UC Berkeley: 7

How to Measure Run-Time Several techniques, with varying accuracy: Instrument code to sample CPU cycle counter relatively easy to do, read processor documentation for assembly instruction Use cycle-accurate simulator for processor useful when hardware is not available/ready Use Logic Analyzer non-intrusive measurement, more accurate EECS 249, UC Berkeley: 8

Cycle Counters Most modern systems have built in registers that are incremented every clock cycle Special assembly code instruction to access On Intel 32-bit x86 machines: 64 bit counter RDTSC instruction sets %edx to high order 32-bits, %eax to low order 32-bits Wrap-around time for 2 GHz machine Low order 32-bits every 2.1 seconds High order 64 bits every 293 years [slide due to R. E. Bryant and D. R. O Hallaron] EECS 249, UC Berkeley: 9

Measuring with Cycle Counter Idea Get current value of cycle counter store as pair of unsigned scyc_hi and cyc_lo Compute something Get new value of cycle counter Perform double precision subtraction to get elapsed cycles /* Keep track of most recent reading of cycle counter */ static unsigned cyc_hi = 0; static unsigned cyc_lo = 0; void start_counter() { /* Get current value of cycle counter */ access_counter(&cyc_hi, &cyc_lo); } [slide due to R. E. Bryant and D. R. O Hallaron] EECS 249, UC Berkeley: 10

Accessing the Cycle Counter GCC allows inline assembly code with mechanism for matching registers with program variables Code only works on x86 machine compiling with GCC void access_counter(unsigned *hi, unsigned *lo) { /* Get cycle counter */ asm("rdtsc; movl %%edx,%0; movl %%eax,%1" : "=r" (*hi), "=r" (*lo) : /* No input */ : "%edx", "%eax"); } Emit assembly with rdtsc and two movl instructions [slide due to R. E. Bryant and D. R. O Hallaron] EECS 249, UC Berkeley: 11

Completing Measurement Get new value of cycle counter Perform double precision subtraction to get elapsed cycles Express as double to avoid overflow problems double get_counter() { unsigned ncyc_hi, ncyc_lo unsigned hi, lo, borrow; /* Get cycle counter */ access_counter(&ncyc_hi, &ncyc_lo); /* Do double precision subtraction */ lo = ncyc_lo - cyc_lo; borrow = lo > ncyc_lo; hi = ncyc_hi - cyc_hi - borrow; return (double) hi * (1 << 30) * 4 + lo; } [slide due to R. E. Bryant and D. R. O Hallaron] EECS 249, UC Berkeley: 12

Timing With Cycle Counter Time Function P First attempt: Simply count cycles for one execution of P double tcycles; start_counter(); P(); tcycles = get_counter(); What can go wrong here? [slide due to R. E. Bryant and D. R. O Hallaron] EECS 249, UC Berkeley: 13

Measurement Pitfalls Instrumentation incurs small overhead measure long enough code sequence to compensate Cache effects can skew measurements warm up the cache before making measurement Multi-tasking effects: counter keeps going even when the task of interest is inactive take multiple measurements and pick k best (cluster) Multicores/hyperthreading Need to ensure that task is locked to a single core Power management effects CPU speed might change, timer could get reset during hibernation EECS 249, UC Berkeley: 14

Outline of the Lecture How to measure execution time Current Approaches to Execution Time Analysis Limitations The GameTime approach Demo of some tools EECS 249, UC Berkeley: 15

Components of Execution Time Analysis Program path (Control flow) analysis Want to find longest path through the program Identify feasible paths through the program Find loop bounds Identify dependencies amongst different code fragments Processor behavior analysis For small code fragments (basic blocks), generate bounds on run-times on the platform Model details of architecture, including cache behavior, pipeline stalls, branch prediction, etc. Outputs of both analyses feed into each other EECS 249, UC Berkeley: 16

Program Path Analysis: Path Explosion for (Outer = 0; Outer < MAXSIZE; Outer++) { /* MAXSIZE = 100 */ for (Inner = 0; Inner < MAXSIZE; Inner++) { if (Array[Outer][Inner] >= 0) { Ptotal += Array[Outer][Inner]; Pcnt++; } else { Ntotal += Array[Outer][Inner]; Ncnt++; } Postotal = Ptotal; Poscnt = Pcnt; Negtotal = Ntotal; Negcnt = Ncnt; } Example cnt.c from WCET benchmarks, Mälardalen Univ. EECS 249, UC Berkeley: 17

Program Path Analysis: Overall Approach Construct Control-Flow Graph (CFG) for the task Nodes represent Basic Blocks of the task Edges represent flow of control (jumps, branches, calls, ) The problem is to identify the longest path in the CFG Note: CFG can have loops, so need to infer loop bounds and unroll them This gives us a directed acyclic graph (DAG). How do we find the longest path in this DAG? EECS 249, UC Berkeley: 18

Example d1 N = 10; q = 0; while(q < N) B1: N = 10; q = 0; x1 q++; d2 q = r; xi # times Bi is executed dj # times edge is executed x2 d5 B2: while(q<n) d3 d4 x4 B4: q = r; B3: q++; x3 d6 Example due to Y.T. Li and S. Malik EECS 249, UC Berkeley: 19

Program Path Analysis: Dependencies #define CLIMB_MAX 1.0... void altitude_pid_run(void) { float err = estimator_z - desired_altitude; desired_climb = pre_climb + altitude_pgain * err; if (desired_climb < -CLIMB_MAX) desired_climb = -CLIMB_MAX; if (desired_climb > CLIMB_MAX) desired_climb = CLIMB_MAX; } Only one of these statements is executed Example from PapaBench UAV autopilot code, IRIT, France EECS 249, UC Berkeley: 20

Example, Revisited xi # times Bi is executed dj # times edge is executed C i measured time taken by Bi d1 B1: N = 10; q = 0; x1 Want to d2 maximize i C i xi subject to constraints x1 = d1 = d2 x2 = d2+d4 = d3+d5 x2 d5 B2: while(q<n) d3 d4 x3 = d3 = d4 = 10 x4 = d5 = d6 x4 B4: q = r; B3: q++; x3 d6 Example due to Y.T. Li and S. Malik EECS 249, UC Berkeley: 21

Timing Analysis and Compositionality Consider a task T with two parts A and B composed in sequence: T = A; B Is WCET(T) = WCET(A) + WCET(B)? NO! WCETs cannot simply be composed Due to dependencies through environment EECS 249, UC Berkeley: 22

Timing Anomalies Branch evaluated I-Cache Hit A pre-fetch B (I$ miss due to pre-fetch) A B I-Cache Miss A (miss in I$) B Scenario 1: Instr A hits in I-cache, triggers branch speculation, and prefetch of instructions, then predicted branch is wrong, so Instr B must execute, but it s been evicted from I-cache, execution of B delayed. Scenario 2: Instr A misses in I-cache, no branch prediction, then B hits in I-cache, B completes. [ from R.Wilhelm et al., ACM Trans. Embed. Comput. Sys, 2007.] EECS 249, UC Berkeley: 23

Outline of the Lecture How to measure execution time Current Approaches to Execution Time Analysis Limitations The GameTime approach Demo of some tools EECS 249, UC Berkeley: 24

Current WCET Methods: Limitations Big Limitation: Environment (Platform) Modeling Where s s my platform? Tools only work for selected processors/compilers for which detailed models are hand-constructed Inaccurate & Tedious: platforms are becoming more complex, modeling takes months of human effort Brittle, not portable: small changes to the platform can require completely re-doing the analysis See e.g., [E. A. Lee, TR 07], [Kirner[ and Puschner, ISORC 2008] 25

Beyond WCET: Other Execution Time Problems Average-case analysis Given any program path (input), can we predict how long the program will take to execute, on average? Profile... Plot histogram of execution times of a program Find top 10% of longest program paths 26

Two Dimensions of the WCET Estimation Problem Challenge: Exponentially-many program paths (in worst case) Path Worst-case program path with Worst-case env. state State Challenge: How to find the worst-case state of the platform (environment)? Need accurate model of platform Need to find worst-case state 27

Classification of Current Tools Static Analysis Abstract interpretation generates invariants to capture worst-case environment states at control points loop bounds Find time bounds on basic blocks (straight-line program fragments) from worst-case state Use implicit path enumeration (IPET) based on integer programming to compute WCET A very effective approach if an accurate platform model is available Measurement-Based Run tests Test suite generated randomly or heuristically, e.g., using genetic algorithms, or via systematic methods such as model checking Measure execution time Compute maximum over all observed times Under certain conditions, this could be done compositionally, but in general need end-to to-end measurements 28

Some WCET Estimation Tools [ R.Wilhelm et al., ACM Trans. Embed. Comput. Sys, 2007.] 29

Issues with Static Methods: Platform Modeling Program Problems: Time-consuming: several months to model a processor Inaccurate: may not model all of the platform Network Sensors/ Actuators Platform OS / other programs Processor PROCESSOR MODEL 30

Issues with Measurement-Based Tools How good is the test suite? Good path coverage? Does the worst-case platform behavior occur? Is the measurement accurate? 31

Outline of the Lecture How to measure execution time Current Approaches to Execution Time Analysis Limitations The GameTime approach (quick overview) Demo of some tools 32

The GameTime Approach: Contributions Model the estimation problem as a Game Tool vs. Platform Robust to changes in the platform Measurement-based Perform end-to to-end measurements of execution time of selected (linearly many) paths on target platform Learn Environment Model Learn a (graph) model of platform s s behavior Online algorithm: GameTime [Seshia & Rakhlin, ICCAD 08] Theoretical guarantee: can find WCET with arbitrarily high probability under some assumptions Leverages advances in Verification Engines Satisfiability modulo theories (SMT) solvers 33

Intuition for Our Approach Controlled by us (the estimation tool) Path Worst-case program path with Worst-case env. state Game: Tool vs. Platform State Controlled by the environment (platform) 34

Components of the Game Strategies (moves) of the Tool Strategies (moves) of the Platform (environment) Winning condition of the Game 35

Program Model: Unrolled CFG Model the program by its Control-Flow Graph (CFG) Unroll loops and recursive function calls, inline functions to get a Directed Acyclic Graph (DAG) Each source-sink path is a program execution that the tool can generate a test case for flag==1 flag=1; *x++; CFG for program containing loop with bound = 1 flag==1 flag=1; *x++; flag==1 CFG unrolled to a DAG 36

Tool Strategies = Paths Tool strategy: Paths in the control flow graph Tool selects: Inputs that drive the program down the chosen path 37

A Path is a Vector x {0,1} m (m = #edges) 1 1 1 1 1 1 Insight: Only need to sample a Basis of the space of paths 38

Platform Model Models path-independent timing Weights on edges of unrolled CFG & Path-specific perturbation Models path-dependent timing What we want to model: Impact of the platform on program execution time Lengths of all program paths 2 1 1 3 1 1 5 1 39

Platform s Strategies Weights on edges of unrolled CFG & Path-specific perturbation w R m π R m 40

The Game & Winning Condition Played over several rounds t = 1, 2, 3,, τ At each round t: Tool picks x t CFG 5 7 1 Platform picks w t 11 Platform picks π t (-1, -1, -1, -1) Tool observes l t = x t (w t + π t ) (5+7+1+11) - 4 = 20 At round At round τ : Tool predicts longest path : Tool predicts longest path x* τ Tool wins if its prediction is correct 41

Summary of Experimental Results GameTime is Efficient 7 x 10 16 total paths, vs. 183 basis paths Sampling basis paths tells us about longer paths we do not sample Found paths 25% longer than sampled basis GameTime can accurately estimate the timing profile with few measurements GameTime does better than Random Testing Found estimates twice as large GameTime can even find larger WCET estimates than conservative WCET estimation tools 42

Open Problems Architectures are getting much more complex. Can we create processor behavior models without the agonizing pain? Can we change the architecture to make timing analysis easier? [See PRET machine project by Prof. Lee and colleagues] Analysis methods are Brittle small changes to code and/or architecture can require completely re-doing the WCET computation Use robust techniques like GameTime that learn about processor/platform behavior Need to deal with concurrency, e.g., interrupts Need more reliable ways to measure execution time EECS 249, UC Berkeley: 43