Embedded Systems Lecture 11: Worst-Case Execution Time. Björn Franke University of Edinburgh

Similar documents
Evaluating Static Worst-Case Execution-Time Analysis for a Commercial Real-Time Operating System

Hardware-Software Codesign. 9. Worst Case Execution Time Analysis

Static WCET Analysis: Methods and Tools

Dissecting Execution Traces to Understand Long Timing Effects

ait: WORST-CASE EXECUTION TIME PREDICTION BY STATIC PROGRAM ANALYSIS

HW/SW Codesign. WCET Analysis

WCET-Aware C Compiler: WCC

PROGRAM EFFICIENCY & COMPLEXITY ANALYSIS

actual WCET safe BCET estimates safe WCET estimates time possible execution times

Last Time. Response time analysis Blocking terms Priority inversion. Other extensions. And solutions

Execution Time Analysis for Embedded Real-Time Systems

Timing Analysis of Parallel Software Using Abstract Execution

Trickle: Automated Infeasible Path

Efficient Microarchitecture Modeling and Path Analysis for Real-Time Software. Yau-Tsun Steven Li Sharad Malik Andrew Wolfe

Simplified design flow for embedded systems

Timing Anomalies and WCET Analysis. Ashrit Triambak

Approximation of the Worst-Case Execution Time Using Structural Analysis. Matteo Corti and Thomas Gross Zürich

History-based Schemes and Implicit Path Enumeration

Eliminating Annotations by Automatic Flow Analysis of Real-Time Programs

FIFO Cache Analysis for WCET Estimation: A Quantitative Approach

In examining performance Interested in several things Exact times if computable Bounded times if exact not computable Can be measured

6.1 Motivation. Fixed Priorities. 6.2 Context Switch. Real-time is about predictability, i.e. guarantees. Real-Time Systems

Sireesha R Basavaraju Embedded Systems Group, Technical University of Kaiserslautern

Module 1 Lecture Notes 2. Optimization Problem and Model Formulation

Precise and Efficient FIFO-Replacement Analysis Based on Static Phase Detection

Fall 2015 CSE Qualifying Exam Core Subjects

Verification of Quantitative Properties of Embedded Systems: Execution Time Analysis. Sanjit A. Seshia UC Berkeley EECS 249 Fall 2009

Introduction to Embedded Systems

Caches in Real-Time Systems. Instruction Cache vs. Data Cache

B553 Lecture 12: Global Optimization

Search: Advanced Topics and Conclusion

On the False Path Problem in Hard Real-Time Programs

Status of the Bound-T WCET Tool

Coding guidelines for WCET analysis using measurement-based and static analysis techniques

CLOCK DRIVEN SCHEDULING

CS/ENGRD 2110 Object-Oriented Programming and Data Structures Spring 2012 Thorsten Joachims. Lecture 10: Asymptotic Complexity and

FMCAD 2011 (Austin, Texas) Jonathan Kotker, Dorsa Sadigh, Sanjit Seshia University of California, Berkeley

Caches in Real-Time Systems. Instruction Cache vs. Data Cache

Algorithm must complete after a finite number of instructions have been executed. Each step must be clearly defined, having only one interpretation.

Joint Entity Resolution

Fall 2011 Prof. Hyesoon Kim

Writing Parallel Programs; Cost Model.

Cpt S 223 Course Overview. Cpt S 223, Fall 2007 Copyright: Washington State University

Transforming Execution-Time Boundable Code into Temporally Predictable Code

Improving Timing Analysis for Matlab Simulink/Stateflow

The Worst-Case Execution Time Problem Overview of Methods and Survey of Tools

Timing Analysis Enhancement for Synchronous Program

Program Analysis and Constraint Programming

The Automatic Design of Batch Processing Systems

Advanced Operations Research Techniques IE316. Quiz 1 Review. Dr. Ted Ralphs

INTRODUCTION TO ALGORITHMS

CSE 417 Branch & Bound (pt 4) Branch & Bound

r-tubound: Loop Bounds for WCET Analysis (tool paper)

Allowing Cycle-Stealing Direct Memory Access I/O. Concurrent with Hard-Real-Time Programs

Choice of C++ as Language

Real-Time and Embedded Systems (M) Lecture 19

CSc 520 Principles of Programming Languages. 26 : Control Structures Introduction

Reduction of Control Hazards (Branch) Stalls with Dynamic Branch Prediction

(Refer Slide Time: 1:27)

CMSC 411 Computer Systems Architecture Lecture 13 Instruction Level Parallelism 6 (Limits to ILP & Threading)

Recursion. What is Recursion? Simple Example. Repeatedly Reduce the Problem Into Smaller Problems to Solve the Big Problem

LECTURE 18. Control Flow

Automatic flow analysis using symbolic execution and path enumeration

Autotuning. John Cavazos. University of Delaware UNIVERSITY OF DELAWARE COMPUTER & INFORMATION SCIENCES DEPARTMENT

Optimizations - Compilation for Embedded Processors -

Outline EEL 5764 Graduate Computer Architecture. Chapter 3 Limits to ILP and Simultaneous Multithreading. Overcoming Limits - What do we need??

Performance analysis basics

2. Introduction to Software for Embedded Systems

Lecture 12: Instruction Execution and Pipelining. William Gropp

EE382V: System-on-a-Chip (SoC) Design

Spring 2011 CSE Qualifying Exam Core Subjects. February 26, 2011

Programming Embedded Systems

Data Structures Lecture 3 Order Notation and Recursion

Optimising with the IBM compilers

VIII. DSP Processors. Digital Signal Processing 8 December 24, 2009

Critical Analysis of Computer Science Methodology: Theory

Predictable multithreading of embedded applications using PRET-C

Binding and Storage. COMP 524: Programming Language Concepts Björn B. Brandenburg. The University of North Carolina at Chapel Hill

Algorithms for Integer Programming

Implementation of Lexical Analysis. Lecture 4

4.1.3 [10] < 4.3>Which resources (blocks) produce no output for this instruction? Which resources produce output that is not used?

SEARCHING, SORTING, AND ASYMPTOTIC COMPLEXITY. Lecture 11 CS2110 Spring 2016

The Worst-Case Execution-Time Problem Overview of Methods and Survey of Tools

Pipelining, Branch Prediction, Trends

CS433 Final Exam. Prof Josep Torrellas. December 12, Time: 2 hours

Writing Temporally Predictable Code

Lecture 21. Software Pipelining & Prefetching. I. Software Pipelining II. Software Prefetching (of Arrays) III. Prefetching via Software Pipelining

Analytic Performance Models for Bounded Queueing Systems

Introduction to Computer Systems and Operating Systems

Artificial Intelligence

CISC 662 Graduate Computer Architecture Lecture 13 - Limits of ILP

Source EE 4770 Lecture Transparency. Formatted 16:43, 30 April 1998 from lsli

Register Allocation. CS 502 Lecture 14 11/25/08

/ Approximation Algorithms Lecturer: Michael Dinitz Topic: Linear Programming Date: 2/24/15 Scribe: Runze Tang

ECE473 Computer Architecture and Organization. Pipeline: Control Hazard

Search: Advanced Topics and Conclusion

Lecture 1 Contracts. 1 A Mysterious Program : Principles of Imperative Computation (Spring 2018) Frank Pfenning

Improvement: Correlating Predictors

3 INTEGER LINEAR PROGRAMMING

Notes slides from before lecture. CSE 21, Winter 2017, Section A00. Lecture 4 Notes. Class URL:

Transcription:

Embedded Systems Lecture 11: Worst-Case Execution Time Björn Franke University of Edinburgh

Overview Motivation Worst-Case Execution Time Analysis Types of Execution Times Measuring vs. Analysing Flow Analysis Low-Level Analysis Calculation

Motivation: Characteristics of Real-Time Systems Concurrent control of separate system components Reactive behaviour Guaranteed response times Interaction with special purpose hardware Maintenance usually difficult Harsh environment Constrained resources Often cross-development Large and complex Often have to be extremely dependable

What is the Execution Time of a program? WCET: Worst-Case Execution Time BCET: Best-Case Execution Time ACET: Average-Case Execution Time [Wilhelm+08] The WCET/BCET is the longest/shortest execution time possible for a program. Must consider all possible inputs including perhaps inputs that violate specification.

Why may we care about the WCET? We are interested in WCET to... perform schedulability anaylsis ensure meeting deadlines assess resource needs for real-time systems WCET accuracy may be safety-critical!

And why may we care about the BCET? We are interested in BCET to... benchmark hardware assess code quality assess resource needs for non/soft real-time systems ensure meeting livelines (new starting points)

What is the Execution Time of a program? safe BCET estimates actual BCET possible execution times actual WCET safe WCET estimates 0 tighter tighter time [EngblomES01] Approaches for approximating WCET or BCET Measuring: Measure run time of program on target hardware Analysis: Compute estimate of run time, based on program analysis and model of target hardware Hybrid: Combine measurements with program analysis

Measuring WCET/BCET Execution time may depend on program inputs In this case, quality of measurements depends on judicious choice of inputs Execution time may depend on execution context (cache content, state of pipeline,...) Typically need to add safety margin to best/worst result measured Extensive testing/measurement still common practice

Measuring Program Run Times Call OS timing routines Account for cost of calls to timing routines themselves Access hardware timers directly Use external hardware Oscilloscope, Logic analyser Count emulator cycles High water marking Continuously record max execution times Standard feature of RTOSs May include this in shipped products Read at service intervals

Analysing WCET/BCET Instead of measuring execution times, compute them Advantages Can ensure safety of result Saves testing effort Disadvantages Try to be as tight as possible may not always succeed Typically requires extensive analysis effort Accuracy depends on Complexity of hardware Program structure Quality of hardware model Program analysis capabilities

Analysing WCET/BCET Program Source Flow Analysis Compiler Global Low- Level Analysis Calculation WCET Object Code Local Low- Level Analysis [EngblomES01]

Flow Analysis Analyse dynamic behaviour of program Number of loop iterations, Recursion depth, Input dependences, Infeasible paths, Function instances,... Get information from Static Analysis Manual Annotation Analysis level Object code Source code (may need non-trivial mapping to object code)

Flow Analysis Structurally possible flows (infinite) Basic finiteness Statically allowed Actual feasible paths WCET found here=desired result WCET found here=overestimation [EngblomES01]

Flow Analysis The set of structurally possible flows for a program, i.e. those given by the structure of the program, is usually infinite, since e.g. loops can be taken an arbitrary number of times The executions are made finite by bounding all loops with some upper limit on the number of executions (basic finiteness) Adding even more information, e. g. about the input data, allows the set of executions to be narrowed down further, to a set of statically allowed paths. This is the optimal outcome of the flow analysis.

Flow Analysis const int max = 100; foo (float x) { A: for(i = 1; i <= max; i++) { B: if (x > 5) C: x = x * 2; else D: x = x + 2; E: if (x < 0) F: b[i] = a[i]; G: bar (i) }} Loop bounds: Easy to find in this example; in general, very difficult to determine Infeasible paths: Can we exclude a path, based on data analysis? A-B-C-E-F-G is infeasible since if x>5, it is not possible that x * 2 < 0. Well, really? What about integer overflows? Must be sure that these do not happen in the example...

Low-Level Analysis Program Source Flow Analysis Compiler Global Low- Level Analysis Calculation WCET Object Code Local Low- Level Analysis [EngblomES01] Determine execution time for program parts Account for hardware effects (pipeline, caches...) Work on object code Exact analysis generally not possible

Low-Level Analysis Global Low-Level Analysis Considers execution time effects of machine features that reach across entire program Instruction/data caches, branch predictors, translation lookaside buffers (TLBs) Local Low-Level Analysis Considers machine features that affect single instruction & its neighbours Scalar/superscalar pipelines

Worst-Case Execution Time Analysis Flow Analysis Low-Level Analysis Calculation Local low-level analysis Pipelining Local Low-Level Analysis - Pipelining!!"# % 1235"# 1245"$!$#%()7898!)%).)-+!/0, 12345"6!"#%()*+),-)%).)-+!/0,!!""%&'!""$ &!'#%:0;)</,=%!>)%!/?/,=%+@/,=!/?)@%8,;%!/?/,=%)AA)-!@ [EngblomJ02] Fig. 1. Pipelining of instruction execution effect of two successive Pipeline Pipelining effect of two successive instructionsinstructions Pipeline overlap reduces overall computation time!!"b by%δ = 2 Pipeline overlap time by δ!="%&' 2 1235"B reduces overall 1245"#computation 12D5"B!"!""# & Embedded Real-Time Systems!"#$%&'$()"*+ #$,--#.*/,($("0 1)%*2*$)*-3,43 $53*6*#$,03 12345"C SS 2010, Lecture 15 124D5"C!"#"%&'!!"# Types of Execution Times Slide 18!#"B (!!"#"%G6

$53*6*#$,03 Worst-Case Execution Time Analysis!$#%H.)-+!/0,%!/?)@ 1235"# 1234D5"EF 1245"$ 12345"6!!"# Types of Execution Times! "%&'!""$ Measuring vs.!" % Analyzing Flow Analysis & Low-Level Analysis!"#%1/?/,=%?0;)< Calculation Local Low-Level Analysis - Pipelining Local low-level analysis Pipelining Fig. 2.!$#%()7898!)%).)-+!/0,!'#%:0;)</,=%!>)%!/?/,=%+@/,= Example long timing effect!"#%()*+),-)%).)-+!/0,!/?)@%8,;%!/?/,=%)aa)-!@ Fig. 1. Pipelining of instruction execution I1... Im is different from the execution of the nodes I2... Im starting with node I2 (see Section 4.1 for more details). %&'()*+,--)./ %&'()*+'01)./ ("!""#$%"&'(!!"B %! ( $A the execution profile!;3 of BC is; differ- "&,-.)/*00 Figure 2 shows an example of a LTE, where $5BC000D$(!;= 1235"B 1245"# 12D5"B "%&' sequence!!"the ent when executed in isolation and when executed as part of ABC.!= Note $$(" $ $$$A diagrams, similar to the reservation that we illustrate pipeline execution using pipeline!=6!=< $$$5BC00D$$!""# & tables commonly used to describe the behavior$$$$000$ of pipelined processors. Time runs hor#!6!!< #!"#"%&' $$$-!E!3 cycle. The pipeline!""#$%"&'( izontally, with each 12345"C tick of time corresponding to a processor clock!!"# "%G6! 124D5"C!6> $$$$000$! <>!"#$%&'$()"*+on the vertical axis. Instructions progress from 5''-.)/2)!!#"B ( stages are shown upper left to lower * $$$5BC000D$ " "!?= #$,--#.*/,($("0 5''-.)/800 $$$$000$ & 1)%*2*$)*-3,43!>7!>: right, and each cycle of execution is shown as a square. $53*6*#$,03 5''-.)/*00 $$$-!E! & 7 %!: LTEs can in general be both negative and$$$$000$ positive, %and must be accounted for by $$$F!"#%1/?/,=%?0;)<!!:? 1234D5"EF!$#%H.)-+!/0,%!/?)@ a static WCET analysis method that wants to $$GH5!-C00D be safe and tight. Positive 7?timing effects ' ' [EngblomJ02] an underfig. and 2. Example long$-!etiming add execution time to a program, are critical to effect consider since otherwise!?@ $$$000$*! Pipelining of three instructions 3@ estimate effect of the WCET couldsuccessive result. Negative timing effects indicate potential savings!@;in $F ) ) Pipelining effect ofthem three instructions execution ignoring onlysuccessive makes the WCET I1 time,... Im and is different from the execution of thegh5!-c000d$ nodes I2... Iestimate with tight. node I2 m starting less (see Section 4.1 for more details). "#$%&'()'#*%+(,".$%/+(0-%)'#01%2341%#44#+1-,%56(2%5# Positive LTEs comprise a central three problem for WCET analysis, since theythan make it Reduction of combining instructions can be larger Reduction offigure combining can be 2: larger sum Attached of savingsflow F 2 shows an three example instructions of a LTE, where the execution profile ofthan BC iswith differfigure Scopes necessary to consider effects across more than just adjacent instructions assumption in sum of savings when combining them pair-wise! ent when executed in isolation and when executed as part of the sequence ABC. Note when combining them pair-wise! order to that create a safe analysis. To prove a pipeline particular WCET analysis method safe, it is we illustrate pipeline execution using diagrams, similar to the reservation The target chip fortime theruns present i Embedded Real-Time SS 2010, Lecture 15 have Slide necessary to address theused question ofsystems whether all LTEs been accounted for. tables commonly to describe the behavior ofpositive pipelined processors. hor- 19 implementation V850E, a typical 32-bit embedded tick of corresponding tosufficient a processor clock cycle. The pipeline A centralizontally, issue inwith thiseach paper is time therefore to give conditions forrisc when LTEs are microcon stages are shown on the vertical axis. Instructions progress[6]. fromthe uppercompiler left to lowerused is an IAR V8 chitecture guaranteed not to be positive. right, and each cycle of execution is shown as a square. C/Embedded C++ compiler [32].

Global Low-Level Analysis - Caches Instruction Caches Predictable from control flow Data Caches No simple way to predict accesses Very difficult analysis problem Unified Caches Very pessimistic as a result of combining instructions & data

Worst-Case Execution Time Analysis Measuring vs. Analyzing Flow Analysis Low-Level Analysis Calculation Global low-level analysis Caches Global Low-Level Analysis - Caches D#73+%.6(+E / '()$*+ 0012)!3!" )2)& +!"#"$%N(($.*O.4!"##$%!($$PQRO.S, )2)& )2)&. '()$4+ )!6 #!7! " 00*+2)!6!$ 0092)!6!!: $ $ $ $ $ $ )2)& - )2)& 4K899 "KJ8=> A!-+B43(C 3C5('*#43(C!"#&$%TU#$.*O.S!"#'$%%V-$PER!"#"$%()*)+,%-(..!"##$%()*)+,%+(/0 %%%%%%%%%%1-,-%2345!"#&$%()*)+,%+(/!"#'$%()*)+,%+(/!"#"$%&%'()*&+",*&-.-/0,"1$&"$21'#(,"1$ / + 4J89; "JL8=@ "KI8=?,. "LM8=9-4I89: "IL8=9 4L8; 4M89< &32,-'&)")-4"$-&($(456"6 Figure 3: Timing Graph with Execution [StappertEE01] Facts May split loops to loops differentiate between first and successive loop iterations May split to differentiate between first and successive loop iterations / / 9,60:41'. "" )2)& )2)& %5#+47 Must combine with pipelining effects effects Must combine with pipelining + Facts n is the NEC ontroller arv850/v850e )2)& + 9,60:41'. / 4!"" K "KJ!*?@ "; << 15 / + Systems 9,60:41'. )2)& Embedded Real-Time SS 2010, Lecture 3.456)-1*'7 1,6,-5*5.4(8 )2)& )2)& 9,60:41,'-*.0-% + 4!"; J =,6,-5*5.4(8 >,18*1,6)% Figure 4: Timing Effect Calculation Slide 21

WCET Calculation Program Source Flow Analysis Compiler Global Low- Level Analysis Calculation WCET Object Code Local Low- Level Analysis [EngblomES01] Task: Find the path that results in the longest execution time Several approaches in use Properties of approaches Program flow allowed Object code structure (optimisations?) Pipeline effect modelling Solution complexity

WCET Calculation Path-based Constraint-based Implicit Path Enumeration Technique - IPET Structure-based

WCET Calculation [Wilhelm+08]

Path-Based Bound Calculation Upper bound for a task is determined by computing bounds for different paths in the task, searching for the overall path with the longest execution time. Defining feature is that possible execution paths are represented explicitly. Natural within a single loop iteration, but problems with flow information extending across loop nesting levels. Number of paths is exponential in the number of branch points. Possibly requiring heuristic search methods.

Implicit Path Enumeration Program flow and basic block execution time bounds are combined into sets of arithmetic constraints. Each basic block and program flow edge in the task is given a time coefficient, expressing the upper bound of the contribution of that entity to the total execution time every time it is executed.

Structure-based Bound Calculation Upper bound is calculated in a bottom-up traversal of the syntax tree of the task combining bounds computed for constituents of statements according to combination rules for that type of statement. Not every control flow can be expressed through the syntax tree Assumes straight-forward correspondence between source structures and the target program Not easily admitting code optimisations In general, not possible to add additional flow information (as in IPET).

Summary Motivation Worst-Case Execution Time Analysis Types of Execution Times Measuring vs. Analysing Flow Analysis Low-Level Analysis WCET Calculation

Preview Real-Time Operating Systems MQX