Timing Analysis for Sensor Network Nodes of the Atmega Processor Family


Sibin Mohan, Frank Mueller, David Whalley and Christopher Healy
Dept. of Computer Science, Center for Embedded Systems Research, North Carolina State University, Raleigh, NC
Dept. of Computer Science, Florida State University, Tallahassee, FL
Dept. of Computer Science, Furman University, Greenville, SC

Abstract

Low-end embedded architectures, such as sensor nodes, have become popular in diverse fields, many of which impose real-time constraints. Currently, the Atmel Atmega processor family used by Berkeley Motes lacks support for deriving safe bounds on the worst-case execution time (WCET), which is a prerequisite for performing real-time schedulability analysis. Our work fills this gap by providing an analytical method to obtain WCET bounds for this processor architecture. Our first contribution is to analyze both C and NesC code, the latter of which is unprecedented. The second contribution is to model control hazards and variable-cycle instructions, both handled more efficiently by our approach than by previous ones, resulting in up to 77% improvement in bounding the WCET. The results demonstrate that our timing analysis framework is able to tightly and safely estimate the WCET of the benchmarks, while simulator results are shown to not always provide safe WCET bounds. While motivated by the Atmel Atmega series of processors, the results are equally applicable to other low-end embedded processors. This work is, to the best of our knowledge, the first set of experiments in which timing results are contrasted from execution on an actual processor, from a cycle-accurate simulator and from a static timing analyzer. Furthermore, making our timing analysis toolset available for the Atmel Atmega processor family is a significant contribution towards addressing a documented need for tool support for sensor node architectures commonly used in networked systems of embedded computers, or so-called EmNets.
1. Introduction

Networked systems of embedded computers, referred to as EmNets [0], usually consist of one or several computationally powerful nodes called base stations and a large number of inexpensive, low-capacity nodes called sensors (or sensor nodes). The nodes in a wireless sensor network communicate through wireless links, typically with rather limited bandwidth. (This work was supported in part by NSF grants CCR-00858, CCR-00860, CCR-0695, EIA-00704, CCR-00889, CCR-049 and CCR-05.) Such EmNets have extensive applications ranging from civilian to military operations, many of which are subject to timing constraints: on the one hand, applications themselves generally pose requirements on the frequency of sensing; on the other hand, constraints on energy consumption and effective communication are also subject to timing constraints and clock-synchronization assumptions. In recent years, low-end sensor nodes, especially those from Berkeley [6], have become popular, and they find applications in diverse fields. These sensor nodes use the Atmel Atmega series of processors []. Such sensor nodes, and networks composed of them, find applications in real-time environments. Research on timing constraints for sensor node architectures is still preliminary; except for ad-hoc solutions specific to particular applications, there is a lack of software support for handling timing constraints. This paper addresses this issue by providing automated tool support for determining bounds on the worst-case execution time (WCET) of programs, tasks and even smaller program scopes for applications on Atmel architectures. These WCET bounds are particularly useful in real-time applications for EmNets. In real-time systems, particularly hard real-time systems, violations of temporal constraints may have irreparable effects on the system, its environment, or both.
Real-time systems theory reasons about the schedulability of task sets, i.e., offline schedulability tests determine whether all deadlines of a set of tasks can be met. Task parameters have to be known beforehand, such as the period of each task, its worst-case execution time (WCET), and so on. Periods of tasks are determined by the operating environment, such as temporal constraints on sensors, actuators and other parts of the system. Determining the WCET of tasks is a non-trivial effort due to software complexity, non-determinism of inputs and hardware complexity with unpredictable execution behavior. Various approaches to determine the WCET of tasks exist, such as experimental methods, which are considered unsafe or are constrained to probabilistic analysis [0, 4]. Static analysis methods have also been developed to derive safe

WCET bounds, and they model hardware components, e.g., the processor pipeline, caches, etc. They model the flow of code through the various hardware components and use an interprocedural program representation and the longest control-flow paths to obtain an upper bound on the number of cycles of any execution. In this paper, we present a timing analysis framework that analyzes programs, such as EmNet applications, for the Atmel architecture to obtain worst-case execution times. Our approach is unique in that a three-tiered system of experiments is carried out to verify the correctness of the WCET numbers obtained by our timing analyzer. First, we obtain execution times from the actual Atmel hardware itself. Second, we obtain execution times from a cycle-accurate simulator supplied by the processor manufacturer. Finally, we run our benchmarks through our timing analysis framework to obtain WCET estimates. Our timing analysis framework is able to tightly and safely estimate the WCET of the various benchmarks. We also show that the methods used are scalable: when the input set sizes are scaled, the WCET estimates closely follow the actual execution cycles of the processor. We have also adapted our framework to provide WCET estimates for NesC [], the preferred language for programming sensor network applications. We are able to obtain accurate WCET estimates for NesC code as well as C code. We have also created an abstract NesC interface, which makes timing analysis of any piece of synchronous NesC code straightforward. To the best of our knowledge, the interoperability of our timing analysis framework with NesC is unique. The timing analysis framework we use was initially developed for the MicroSPARC I [9] architecture. It provides accurate and tight WCET estimates comparable to other available WCET methods and tools.
Since the Atmel architecture is simpler than the SPARC in that it has no caches and only two pipeline stages, we believed that a simple one-to-one port was all that was necessary to adapt the framework to the Atmel architecture. Contrary to our intuition, we found that important enhancements were required even for as simple an architecture as the Atmel Atmega to obtain tight WCET bounds, as demonstrated in the following. Each singular contribution resulted in a 5% to 40% improvement in the accuracy of bounding the WCET. One significant problem faced by WCET tools is coping with instructions that take a variable number of cycles to execute. Such instructions may take a different number of cycles depending on the state of the system, such as the status of a condition-code register, whether a branch is taken or not, and so on. Traditionally, such instructions cause WCET tools to over-estimate the WCET, as the tools always assume the largest possible number of cycles for these instructions. This typically results in large overestimations when variable-cycle instructions occur in loops. We solve this problem by enhancing the path analysis to accurately account for the overhead of variable-cycle instructions depending on the actual execution context. Hence, we are able to obtain 0-40% tighter bounds on the WCET for Atmel processors. While motivated by the Atmel architecture, these improvements transfer equally to other low-end embedded architectures, which commonly have control hazards and variable-cycle instructions. One of the lessons learned is that the design of static timing analysis tools should be accompanied by verification of correctness on a set of benchmarks before the techniques are applied to larger applications. Verification can be realized by comparing against a cycle-accurate simulator or actual embedded hardware, as demonstrated by our experiments.
2. Static Timing Analysis

Real-time schedulability analysis for any hard real-time system requires the WCET to be known beforehand and safely bounded, so that the feasibility of scheduling a task set under a given scheduling policy, such as rate-monotonic or earliest-deadline-first scheduling [9], can be determined. If dynamic timing analysis methods based on experimental or trace-driven approaches were used, the safety of the WCET values obtained could not be guaranteed [0]. Also, it is difficult to determine worst-case input sets for even moderately complex tasks such that the WCET may be observed, and performing exhaustive testing over wide ranges of the input space is infeasible except in trivial cases. Even if the worst-case input set is known, interactions between hardware and software might inhibit programs from exhibiting their worst-case behavior. Architectural complexity is the main reason for such unpredictability, in particular complex pipelines and caching mechanisms. The alternative to dynamic timing analysis, of course, is static timing analysis. While various approaches have been studied and proposed, we constrain ourselves to the toolset we use, without loss of generality [,, ]. WCET bounds obtained by static timing analysis guarantee upper bounds on the computation times of tasks. To achieve this, static timing analysis models the traversal of all possible execution paths in the code and determines timing independent of program traces or the values of program variables. Loop bodies require only a few traversals, and the worst-case behavior of a loop is obtained by an efficient fixed-point approach. The behavior of architectural components along execution paths is captured as the execution paths are traversed. Paths are composed to form functions, loops, etc. Finally, the entire application is covered to derive a bound on the worst-case execution cycles (WCEC) and the WCET.
An overview of the organization of this timing analysis framework is illustrated in Figure 1.

[Figure 1. Static Timing Analysis Framework — C source files are compiled to produce control-flow and constraint information, which the timing analyzer combines with machine-dependent information to produce a WCET estimate.]

Modifications to an optimizing compiler result in the production of control-flow and branch-constraint information as a side effect of the compilation process. The GCC compiler for the AVR instruction set architecture (ISA) is used to compile real-time applications into assembly code. Control-flow graphs and instruction and data references are extracted from this assembly code. A prerequisite for performing static timing analysis is that upper bounds on the number of iterations performed by the various loops in the program be available, either derived through program analysis or provided by the user. The timing analyzer uses the control flow, the constraint information, and architecture-specific information (e.g., pipeline characteristics) to calculate bounds on the WCET. While the analysis framework also supports static analysis of caches and modeling of static branch prediction [, ], these components are omitted here since the Atmel AVR processor family does not support them. The control-flow information, the loop bounds, optional caching categorizations, and the pipeline descriptions are used by the timing analyzer to derive WCET bounds. The effects of structural hazards (instruction dependencies due to constraints on functional units, resulting in stalls), data hazards (a load-dependent instruction stalls if a use immediately follows a load instruction), branch prediction and cache misses (derived from the caching categorizations) are considered by the pipeline simulator for each execution path through a loop or a function. Static branch prediction can easily be accommodated by WCET analysis: the misprediction penalty is added to the non-predicted path. Path analysis, explained below, selects the longest execution path.
Once timing results for the alternate paths in a loop are available, a fixed-point algorithm that quickly converges in practice is employed to safely bound the time for all iterations of the loop based on the loop body's cycle counts. Path analysis of only a few iterations provides the necessary data for the fixed-point algorithm. Once the longest path for the first iteration has been determined, the next-longest path for the next iteration can be determined; a difference between the two may occur only due to caching effects, if any. The lengths of these paths monotonically decrease due to the above-mentioned cache effects, and once a fixed point is reached, subsequent loop iterations can be safely approximated by this fixed-point timing value, as shown in []. While combining the longest paths of consecutive iterations, care must be taken to account for overlap between the tail of the preceding path and the head of the succeeding path. Not considering this overlap is tantamount to draining the pipeline between iterations. The timing analyzer ultimately derives WCET bounds using this fixed-point approach, first for each path, then for each loop, and finally for each function. A timing analysis tree is constructed such that each loop or function corresponds to a node of the tree. The tree is processed in a bottom-up manner, i.e., the WCET for an outer loop nest or caller function in the tree is not calculated until the times for all of its inner loop nests and callees are known. Hence, the timing analyzer predicts the WCET of tasks by first analyzing lower-level loops and functions before proceeding to higher-level loops and functions, eventually reaching the root of the tree (e.g., main()). The timing analysis tree provides a convenient mechanism for obtaining the WCET of specific scopes, in particular of sub-tasks. The material in this section shows that static timing analysis is a non-trivial task, even for an architecture as simple as that of the Atmel processors.
3. Adapting to Architectural Features

While the timing analysis framework was being adapted to the Atmel Atmega architecture, several enhancements to the existing framework were deemed necessary. While the original framework provided safe WCET bounds, they were not necessarily tight bounds. A number of important enhancements resulted in significant improvements in terms of obtaining tight WCET bounds:

Variable-cycle instructions: Certain instructions take a variable number of cycles depending on the context of their execution. We adapted our framework, specifically the path-merging logic, to account for these instructions, as explained in Subsection 3.1.

Pipeline overlap handling for adjacent loop iterations: The pipeline structure of adjacent loop iterations must be matched so as to reflect the processor pipeline without unnecessary stalls. During path analysis, pipeline information is calculated for traces of straight-line code, i.e., sequences of basic blocks, so-called paths. When considering paths of consecutive loop iterations, the timing of each pipeline stage for the first and last instruction in a path needs to be considered. The objective is to compose the trailing pipeline step-curve of a path with the leading step-curve of the next path. This composition was being handled incorrectly for the Atmel Atmega architecture. We performed modifications to eliminate unnecessary stalls that were affecting the tightness, albeit not the correctness, of the WCET bounds, as explained in Subsection 3.2.

3.1. Variable-Cycle Instructions

Consider a simple conditional branch instruction, where the branch is either taken or not taken. Let us assume that if the branch is taken, it completes in two cycles; if it is not taken, it completes in one cycle. The reason for the difference is the overhead imposed by resolving the target of the branch. Traditional static timing analysis methods would assume worst-case behavior and estimate that the branch executes in two cycles, irrespective of the behavior of the instruction. This would lead to the WCET not being tight and could lead to gross over-estimation, especially when branches are embedded within loops, which is inevitable for the Atmel Atmega architecture. Such behavior is typically observed for instructions that modify the control flow of the program. We have enhanced the timing analysis framework to take this into account and estimate the number of execution cycles required by a branch instruction correctly. As explained before, once the timing of the alternate paths is complete, the fixed-point algorithm takes over and works towards convergence. At this stage, alternate paths created by instructions that modify the control flow of the program are analyzed. By analyzing the alternate paths created as the result of a single instruction's modification of the control flow, we can accurately determine the number of cycles taken by that instruction. For example, the above branch instruction would take one cycle to execute on the path where execution falls through (branch not taken) and two cycles on the path that does not fall through (branch taken). Hence, an adjustment can be made to one of the alternate paths so that it reflects the true nature of the execution. Suppose it had been decided to always assign two cycles to the branch instruction. We would then reduce the WCEC of the path in which the branch is not taken by one, which is equivalent to the branch instruction taking one cycle to execute. This is shown in Figure 2. The fixed-point algorithm can now analyze the two paths and accurately obtain WCET information. This enhancement to the static timing analysis framework results in tighter, more accurate WCET estimates for the tasks being examined.
[Figure 2. Modeling Instructions with Variable Execution Times — block A ends in a branch with variable execution cycles; the not-taken path (A, B, D) has its cycle count reduced by one relative to the taken path (A, C, D).]

3.2. Pipeline Modeling across Loop Iterations

When timing analysis is performed for loop bodies, care must be taken to ensure that the pipeline structure of adjacent loop iterations is correctly composed, as shown in Figure 3(b). Suppose the two pipeline structures were matched incorrectly, as shown in Figure 3(c). The WCET estimates obtained would be inaccurate and not representative of the execution on the real architecture. In most cases, the errors would accumulate over all loop iterations, and the WCET estimate obtained would be a gross overestimation. This is the case for the example shown in Figure 3(c)(i). Even more seriously, an unsafe underestimation may occur if only the leading edge (IF stage) of the pipeline is considered for composition, as shown in Figure 3(c)(ii). While we did not observe the latter case, the former case was encountered as a result of the initial porting efforts of our analysis framework. Our timing analysis framework was composing by overlapping stages, but only to a limited extent. We observed that an extra cycle was being introduced at the end of every iteration for certain loops. The composition was enhanced by improving the logic that performs the analysis for adjacent loop iterations to remove the extra cycle.

[Figure 3. Composing Pipeline Behavior of Adjacent Loop Iterations — (a) a single iteration of a loop in the IF/EX pipeline; (b) correct handling of loop iterations; (c) incorrect handling: (i) overestimation of the WCET when IF and EX are not overlapped and an extra cycle is introduced, or (ii) underestimation of the WCET when the IF of an instruction incorrectly overlaps with the EX of the previous iteration.]

4. Features of the Atmega Architecture

We shall limit our discussion to the Atmega128 / Atmega103 processors, two low-power CMOS 8-bit microcontrollers based on the AVR enhanced RISC architecture.

Architectural Model: The AVR uses a Harvard architecture with separate memories and buses for program and data. Instructions in the program memory are executed with a single level of pipelining: while one instruction is being executed, the next instruction is fetched, so instructions can complete every clock cycle. Thus, the pipeline has just two stages, Instruction Fetch (IF) and Execute (EX). The AVR processors have two main memory spaces, the data memory and the program memory space, and also feature an EEPROM memory for data storage. All three memory spaces are linear and regular. The Atmega128 chip has 128 kB of on-chip in-system programmable flash memory for program storage, organized as 64K x 16 bits. AVR processors do not have a cache memory.

Instruction Set: All AVR instructions are either 16 or 32 bits wide, which is why the flash is organized as 64K x 16. The AVR instruction set supports a variety of addressing schemes, for a total of thirteen different addressing modes. All operations are integer-based; the AVR instruction set does not support floating-point operations but requires their emulation in software. All instructions take either one or two cycles to execute, except the instruction types whose timing is listed in Table 1.

Instructions           | Function                 | Exec. Cycles
rcall, icall, call     | Subroutine calls         | 4/5 *
eicall                 | Extended indirect call   | 4
ret, reti              | Subroutine returns       | 4/5 *
cpse                   | Compare, skip if equal   | 1/2/3 *
sbrc, sbrs, sbic, sbis | Skip if bit is set/clear | 1/2/3 *
brxy                   | Conditional branches     | 1/2 *
lpm, elpm              | Load program memory      | 3

Table 1. Timing of AVR Instructions (* denotes variable execution times)

The instruction set also has a rich set of conditional branch instructions. All conditional branch instructions either take two registers as operands or operate based on the status of flag bits. There are also skip instructions, which decide whether to skip the next instruction based on whether certain flags are set or cleared. All load/store instructions access memory directly and hence take a fixed number of cycles. We assume that every memory reference is a reference to internal memory and hence takes two cycles to complete. This assumption is consistent with the programming environment for the Atmel Atmega family. Thus, the memory latency is bounded and constant, as opposed to other conventional processors, which makes the task of timing analysis easier. We see in Table 1 that certain instructions take a variable number of cycles. The call instructions take four or five cycles depending on the size of the program counter (PC): for a 16-bit PC, these instructions take 4 cycles; for a 22-bit PC, they take 5 cycles. The same overhead accounting applies to the ret instruction. The cpse, sbrc, etc.
instructions use a varying number of cycles based on two factors: the result of evaluating the condition (true/false) and the size of the instruction to be skipped. For example, if the condition is false, they take one cycle. If it is true and the instruction to be skipped is one word in size, they take two cycles; if it is true and the instruction to be skipped is two words long, they take three cycles. All branch instructions take either one or two cycles, based on whether the branch is not taken or taken, respectively. The AVR instruction set also includes many logical as well as bit-manipulation instructions.

Timing Analysis for AVR Processors: Table 1 illustrates that only certain instruction types exhibit timing characteristics that differ from the majority. All other instructions take either one or two cycles to execute, so they can be divided into instructions that take one cycle and instructions that take two cycles. Once we determine which of these two categories an instruction falls into, instructions of the same category can be treated identically and their timings added up. The instructions in Table 1 require special handling, particularly those that have variable execution times. These instructions are handled after path analysis, during the path-merging stage, as explained in Section 3. Since the AVR processors do not have a cache, all instructions are treated as if they were hits in terms of their timing behavior. Since the memory latency is fixed, this simplification captures the behavior of the system accurately. The loop bounds for the various loops, if any, in the programs are provided as input to the timing analysis framework. A description of the simple pipeline of the AVR processors was also provided as input to the framework to port the pipeline simulator of the timing analyzer to the AVR processor family.
5. Experimental Setup

We have used a three-tiered experimental setup to perform and validate our measurements. To our knowledge, this is the first time such an approach has been utilized. The first set of experiments was carried out on the MICA Berkeley sensor nodes [6], which utilize the Atmel Atmega128L microprocessor. The experiments were conducted by providing worst-case inputs to the programs. The second set of experiments was carried out on a cycle-accurate simulator, AVR Studio, supplied by the processor manufacturer. The final set of experiments was based on static analysis with the timing analysis framework. We chose five benchmarks for the experiments from the C-Lab embedded WCET suite [5], which is widely used for assessing the capabilities of timing analysis tools. We further analyzed three NesC programs: one synthetic benchmark and two commands from the TinyOS security layer, as depicted in Table 2. The ArraySum benchmark (NesC) is a synthetic benchmark that calculates the sum of the elements of an integer array. RC5.encrypt and RC5.decrypt are the RC5 encryption and decryption functions, respectively, part of the SPINS security layer of TinyOS [4]. Section 6 provides details on timing analysis for NesC. All worst-case measurements are expressed in terms of processor cycles.

C Benchmark    | Function
sum array      | Sum and count of positive and negative numbers in an array
fibcall        | Generate the n-th Fibonacci number
insertsort     | Implementation of insertion sort
matrix mult    | Matrix multiplication
bubble sort    | Implementation of bubble sort

NesC Benchmark | Function
ArraySum       | Sum of the numbers in an array
RC5.encrypt    | RC5 encryption
RC5.decrypt    | RC5 decryption

Table 2. Benchmarks used in the Experiments

To obtain accurate timing results on the hardware, we used interrupt-driven routines activated by hardware counter overflows. At the start of the execution of the benchmark, we initialize a pair of 16-bit counters to zero and allow one of them (the cycle counter) to increment by one on every cycle. If this counter overflows, we increment the second counter (the overflow counter) and reset the first. Thus, when execution of the benchmark completes, the combination of the two counters gives us the total time of the benchmark. Of course, the overhead of the timing code itself has to be accounted for. We explain this in some detail below. Let

O1 = overhead for starting and stopping the timers [cycles]
O2 = overhead per invocation of the overflow handler [cycles]
x = value of the cycle counter
y = value of the overflow counter

We note that an overflow occurs once every 65,536 cycles, because we use a 16-bit counter as the cycle counter. The total execution time of the benchmark is then obtained as follows:

total_time = y * 2^16 + x

Accounting for the start and stop overhead, we get:

wcet' = total_time - O1

Accounting further for the overflow handling overhead, we obtain our final WCET estimate:

wcet = wcet' - (y * O2)

We account for the overhead of the timing code in the results of the hardware experiments. Once this has been taken care of, the code used on all three platforms is identical. The cycle-accurate simulator has its own interactive GUI, which provides various processor statistics, such as execution cycles and execution time. To obtain timing information from the simulator, we first feed it code identical to that used for the hardware experiments.
We then set breakpoints before and after significant points in the code and calculate the difference in the execution time, as shown in the GUI, to obtain the WCET (or, more precisely, the WCEC in this case). For both of the above, worst-case input sets were manually determined so that the programs would exhibit their worst-case behavior. Our final set of experiments was to run the benchmarks through the timing analysis framework. The same assembly files that were executed on the hardware and the simulator were provided to the timing analysis framework, along with the loop bounds for the various loops in the code. The instruction categorizations were hardwired as always hits. The control-flow information was extracted by a preprocessing tool, similar to a compiler back-end, which is part of the framework. The results, along with the pipeline information, were fed into the timing analyzer, which decomposed the program into paths and performed WCET analysis as explained in Section 2. Results from all three sets of experiments were tabulated; the results themselves and the related analysis are provided in the next section. We also conducted experiments to study the effects of varying input sets and loop bounds on the WCET estimates produced by the timing analysis framework.

Instructions Extraneous to Loop Bodies: The timing analysis framework calculates the WCET for loops, functions and the main program. It does not provide WCET estimates for single instructions or arbitrary regions of code. The timing obtained from the hardware and the simulator, however, can be for any arbitrary piece of code. Hence, there can be a mismatch between the results obtained from the timing analyzer and those from the hardware and/or the cycle-accurate simulator. Even with careful placement of the timing code for the hardware and of the breakpoints for the simulator, differences can occur.
The main reason for the mismatch is code that is extraneous to the loops but still integral to the execution, such as loop initialization code, code inserted to pass arguments to functions, and so on, as shown in Figure 4. Care must be taken to adjust the final results obtained from the hardware/simulator for the timing of these extra instructions. A similar problem is posed by the overhead of reading the timer and handling counter overflow exceptions on the actual hardware. The former poses only a small, constant overhead, while the latter aggregates over longer executions. We explicitly compensate for both of these effects when comparing timings from the actual hardware with those from the cycle-accurate simulator or the timing analyzer.

6. Timing Analysis for NesC

Typical NesC programs include a variety of constructs such as commands, events, tasks, etc. Even though the following description of statically deriving WCET bounds focuses on commands, the methodology is equally applicable to timing the body of events once these events have been triggered, as is required by schedulability analysis of synchronous and asynchronous activities in real-time systems [0]. NesC commands are analogous to C functions in that they execute code synchronously. Typically, the NesC compiler converts given NesC code to intermediate C code

[Figure 4. Timing Adjustment for Extra Cycles (bold: code only present when timing executions on hardware) — StartTimer(); loop initialization code; label: check condition, return to label if condition holds; StopTimer(). The WCET is estimated only for the loop region of the code.]

and then builds an executable for the AVR motes. Since we deal only with synchronous commands, the generated C code can be examined, and information such as loop bounds can easily be obtained. The resulting intermediate AVR assembly code and the loop-bound information obtained from the C file provide the inputs to the static timing analysis framework (see Section 2). For actual execution on the hardware, we again add calls to our hardware timer routines in the NesC code so that we obtain accurate execution times on the hardware. Adjustments are made for the overhead of calling the timing routines. To facilitate the timing of NesC code, we have created a simple interface in NesC, which provides an abstraction for executing the actual benchmark. We time the call to the execute() method of the interface. Any code that executes within this method or is called from this method is timed, both in hardware and by the timing analyzer. Since we abstract from the actual benchmark, we can time various NesC modules as long as they implement a facility similar to the execute() interface and contain commands.

7. Results

The results from the three sets of experiments are summarized in Table 3. The execution times for these benchmarks range from a few hundred cycles to several million cycles. Results for the Mica motes show execution cycles for the benchmarks before handling of the loop-extraneous instructions and after the extra cycles have been accounted for, in the columns titled "Before Adjustment" and "After Adjustment" under the Mica Motes results, respectively. The upper part of Table 3 shows the results for the C benchmarks, whereas the lower part of the table shows results from the execution and analysis of the NesC benchmarks, as explained in Section 6.
The column titled Before Adjustment under Simulator depicts the raw results from the simulator, whereas the column titled After Adjustment takes the same overheads as before into account to adjust the raw numbers. The column Ratio under Simulator depicts the ratio between the adjusted simulator and the adjusted Mica results. The remaining columns depict the results of the timing analyzer. They demonstrate increasingly tight WCET estimates as various special cases were handled (see Section ). The initial results in the columns titled Initial Results and Ratio under Timing Analyzer indicate the WCET estimates obtained from the timing analyzer before any of the special cases were handled, both in cycles and as a ratio relative to the adjusted Mica numbers. The column titled After Pipeline Fix and the following Ratio column show the results after adjacent loop iterations were adjusted to remove pipeline stalls between paths (see Subsection .). Finally, the column After Variable Instruction Fix and the succeeding Ratio column report the WCET estimate obtained after the timing analyzer's path-merging logic was enhanced to account for instructions that have variable execution times (see Subsection .). All Ratio columns give the ratio of the execution cycles in the preceding column to the corresponding entries of execution cycles After Adjustment for the Mica Motes.

Let us first focus on the results regarding the timing analyzer. The original timing analyzer framework reported WCET estimates for the AVR architecture. However, these results were not tight. By studying the architecture and the instruction set, we were able to identify a number of cases (see Section ) that required enhancements to the timing analyzer. After adding the logic to ensure proper overlapping of adjacent loop iterations, WCET estimates became much tighter (column After Pipeline Fix of Table ). However, the WCET estimates were still overestimated.
After addressing shortcomings in the handling of instructions with variable execution times, the final results (column After Variable Instruction Fix of Table ) show very tight estimates of the WCET, close to the execution numbers obtained from the hardware. Not only were we able to obtain tight WCET estimates for the AVR architecture, we also enhanced the capabilities and the value of our timing analysis framework in the process.

The execution on the Mica hardware resulted in nearly identical results to the cycle-accurate simulator. However, we see that the simulator may slightly underestimate the WCET, which, however small, is not safe in a real-time environment. This difference in timing between the hardware and the simulator is fully repeatable. Hence, the cycle-accurate simulator is not entirely fit for use in the context of real-time schedulability analysis as a base from which to obtain the WCET of tasks. Hence, we confirm unsafe differences between simulators and hardware reported in prior work, which has been attributed, most likely, to minor omissions in the accuracy of the architectural model within simulators [].

[Table . Results from all experiments. For each benchmark, the columns list the Mica Motes cycle counts (Before Adjustment, After Adjustment), the simulator cycle counts (Before Adjustment, After Adjustment, and the Ratio to the adjusted Mica numbers), and the timing analyzer cycle counts (Initial Results, After Pipeline Fix, and After Variable Instruction Fix, each with its Ratio). C benchmarks: sum array, fibcall, insertsort, matrix mult, bubble sort. NesC benchmarks: ArraySum, RC5.encrypt, RC5.decrypt.]

The results also demonstrate that the timing analyzer is able to tightly and safely bound the WCETs for the benchmarks tested. We see that for most of the benchmarks the WCET estimates closely match the execution times obtained from the hardware and the simulator. For three of the five benchmarks, the timing analyzer safely overestimates by less than % relative to the actual execution. Furthermore, timing analysis for NesC code is accurate. The results from the timing analyzer match hardware execution times very closely. Differences are within 9% for the synthetic array summation benchmark and nearly identical (% or less) for the encryption and decryption algorithms.

To assess the scalability of our approach, we conducted experiments with varying input sizes for the fibcall benchmark. We compare results from the timing analyzer against results from the hardware, as depicted in Figure 5.

[Figure 5. Scalability of Timing Analyzer for Varying Input Sizes]

The figure shows that the tightness of WCET bounds does not deteriorate with increasing input sizes.
We observe a constant, small overestimation of the actual execution times by the WCET estimates. Hence, our framework is not only reliable and tight for a wide range of programs, but also independent of varying input sizes and the resulting changes in the number of iterations.

The three-pronged approach to performing these experiments underscores the reliability of our framework. It also shows the need for verifying timing analyzers against actual hardware, cycle-accurate simulators, or both. In fact, a comparison against only a cycle-accurate simulator can sometimes lead to slight underestimations, as seen in the results. Hence, we favor a comparison with the actual hardware over a cycle-accurate simulator, even if the latter is supplied by the manufacturer. Overall, the WCET bounds obtained by our timing analyzer are both safe and tight, as they closely follow actual execution times.

Let us discuss the merits of the three approaches in a more general sense. While obtaining cycles from the actual hardware provides the actual data needed, cycle-level simulators can also be useful. First, problems with a timing analyzer can be compared to diagnostic output from a simulator to track down inaccurate estimations. In contrast, only limited information regarding the execution, such as the number of cycles, can typically be obtained from hardware. Second, simulators are often easier to interact with than actual hardware. Tests are typically easier to set up and devise with a simulator. Finally, the actual hardware may sometimes not be available, whereas a simulator may be executed on different general-purpose processors. Ideally, a cycle-accurate simulator should be used to verify the accuracy of a timing analyzer, and the actual hardware should be used to verify the accuracy of the simulator. But our study shows that one should not blindly rely on so-called cycle-accurate simulators.
8. Related Work

Many research groups have addressed the problem of timing analysis. Methods of analysis range from unoptimized programs on simple CISC processors, through optimized programs on pipelined RISC processors and uncached architectures, to instruction and data caches [, 5,, 8, 5,,, 7]. In fact, our work is probably most closely related to that of Harmon et al. [] in terms of comparing against the actual hardware. However, instead of using time-consuming reverse-engineering methods (requiring error-prone data acquisition with, e.g., an oscilloscope), we demonstrate that data sheets together with observation of the actual hardware may be sufficient for the design of a timing analyzer. These methods are used to obtain safe bounds on the WCET as discrete values in a non-parametric fashion. Parametric timing analysis by Vivancos et al. describes techniques to handle variable loop bounds [8]. This allows dynamic schedulers to re-assess the WCET based on loop bounds determined dynamically during program execution. In independent work by Chapman et al. [6], path expressions were used to combine a source-oriented parametric approach to WCET analysis with timing annotations, verifying the latter with the former. Bernat and Burns recently proposed algebraic expressions to represent the WCET of programs []. The FAST framework by Seth et al. models processor frequency and incorporates parametric timing into the analysis framework [6]. Probabilistic and experimental approaches to obtaining the WCET of programs have also been proposed [0, 4]. Our timing analyzer framework builds on the concepts of prior work [,, 4,, 5]. Modifications and enhancements have been made to the framework to work with various architectures and instruction sets. The AiT tool [7], loosely based on our early work, particularly that on caching, is a commercially available timing analysis tool that has been shown to provide tighter estimates for the Motorola ColdFire architecture employed in Airbus planes. Chen et al. recently presented methods to perform static timing analysis for embedded software [7].
They utilize integer linear programming to obtain the WCET for an embedded architecture. They cope with more complex architectures supporting branch prediction and predication, which are not issues in the AVR architecture. In other related work, Engblom et al. target a line of custom ASICs based on a standard CPU core from the Atmega90 line [9]. Prior work handles pipeline effects and the effects of variable-cycle instructions in a different manner [8]. In that work, the timing of any single instruction provides a baseline. Then, all possible subsequences of instructions of different lengths are examined to obtain the savings in execution time due to instruction overlap in the pipeline. These savings are expressed as negative delta values for any instruction sequence of length two, three, etc. This method, although exhaustive, is extremely time-consuming. In contrast, our work shows that it is sufficient to capture edge-effects between basic blocks, and only if they occur, as in the case of variable-cycle instructions. Hence, our method is considerably more efficient in terms of analysis overhead.

9. Conclusion

In this work, we enhanced existing tools and developed new tools to incorporate them into a framework capable of performing timing analysis with the aim of obtaining WCET estimates for the Atmel Atmega architecture family. The Atmel AVR architecture and instruction set were targeted for static timing analysis, as this architecture is widely used in the Berkeley Motes, a platform that is becoming ever more popular for use in a wide variety of applications. The timing analysis framework is capable of statically analyzing code compiled for the Atmel processors. It provides tight, safe, and accurate WCET estimates, which form the basis for subsequent real-time schedulability analysis. To verify our results, we performed a series of experiments using a three-tiered approach, which is unique, to the best of our knowledge.
Timing results from the actual hardware, from a cycle-accurate simulator, and from our timing analysis framework were studied and compared. We found that the cycle-accurate simulator is capable of underestimating the WCET at times. Hence, it does not provide a safe base for WCET calculations. In contrast, our timing analysis framework was able to calculate WCET cycles with an overestimation of less than % for programs with statically known loop bounds. We address the handling of variable-cycle instructions, path merging without pipeline stalls, and accurate accounting of timing overhead. Our approach extends beyond the Atmel AVR family, as the modeled architectural features are common in low-end embedded processors. By enhancing the entire framework, significantly tighter bounds were obtained by our timing analysis framework. Specifically, the enhanced pipeline modeling improves the accuracy of WCET bounds by 5 to 0%, and our efficient modeling of variable-cycle instructions further improves the accuracy by 0 to 40%, for total improvements of up to 77%.

Our framework is also capable of performing static timing analysis for NesC commands. The WCET results obtained for both NesC and C code are accurate as well as tight. This shows that our framework is flexible enough to bound the WCET of sensor network applications in NesC as well as C for the Atmel architecture, both at the level required by schedulability analysis of real-time systems.

To our knowledge, this is the first time that such a three-tiered assessment of timing experiments has been carried out. It demonstrates shortcomings in so-called cycle-accurate simulators, while static timing analysis provided safe bounds on the WCET. Furthermore, making our timing analysis toolset available to sensor node applications may be a significant contribution towards addressing a documented need for tool support for EmNets.

References

[1] A. Anantaraman, K. Seth, K. Patil, E. Rotenberg, and F. Mueller. Virtual simple architecture (VISA): Exceeding the complexity limit in safe real-time systems. In International Symposium on Computer Architecture, pages 50-6, June 00.
[2] Atmel. Atmel AVR 8-bit RISC family.
[3] G. Bernat and A. Burns. An approach to symbolic worst-case execution time analysis. In 5th IFAC Workshop on Real-Time Programming, May 000.
[4] G. Bernat, A. Colin, and S. Petters. WCET analysis of probabilistic hard real-time systems. In IEEE Real-Time Systems Symposium, Dec. 00.
[5] C-Lab. WCET benchmarks. Available from
[6] R. Chapman, A. Burns, and A. Wellings. Combining static worst-case timing analysis and program proof. Real-Time Systems, ():45-7, 996.
[7] K. Chen, S. Malik, and D. I. August. Retargetable static timing analysis for embedded software. In Proceedings of the International Symposium on System Synthesis (ISSS), October 00.
[8] J. Engblom. Processor pipelines and static worst-case execution time analysis. Uppsala Dissertations from the Faculty of Science and Technology, 00.
[9] J. Engblom, A. Ermedahl, M. Sjödin, J. Gustafsson, and H. Hansson. Execution-time analysis for embedded real-time systems. In STTT (Software Tools for Technology Transfer) special issue on ASTEC, 00.
[10] D. Estrin, G. Borriello, R. Colwell, J. Fiddler, M. Horowitz, W. Kaiser, N. Leveson, B. Liskov, P. Lucas, D. Maher, P. Mankiewich, R. Taylor, and J. W. (eds.). Embedded, Everywhere: A Research Agenda for Networked Systems of Embedded Computers. Computer Science and Telecommunications Board, National Academies Press, 00.
[11] D. Gay, P. Levis, R. von Behren, M. Welsh, E. Brewer, and D. Culler. The NesC language: A holistic approach to networked embedded systems. In J. J. B. Fenwick and C. Norris, editors, Proceedings of the ACM SIGPLAN 00 Conference on Programming Language Design and Implementation (PLDI-0), volume 8, 5 of ACM SIGPLAN Notices, pages , New York, June. ACM Press.
[12] M. Harmon, T. P. Baker, and D. B. Whalley. A retargetable technique for predicting execution time. In IEEE Real-Time Systems Symposium, pages 68-77, Dec. 99.
[13] C. A. Healy, R. D. Arnold, F. Mueller, D. Whalley, and M. G. Harmon. Bounding pipeline and instruction cache performance. IEEE Transactions on Computers, 48():5-70, Jan.
[14] C. A. Healy, M. Sjödin, and D. B. Whalley. Bounding loop iterations for timing analysis. In IEEE Real-Time Embedded Technology and Applications Symposium, pages , June 998.
[15] C. A. Healy, D. B. Whalley, and M. G. Harmon. Integrating the timing analysis of pipelining and instruction caching. In IEEE Real-Time Systems Symposium, pages 88-97, Dec.
[16] J. Hill, R. Szewczyk, A. Woo, D. Culler, S. Hollar, and K. Pister. System architecture directions for networked sensors. In Architectural Support for Programming Languages and Operating Systems, pages 9-04, 000.
[17] Y.-T. S. Li, S. Malik, and A. Wolfe. Cache modeling for real-time software: Beyond direct mapped instruction caches. In IEEE Real-Time Systems Symposium, pages 54-6, Dec.
[18] S.-S. Lim, Y. H. Bae, G. T. Jang, B.-D. Rhee, S. L. Min, C. Y. Park, H. Shin, and C. S. Kim. An accurate worst case timing analysis for RISC processors. In IEEE Real-Time Systems Symposium, pages 97-08, Dec.
[19] C. Liu and J. Layland. Scheduling algorithms for multiprogramming in a hard-real-time environment. J. of the Association for Computing Machinery, 0():46-6, Jan. 97.
[20] J. Liu. Real-Time Systems. Prentice Hall, 000.
[21] S. Montan and J. Engblom. Validation of a cycle-accurate CPU simulator against real hardware. In ECRTS (Euromicro Conference on Real-Time Systems), 000.
[22] F. Mueller. Timing analysis for instruction caches. Real-Time Systems, 8(/):09-9, May 000.
[23] C. Y. Park. Predicting program execution times by analyzing static and dynamic program paths. Real-Time Systems, 5(): 6, Mar. 99.
[24] A. Perrig, R. Szewczyk, V. Wen, D. Culler, and J. D. Tygar. SPINS: Security protocols for sensor networks. In Seventh Annual International Conference on Mobile Computing and Networks (MobiCOM 00), pages 89-99, 00.
[25] P. Puschner and C. Koza. Calculating the maximum execution time of real-time programs. Real-Time Systems, ():59-76, Sept.
[26] K. Seth, A. Anantaraman, F. Mueller, and E. Rotenberg. FAST: Frequency-aware static timing analysis. In IEEE Real-Time Systems Symposium, pages 40-5, Dec. 00.
[27] S. Thesing, J. Souyris, R. Heckmann, F. Randimbivololona, M. Langenbach, R. Wilhelm, and C. Ferdinand. An abstract interpretation-based timing validation of hard real-time avionics. In Proceedings of the International Performance and Dependability Symposium (IPDS), June 00.
[28] E. Vivancos, C. Healy, F. Mueller, and D. Whalley. Parametric timing analysis. In ACM SIGPLAN Workshop on Language, Compiler, and Tool Support for Embedded Systems, volume 6 of ACM SIGPLAN Notices, pages 88-9, Aug. 00.
[29] D. L. Weaver and T. Germond. The SPARC Architecture Manual, Version 9. Prentice Hall, 994.
[30] J. Wegener and F. Mueller. A comparison of static analysis and evolutionary testing for the verification of timing constraints. Real-Time Systems, ():4-68, Nov. 00.
[31] R. White, F. Mueller, C. Healy, D. Whalley, and M. Harmon. Timing analysis for data caches and set-associative caches. In IEEE Real-Time Embedded Technology and Applications Symposium, pages 9-0, June 997.
[32] R. T. White, F. Mueller, C. Healy, D. Whalley, and M. G. Harmon. Timing analysis for data and wrap-around fill caches. Real-Time Systems, 7(/):09, Nov. 999.


More information

Tightening the Bounds on Feasible Preemption Points

Tightening the Bounds on Feasible Preemption Points Tightening the Bounds on Feasible Preemption Points Harini Ramaprasad, Frank Mueller North Carolina State University,Raleigh, NC 27695-7534, mueller@cs.ncsu.edu Abstract Caches have become invaluable for

More information

Major and Minor States

Major and Minor States Major and Minor States We now consider the micro operations and control signals associated with the execution of each instruction in the ISA. The execution of each instruction is divided into three phases.

More information

Automatic Counterflow Pipeline Synthesis

Automatic Counterflow Pipeline Synthesis Automatic Counterflow Pipeline Synthesis Bruce R. Childers, Jack W. Davidson Computer Science Department University of Virginia Charlottesville, Virginia 22901 {brc2m, jwd}@cs.virginia.edu Abstract The

More information

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition The Processor - Introduction

More information

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor. COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor The Processor - Introduction

More information

Experimental Node Failure Analysis in WSNs

Experimental Node Failure Analysis in WSNs Experimental Node Failure Analysis in WSNs Jozef Kenyeres 1.2, Martin Kenyeres 2, Markus Rupp 1 1) Vienna University of Technology, Institute of Communications, Vienna, Austria 2) Slovak University of

More information

FUTURE communication networks are expected to support

FUTURE communication networks are expected to support 1146 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL 13, NO 5, OCTOBER 2005 A Scalable Approach to the Partition of QoS Requirements in Unicast and Multicast Ariel Orda, Senior Member, IEEE, and Alexander Sprintson,

More information

Improving Timing Analysis for Matlab Simulink/Stateflow

Improving Timing Analysis for Matlab Simulink/Stateflow Improving Timing Analysis for Matlab Simulink/Stateflow Lili Tan, Björn Wachter, Philipp Lucas, Reinhard Wilhelm Universität des Saarlandes, Saarbrücken, Germany {lili,bwachter,phlucas,wilhelm}@cs.uni-sb.de

More information

Joint Entity Resolution

Joint Entity Resolution Joint Entity Resolution Steven Euijong Whang, Hector Garcia-Molina Computer Science Department, Stanford University 353 Serra Mall, Stanford, CA 94305, USA {swhang, hector}@cs.stanford.edu No Institute

More information

A Single-Path Chip-Multiprocessor System

A Single-Path Chip-Multiprocessor System A Single-Path Chip-Multiprocessor System Martin Schoeberl, Peter Puschner, and Raimund Kirner Institute of Computer Engineering Vienna University of Technology, Austria mschoebe@mail.tuwien.ac.at, {peter,raimund}@vmars.tuwien.ac.at

More information

Predictable paging in real-time systems: an ILP formulation

Predictable paging in real-time systems: an ILP formulation Predictable paging in real-time systems: an ILP formulation Damien Hardy Isabelle Puaut Université Européenne de Bretagne / IRISA, Rennes, France Abstract Conventionally, the use of virtual memory in real-time

More information

FSU DEPARTMENT OF COMPUTER SCIENCE

FSU DEPARTMENT OF COMPUTER SCIENCE mueller@cs.fsu.edu whalley@cs.fsu.edu of Computer Science Department State University Florida On Debugging RealTime Applications Frank Mueller and David Whalley Tallahassee, FL 323044019 email: On Debugging

More information

ECE902 Virtual Machine Final Project: MIPS to CRAY-2 Binary Translation

ECE902 Virtual Machine Final Project: MIPS to CRAY-2 Binary Translation ECE902 Virtual Machine Final Project: MIPS to CRAY-2 Binary Translation Weiping Liao, Saengrawee (Anne) Pratoomtong, and Chuan Zhang Abstract Binary translation is an important component for translating

More information

A Lost Cycles Analysis for Performance Prediction using High-Level Synthesis

A Lost Cycles Analysis for Performance Prediction using High-Level Synthesis A Lost Cycles Analysis for Performance Prediction using High-Level Synthesis Bruno da Silva, Jan Lemeire, An Braeken, and Abdellah Touhafi Vrije Universiteit Brussel (VUB), INDI and ETRO department, Brussels,

More information

CS 426 Parallel Computing. Parallel Computing Platforms

CS 426 Parallel Computing. Parallel Computing Platforms CS 426 Parallel Computing Parallel Computing Platforms Ozcan Ozturk http://www.cs.bilkent.edu.tr/~ozturk/cs426/ Slides are adapted from ``Introduction to Parallel Computing'' Topic Overview Implicit Parallelism:

More information

FIFO Cache Analysis for WCET Estimation: A Quantitative Approach

FIFO Cache Analysis for WCET Estimation: A Quantitative Approach FIFO Cache Analysis for WCET Estimation: A Quantitative Approach Abstract Although most previous work in cache analysis for WCET estimation assumes the LRU replacement policy, in practise more processors

More information

Modeling Control Speculation for Timing Analysis

Modeling Control Speculation for Timing Analysis Modeling Control Speculation for Timing Analysis Xianfeng Li, Tulika Mitra and Abhik Roychoudhury School of Computing, National University of Singapore, 3 Science Drive 2, Singapore 117543. Abstract. The

More information

Tuning the WCET of Embedded Applications

Tuning the WCET of Embedded Applications Tuning the WCET of Embedded Applications Wankang Zhao 1,Prasad Kulkarni 1,David Whalley 1, Christopher Healy 2,Frank Mueller 3,Gang-Ryung Uh 4 1 Computer Science Dept., Florida State University, Tallahassee,

More information

The Processor: Improving the performance - Control Hazards

The Processor: Improving the performance - Control Hazards The Processor: Improving the performance - Control Hazards Wednesday 14 October 15 Many slides adapted from: and Design, Patterson & Hennessy 5th Edition, 2014, MK and from Prof. Mary Jane Irwin, PSU Summary

More information

101. The memory blocks are mapped on to the cache with the help of a) Hash functions b) Vectors c) Mapping functions d) None of the mentioned

101. The memory blocks are mapped on to the cache with the help of a) Hash functions b) Vectors c) Mapping functions d) None of the mentioned 101. The memory blocks are mapped on to the cache with the help of a) Hash functions b) Vectors c) Mapping functions d) None of the mentioned 102. During a write operation if the required block is not

More information

A Path Decomposition Approach for Computing Blocking Probabilities in Wavelength-Routing Networks

A Path Decomposition Approach for Computing Blocking Probabilities in Wavelength-Routing Networks IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 8, NO. 6, DECEMBER 2000 747 A Path Decomposition Approach for Computing Blocking Probabilities in Wavelength-Routing Networks Yuhong Zhu, George N. Rouskas, Member,

More information

Introduction to Real-time Systems. Advanced Operating Systems (M) Lecture 2

Introduction to Real-time Systems. Advanced Operating Systems (M) Lecture 2 Introduction to Real-time Systems Advanced Operating Systems (M) Lecture 2 Introduction to Real-time Systems Real-time systems deliver services while meeting some timing constraints Not necessarily fast,

More information

ADVANCED COMPUTER ARCHITECTURE TWO MARKS WITH ANSWERS

ADVANCED COMPUTER ARCHITECTURE TWO MARKS WITH ANSWERS ADVANCED COMPUTER ARCHITECTURE TWO MARKS WITH ANSWERS 1.Define Computer Architecture Computer Architecture Is Defined As The Functional Operation Of The Individual H/W Unit In A Computer System And The

More information

Instruction Pipelining Review

Instruction Pipelining Review Instruction Pipelining Review Instruction pipelining is CPU implementation technique where multiple operations on a number of instructions are overlapped. An instruction execution pipeline involves a number

More information

UNIVERSITY OF MORATUWA CS2052 COMPUTER ARCHITECTURE. Time allowed: 2 Hours 10 min December 2018

UNIVERSITY OF MORATUWA CS2052 COMPUTER ARCHITECTURE. Time allowed: 2 Hours 10 min December 2018 Index No: UNIVERSITY OF MORATUWA Faculty of Engineering Department of Computer Science & Engineering B.Sc. Engineering 2017 Intake Semester 2 Examination CS2052 COMPUTER ARCHITECTURE Time allowed: 2 Hours

More information

UNIT-II. Part-2: CENTRAL PROCESSING UNIT

UNIT-II. Part-2: CENTRAL PROCESSING UNIT Page1 UNIT-II Part-2: CENTRAL PROCESSING UNIT Stack Organization Instruction Formats Addressing Modes Data Transfer And Manipulation Program Control Reduced Instruction Set Computer (RISC) Introduction:

More information

Metaheuristic Optimization with Evolver, Genocop and OptQuest

Metaheuristic Optimization with Evolver, Genocop and OptQuest Metaheuristic Optimization with Evolver, Genocop and OptQuest MANUEL LAGUNA Graduate School of Business Administration University of Colorado, Boulder, CO 80309-0419 Manuel.Laguna@Colorado.EDU Last revision:

More information

There are different characteristics for exceptions. They are as follows:

There are different characteristics for exceptions. They are as follows: e-pg PATHSHALA- Computer Science Computer Architecture Module 15 Exception handling and floating point pipelines The objectives of this module are to discuss about exceptions and look at how the MIPS architecture

More information

Data-Flow Based Detection of Loop Bounds

Data-Flow Based Detection of Loop Bounds Data-Flow Based Detection of Loop Bounds Christoph Cullmann and Florian Martin AbsInt Angewandte Informatik GmbH Science Park 1, D-66123 Saarbrücken, Germany cullmann,florian@absint.com, http://www.absint.com

More information

Chapter 7 The Potential of Special-Purpose Hardware

Chapter 7 The Potential of Special-Purpose Hardware Chapter 7 The Potential of Special-Purpose Hardware The preceding chapters have described various implementation methods and performance data for TIGRE. This chapter uses those data points to propose architecture

More information

A framework for verification of Program Control Unit of VLIW processors

A framework for verification of Program Control Unit of VLIW processors A framework for verification of Program Control Unit of VLIW processors Santhosh Billava, Saankhya Labs, Bangalore, India (santoshb@saankhyalabs.com) Sharangdhar M Honwadkar, Saankhya Labs, Bangalore,

More information

Full Datapath. Chapter 4 The Processor 2

Full Datapath. Chapter 4 The Processor 2 Pipelining Full Datapath Chapter 4 The Processor 2 Datapath With Control Chapter 4 The Processor 3 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory

More information

In examining performance Interested in several things Exact times if computable Bounded times if exact not computable Can be measured

In examining performance Interested in several things Exact times if computable Bounded times if exact not computable Can be measured System Performance Analysis Introduction Performance Means many things to many people Important in any design Critical in real time systems 1 ns can mean the difference between system Doing job expected

More information

Worst-Case Execution Times Analysis of MPEG-2 Decoding

Worst-Case Execution Times Analysis of MPEG-2 Decoding Worst-Case Execution Times Analysis of MPEG-2 Decoding Peter Altenbernd Lars-Olof Burchard Friedhelm Stappert C-LAB PC 2 C-LAB 33095 Paderborn, GERMANY peter@c-lab.de baron@upb.de fst@c-lab.de Abstract

More information

Advanced issues in pipelining

Advanced issues in pipelining Advanced issues in pipelining 1 Outline Handling exceptions Supporting multi-cycle operations Pipeline evolution Examples of real pipelines 2 Handling exceptions 3 Exceptions In pipelined execution, one

More information

Timing analysis and timing predictability

Timing analysis and timing predictability Timing analysis and timing predictability Architectural Dependences Reinhard Wilhelm Saarland University, Saarbrücken, Germany ArtistDesign Summer School in China 2010 What does the execution time depends

More information

AVR MICROCONTROLLER ARCHITECTURTE

AVR MICROCONTROLLER ARCHITECTURTE AVR MICROCONTROLLER ARCHITECTURTE AVR MICROCONTROLLER AVR- Advanced Virtual RISC. The founders are Alf Egil Bogen Vegard Wollan RISC AVR architecture was conceived by two students at Norwegian Institute

More information

Q.1 Explain Computer s Basic Elements

Q.1 Explain Computer s Basic Elements Q.1 Explain Computer s Basic Elements Ans. At a top level, a computer consists of processor, memory, and I/O components, with one or more modules of each type. These components are interconnected in some

More information

REDUCTION IN RUN TIME USING TRAP ANALYSIS

REDUCTION IN RUN TIME USING TRAP ANALYSIS REDUCTION IN RUN TIME USING TRAP ANALYSIS 1 Prof. K.V.N.Sunitha 2 Dr V. Vijay Kumar 1 Professor & Head, CSE Dept, G.Narayanamma Inst.of Tech. & Science, Shaikpet, Hyderabad, India. 2 Dr V. Vijay Kumar

More information

UNIT- 5. Chapter 12 Processor Structure and Function

UNIT- 5. Chapter 12 Processor Structure and Function UNIT- 5 Chapter 12 Processor Structure and Function CPU Structure CPU must: Fetch instructions Interpret instructions Fetch data Process data Write data CPU With Systems Bus CPU Internal Structure Registers

More information

Advanced Parallel Architecture Lesson 3. Annalisa Massini /2015

Advanced Parallel Architecture Lesson 3. Annalisa Massini /2015 Advanced Parallel Architecture Lesson 3 Annalisa Massini - 2014/2015 Von Neumann Architecture 2 Summary of the traditional computer architecture: Von Neumann architecture http://williamstallings.com/coa/coa7e.html

More information

CS450/650 Notes Winter 2013 A Morton. Superscalar Pipelines

CS450/650 Notes Winter 2013 A Morton. Superscalar Pipelines CS450/650 Notes Winter 2013 A Morton Superscalar Pipelines 1 Scalar Pipeline Limitations (Shen + Lipasti 4.1) 1. Bounded Performance P = 1 T = IC CPI 1 cycletime = IPC frequency IC IPC = instructions per

More information

Dissecting Execution Traces to Understand Long Timing Effects

Dissecting Execution Traces to Understand Long Timing Effects Dissecting Execution Traces to Understand Long Timing Effects Christine Rochange and Pascal Sainrat February 2005 Rapport IRIT-2005-6-R Contents 1. Introduction... 5 2. Long timing effects... 5 3. Methodology...

More information

RTC: Language Support for Real-Time Concurrency

RTC: Language Support for Real-Time Concurrency RTC: Language Support for Real-Time Concurrency Insup Lee, Susan Davidson, and Victor Wolfe 1 Introduction The RTC (Real-Time Concurrency) programming concepts and language constructs for expressing timing

More information

Superscalar SMIPS Processor

Superscalar SMIPS Processor Superscalar SMIPS Processor Group 2 Qian (Vicky) Liu Cliff Frey 1 Introduction Our project is the implementation of a superscalar processor that implements the SMIPS specification. Our primary goal is

More information

New Advances in Micro-Processors and computer architectures

New Advances in Micro-Processors and computer architectures New Advances in Micro-Processors and computer architectures Prof. (Dr.) K.R. Chowdhary, Director SETG Email: kr.chowdhary@jietjodhpur.com Jodhpur Institute of Engineering and Technology, SETG August 27,

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

Chapter 4. Advanced Pipelining and Instruction-Level Parallelism. In-Cheol Park Dept. of EE, KAIST

Chapter 4. Advanced Pipelining and Instruction-Level Parallelism. In-Cheol Park Dept. of EE, KAIST Chapter 4. Advanced Pipelining and Instruction-Level Parallelism In-Cheol Park Dept. of EE, KAIST Instruction-level parallelism Loop unrolling Dependence Data/ name / control dependence Loop level parallelism

More information

CMSC 411 Computer Systems Architecture Lecture 13 Instruction Level Parallelism 6 (Limits to ILP & Threading)

CMSC 411 Computer Systems Architecture Lecture 13 Instruction Level Parallelism 6 (Limits to ILP & Threading) CMSC 411 Computer Systems Architecture Lecture 13 Instruction Level Parallelism 6 (Limits to ILP & Threading) Limits to ILP Conflicting studies of amount of ILP Benchmarks» vectorized Fortran FP vs. integer

More information

An Operating System for a Time-Predictable Computing Node

An Operating System for a Time-Predictable Computing Node An Operating System for a Time-Predictable Computing Node Guenter Khyo, Peter Puschner, and Martin Delvai Vienna University of Technology Institute of Computer Enginering A1040 Vienna, Austria peter@vmars.tuwien.ac.at

More information

MAKING DYNAMIC MEMORY ALLOCATION STATIC TO SUPPORT WCET ANALYSES 1 Jörg Herter 2 and Jan Reineke 2

MAKING DYNAMIC MEMORY ALLOCATION STATIC TO SUPPORT WCET ANALYSES 1 Jörg Herter 2 and Jan Reineke 2 MAKING DYNAMIC MEMORY ALLOCATION STATIC TO SUPPORT WCET ANALYSES 1 Jörg Herter 2 and Jan Reineke 2 Abstract Current worst-case execution time (WCET) analyses do not support programs using dynamic memory

More information

Estimation of worst case latency of periodic tasks in a real time distributed environment

Estimation of worst case latency of periodic tasks in a real time distributed environment Estimation of worst case latency of periodic tasks in a real time distributed environment 1 RAMESH BABU NIMMATOORI, 2 Dr. VINAY BABU A, 3 SRILATHA C * 1 Research Scholar, Department of CSE, JNTUH, Hyderabad,

More information

Utilizing Concurrency: A New Theory for Memory Wall

Utilizing Concurrency: A New Theory for Memory Wall Utilizing Concurrency: A New Theory for Memory Wall Xian-He Sun (&) and Yu-Hang Liu Illinois Institute of Technology, Chicago, USA {sun,yuhang.liu}@iit.edu Abstract. In addition to locality, data access

More information

An Programming Language with Fixed-Logical Execution Time Semantics for Real-time Embedded Systems

An Programming Language with Fixed-Logical Execution Time Semantics for Real-time Embedded Systems An Programming Language with Fixed-Logical Execution Time Semantics for Real-time Embedded Systems embedded systems perspectives and visions Dr. Marco A.A. Sanvido University of California at Berkeley

More information

Understanding The Effects of Wrong-path Memory References on Processor Performance

Understanding The Effects of Wrong-path Memory References on Processor Performance Understanding The Effects of Wrong-path Memory References on Processor Performance Onur Mutlu Hyesoon Kim David N. Armstrong Yale N. Patt The University of Texas at Austin 2 Motivation Processors spend

More information

Computer and Hardware Architecture I. Benny Thörnberg Associate Professor in Electronics

Computer and Hardware Architecture I. Benny Thörnberg Associate Professor in Electronics Computer and Hardware Architecture I Benny Thörnberg Associate Professor in Electronics Hardware architecture Computer architecture The functionality of a modern computer is so complex that no human can

More information

A Time-Predictable Instruction-Cache Architecture that Uses Prefetching and Cache Locking

A Time-Predictable Instruction-Cache Architecture that Uses Prefetching and Cache Locking A Time-Predictable Instruction-Cache Architecture that Uses Prefetching and Cache Locking Bekim Cilku, Daniel Prokesch, Peter Puschner Institute of Computer Engineering Vienna University of Technology

More information