Predicting the Worst-Case Execution Time of the Concurrent Execution of Instructions and Cycle-Stealing DMA I/O Operations

ACM SIGPLAN Workshop on Languages, Compilers and Tools for Real-Time Systems, La Jolla, California, June

Tai-Yi Huang and Jane W.-S. Liu
Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, IL 61801

May 3, 1995

Abstract

This paper describes an efficient algorithm which gives a bound on the worst-case execution time of the concurrent execution of CPU instructions and cycle-stealing DMA I/O operations. Simulations of several programs were conducted to evaluate this algorithm. Compared with the traditional pessimistic approach, the bound on the worst-case execution time produced by the algorithm is significantly tighter. For a sample program that multiplies two matrices while the I/O bus is fully utilized, our algorithm achieves a 39% improvement in the accuracy of the prediction.

1 Introduction

Algorithms for scheduling tasks in hard-real-time systems typically assume that the tasks' worst-case execution times are known. Such a system is designed to ensure that all tasks can complete by their deadlines as long as no task in the system executes longer than its worst-case execution time (WCET). A task which overruns may lead to missed deadlines and the failure of the whole system. For this reason, how to bound the WCET of programs has received a great deal of attention in recent years.

Mok et al. [3] developed a graphical tool to analyze the timing behavior of assembly language programs and to bound their WCET. This tool requires that the maximum number of iterations of each loop structure be known. Park and Shaw [6, 7] developed a similar method for source-level programs. Their dynamic path analysis method eliminates infeasible execution paths and thus tightens the prediction of the WCET. Puschner and Koza [8] introduced several new language constructs with which programmers can describe the timing behavior of their programs. Their experiments showed that with this valuable information, the gap between the calculated WCET and the real WCET can be reduced significantly. To predict the WCET of concurrent programs, Niehaus [5] developed a semantics-preserving transformation for concurrent programming language constructs such as critical sections and synchronous communication. Zhang, Burns and Nicholson [11] developed a mathematical model to predict the WCET of programs executed on a two-stage pipeline. Mueller, Whalley and Harmon [4] developed a static cache simulation method to predict instruction cache behavior and bound its worst-case performance.

This paper first analyzes the delay caused by cycle-stealing direct-memory access (DMA) I/O activities. It then presents an algorithm to estimate the WCET of the concurrent execution of a stream of CPU instructions and DMA activities. A DMA controller transfers data between the main memory and I/O devices with minimal CPU involvement. As a result, the CPU can execute other instructions while a DMA controller is transferring data. A DMA controller operates either in burst mode or in cycle-stealing mode. A DMA controller in cycle-stealing mode transfers data by "stealing" bus cycles from an executing program. In this way, it retards the progress of the executing program and extends the program's execution time.
A conservative estimate of the WCET of a stream of CPU instructions and a cycle-stealing DMA I/O operation, which are ready at the same time, is the sum of their WCETs, each obtained by assuming that it executes alone. Obviously this estimate is pessimistic. We present here an analysis method and an algorithm which give a tighter bound on the WCET. The performance of the algorithm, in terms of the amount of reduction from the most pessimistic WCET estimate, is demonstrated by simulation results.

The rest of the paper is structured as follows. Section 2 describes the machine model that is the basis of our analysis method. Section 3 presents the method. Section 4 presents an algorithm to implement the method.

Our simulation results are presented in Section 5. Finally, Section 6 concludes the paper and discusses future work.

2 The Machine Model

We adopt a commonly used machine model according to which an instruction executes in the manner shown in Figure 1. The sequence of fetching and executing an instruction is called an instruction cycle. Each instruction cycle is composed of one or more machine cycles. A machine cycle requires one to several processor clock cycles to execute. Different machine cycles perform different functions. For example, the instruction cycle of ADD 1,(A0), shown in Figure 1, is composed of four machine cycles: a memory read (bus-access) cycle to fetch the instruction, a memory read (bus-access) cycle to fetch the operand, an execution (no-bus-access) cycle to carry out the addition, followed by a memory write (bus-access) cycle to write back the data. These machine cycles take 4, 3, 3, and 4 processor clock cycles, respectively.

Figure 1: The instruction cycle of ADD 1,(A0)

Since we focus on the analysis of how the DMA controller and the CPU contend for the bus, we are concerned primarily with whether the CPU accesses the I/O bus during each machine cycle. Therefore, we classify all machine cycles into two categories: bus-access (B) cycles and execution (E) cycles. B cycles are those machine cycles during which the CPU uses the I/O bus; in contrast, during E cycles the CPU does not need the bus. In general, there may be several consecutive E cycles in an instruction cycle. We assume that the CPU is synchronous: the beginning of each machine cycle is triggered by the processor clock. Our analysis method is applicable only to systems that have no cache memory and no pipelining.

The DMA controller and the CPU share the same I/O bus, as shown in Figure 2. At any time, either the DMA controller or the CPU, but not both, can hold the bus (i.e., be the bus master) and transfer data. We focus on the case where the DMA controller operates in cycle-stealing mode. In this mode, it is allowed to access the bus only when the CPU is in an E cycle. The protocol we use to regulate the bus contention between the DMA controller and the CPU is based on the VMEbus specification [9]. Because this protocol is sufficiently general, the analysis method presented in Section 3 for bounding the delay caused by cycle-stealing DMA I/O activities is applicable to many other commonly used buses.

Figure 2: The architecture of the machine model

To become the bus master, the DMA controller first sends a bus request. If the CPU is in a B cycle, the DMA controller waits. The CPU releases the bus when it enters an E cycle. After a short delay, during which the ownership of the bus is transferred from the CPU to the DMA controller, the DMA controller gains the bus and starts its data transfer. We refer to this delay period as the bus master transfer time (BMT). After the DMA controller completes the transfer of each unit of data, the bus controller checks whether there is any pending bus request from the CPU. The DMA controller is allowed to continue with its next transfer if there is no request from the CPU. Otherwise, the DMA controller releases the bus, and the CPU gains the bus after a BMT delay.
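The only per-instruction information the analysis below needs is this classification of machine cycles into B and E cycles and the lengths of the runs of consecutive E cycles. The following C sketch (ours, not from the paper; the type and function names are illustrative) shows one way to represent an instruction and extract its E-cycle runs, using the ADD 1,(A0) instruction of Figure 1 as the example:

    #include <stdio.h>

    /* B cycles use the I/O bus; E cycles do not. */
    typedef enum { B_CYCLE, E_CYCLE } cycle_kind;

    typedef struct {
        cycle_kind kind;
        int        clocks;       /* length of this machine cycle in clock cycles */
    } machine_cycle;

    typedef struct {
        const char   *mnemonic;
        int           n_cycles;
        machine_cycle cycle[8];  /* 8 is an arbitrary bound for this sketch */
    } instruction;

    /* Print the length, in clock cycles, of every run of consecutive E cycles.
     * These run lengths are exactly what the analysis of Section 3 assumes to
     * be known for each instruction. */
    static void print_e_runs(const instruction *inst)
    {
        int run = 0;
        for (int i = 0; i < inst->n_cycles; i++) {
            if (inst->cycle[i].kind == E_CYCLE) {
                run += inst->cycle[i].clocks;
            } else if (run > 0) {
                printf("%s: E-cycle run of %d clock cycles\n", inst->mnemonic, run);
                run = 0;
            }
        }
        if (run > 0)
            printf("%s: E-cycle run of %d clock cycles\n", inst->mnemonic, run);
    }

    int main(void)
    {
        /* The instruction cycle of Figure 1: ADD 1,(A0). */
        instruction add = { "ADD 1,(A0)", 4, {
            { B_CYCLE, 4 },      /* memory read: fetch instruction */
            { B_CYCLE, 3 },      /* memory read: fetch operand     */
            { E_CYCLE, 3 },      /* internal operation: execution  */
            { B_CYCLE, 4 },      /* memory write: write back data  */
        } };
        print_e_runs(&add);
        return 0;
    }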

3 Timing Analysis

Generally the DMA controller behaves in the following manner. After sending a bus request, the DMA controller waits when the CPU enters a B cycle from a B cycle, becomes the bus master when the CPU enters an E cycle from a B cycle, continues its transfers as long as the CPU continues to be in E cycles, and releases the bus when it finishes all the data transfers or when the CPU enters a B cycle from an E cycle. Again, whether there is any pending bus request is checked only at the end of each data transfer. The CPU does not gain the bus immediately after it sends a bus request if the DMA controller is currently transferring data. Therefore, the executing program may suffer delay, and its completion time is postponed accordingly.

Figure 3 illustrates the concurrent execution of the DMA controller and the sequence of machine cycles B_n → E_1 → E_2 → ... → E_k → B_{n+1}. Our analysis method assumes that the number of consecutive E cycles in each instruction is known. As shown in Figure 3, the DMA controller gains the bus when the CPU enters the E_1 cycle from the B_n cycle. It keeps transferring data during the interval from the E_1 cycle to the E_k cycle. The DMA controller releases the bus when the CPU is entering the B_{n+1} cycle. The execution of the B_{n+1} cycle is delayed by (b + BMT), where b is the delay between the time when the CPU requests the bus and the time when the request is checked and the DMA controller releases the bus. Again, BMT is the delay from the time when one bus master releases the bus to the time when the next bus master gains it.

Figure 3: The concurrent execution of cycle-stealing DMA I/O and a sequence of E cycles

Let m denote the number of data transfers performed by the DMA controller during the k consecutive E cycles, and DT be the amount of time to do each transfer. To calculate m, we let T_{E_i} denote the execution time of the machine cycle E_i, and

    T_k = T_{E_1} + T_{E_2} + ... + T_{E_k}

be the total execution time of the k consecutive E cycles. Based on the facts that 0 ≤ b < DT and T_k + b = m·DT + BMT, we have

    m − 1 < (T_k − BMT) / DT ≤ m.

Therefore we conclude that

    m = ⌈(T_k − BMT) / DT⌉.    (1)

The delay suffered by the CPU execution of this sequence of machine cycles is

    d′ = m·DT + 2·BMT − T_k

if the DMA controller holds the bus for m data transfers. On the other hand, if fewer than m data transfers are performed, the delay becomes shorter because the bus is handed back without the delay b. Because of the assumption that each machine cycle is triggered by the processor clock, the machine cycle B_{n+1} cannot start until the next clock cycle. As a result, the exact delay suffered by the CPU execution is at most equal to

    d = ⌈d′ / T_c⌉ · T_c,    (2)

where T_c is the period of a clock cycle.

4 Bounding the WCET

For a given stream of N CPU instructions, together with a DMA I/O operation that requires M data transfers and is ready at the same time as the stream of CPU instructions, we want to find the WCET of the concurrent execution, that is, the maximum amount of time required for both the instruction stream and the I/O operation to complete. We now present an algorithm which makes use of the knowledge of how the CPU instructions and the DMA operation interfere with one another. By doing so, it gives us a tighter bound on the WCET. Because each instruction begins with a B cycle, no DMA data transfer can take place across two instructions. Consequently, the effects of cycle-stealing on each instruction can be analyzed independently, without considering the other instructions.
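To make Eqs. (1) and (2) concrete before turning to the algorithm, the following C sketch (our own; the function names are illustrative) computes m and d for a single run of consecutive E cycles, using the bus and clock parameters quoted later in Section 5 (DT = 120 ns, BMT = 10 ns, T_c = 60 ns) and assuming the run is long enough for the DMA controller to gain the bus:

    #include <math.h>
    #include <stdio.h>

    /* Bus and clock parameters, in nanoseconds (the values used in Section 5). */
    static const double BMT = 10.0;    /* bus master transfer time      */
    static const double DT  = 120.0;   /* time per DMA data transfer    */
    static const double TC  = 60.0;    /* processor clock cycle period  */

    /* Eq. (1): the worst-case number of transfers the DMA controller completes
     * during a run of consecutive E cycles of total length t_k.  Assumes the
     * run is long enough for the controller to gain the bus (t_k >= BMT). */
    static long transfers_in_run(double t_k)
    {
        return (long)ceil((t_k - BMT) / DT);
    }

    /* Eq. (2): the worst-case delay of the following B cycle,
     * d = ceil(d'/Tc)*Tc with d' = m*DT + 2*BMT - T_k, rounded up to whole
     * clock cycles because every machine cycle starts on a clock edge. */
    static double worst_case_delay(double t_k)
    {
        long   m       = transfers_in_run(t_k);
        double d_prime = (double)m * DT + 2.0 * BMT - t_k;
        return ceil(d_prime / TC) * TC;
    }

    int main(void)
    {
        double t_k = 180.0;   /* e.g. three consecutive 60 ns E cycles */
        printf("m = %ld transfers, d = %.0f ns\n",
               transfers_in_run(t_k), worst_case_delay(t_k));
        return 0;
    }

For example, a run of three 60 ns E cycles (T_k = 180 ns) admits m = 2 transfers and delays the following B cycle by d = 120 ns, i.e., two clock cycles.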
The algorithm shown in Figure 4 uses Eqs. (1) and (2) to calculate, for each instruction, the worst-case delay caused by cycle-stealing and the number of data transfers the DMA controller can perform. The information needed by this algorithm includes how many machine cycles each instruction is composed of, the function of each machine cycle, and the execution time of each machine cycle. This information can be obtained from the reference manual provided by the manufacturer of the processor. The algorithm also requires as inputs two parameters of the bus, BMT and DT. The rest of the algorithm is self-explanatory. Because the delay of each instruction obtained here is the worst-case delay, the value returned by the algorithm is an upper bound on the execution time. On the other hand, because the algorithm accounts for the effect of the concurrent execution of CPU instructions and DMA I/O, the WCET we get should be much tighter than the pessimistic estimate.

5 Simulation Results

We now demonstrate the performance of our algorithm with several simulation results. Given a stream of CPU instructions and a DMA I/O operation, we first use the pessimistic approach to predict its WCET. According to this approach, the predicted value WCET_pessi is equal to the sum of the WCET of the instruction stream when it executes alone and the WCET of the DMA I/O operation when the operation is done alone. We next make the prediction with our algorithm. The value returned is denoted by WCET_ours. We use the percentage reduction from the pessimistic WCET,

    R = (WCET_pessi − WCET_ours) / WCET_pessi × 100%,

to measure the performance of the algorithm.

Table 1 lists the C programs tested in our simulation. Because of the wide use of the CPU32 in embedded systems, we compile these programs into assembly language programs for the MC68332, one of the MC68300 family of embedded controllers. We execute these programs in a simulator to obtain their execution traces. The timing information of each instruction in the traces is given by [1]. Given the clock frequency of the MC68332 microprocessor, the period of a clock cycle T_c is 60 ns. Assuming that zero-wait-state memory is used and the size of the data in each DMA transfer is equal to the bus width, we set DT to 120 ns. Finally, BMT is 10 ns.

    Name        Description
    qsort       a quicksort of 250 elements
    bubble      a bubble sort of 100 elements
    fft         a 128-node Fast Fourier Transform
    spline      a cubic spline function of 100 points
    gaussian    a 10x10 Gaussian elimination
    mtxmul      a multiplication of 2 10x10 matrices
    correlate   a correlate function of 500 tracks
    mtxmul2     a loop-unrolled version of mtxmul

    Table 1: The test set of C programs

To investigate the relationship between the performance of our algorithm and the fraction of a trace which is overlapped with a DMA I/O operation, four simulations were conducted on each execution trace. For each trace, we generated four DMA I/O operations which carry out different numbers of data transfers. In particular, we chose the lengths of these DMA I/O operations so that the trace overlapped with each of the DMA I/O operations for either 25%, 50%, 75%, or 100% of the instructions. The case with 25% overlap means that the first quarter of the trace executed concurrently with the DMA I/O operation while the rest executed alone. Similarly, in the case of 50% (75%) overlap, the last 50% (25%) of the trace executed alone. We then computed the WCET of the concurrent execution of the trace and each of the four DMA I/O operations. Thus, four values of R were obtained. Table 2 gives the results.
Column 2 lists the number of instructions in each program trace. Columns 4, 5, 6, and 7 list the percentage reduction in the predicted WCET when the first quarter, the first half, the first three quarters, and the whole trace executed concurrently with a DMA I/O operation, respectively. These values of R indicate that, compared with the pessimistic approach, our algorithm produces a more accurate prediction of the WCET of a program when the program executes concurrently with a DMA I/O operation for a larger percentage of the time. This conclusion is an expected one, since WCET_pessi is more pessimistic when the percentage of overlap is larger.

We also investigated the relationship between the performance of our algorithm and the computational requirements of programs. We classify all instructions here into two categories: long instructions and short instructions. An instruction is a long one if, during its execution, the CPU does not need the bus for 8 processor clock cycles or more. In contrast, during the execution of a short instruction, the CPU never allows any I/O device to have the bus for such a long period. Generally speaking, long instructions require intensive computation, and short instructions are those that do data movement or simple computation. For example, the instructions MULU.W D1,D2 and DIVU.W D2,D0 are long instructions, and MOVE.L (A3)+,D0 and ADD.L D0,D1 are short instructions.
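The long/short distinction can be read directly off an instruction's machine cycles. The following small C sketch is our own illustration of this classification; the cycle counts in the example are illustrative placeholders, not the CPU32 reference-manual timings:

    #include <stdio.h>

    /* kinds[i] is 'E' or 'B'; clocks[i] is the length of machine cycle i in
     * processor clock cycles.  An instruction is "long" if some run of
     * consecutive E cycles keeps the CPU off the bus for 8 or more clocks. */
    static int is_long_instruction(const char *kinds, const int *clocks, int n)
    {
        int run = 0;                   /* clocks in the current E-cycle run */
        for (int i = 0; i < n; i++) {
            if (kinds[i] == 'E') {
                run += clocks[i];
                if (run >= 8)          /* threshold used in Section 5 */
                    return 1;
            } else {
                run = 0;
            }
        }
        return 0;
    }

    int main(void)
    {
        /* Illustrative figures only, not the CPU32 reference-manual timings. */
        char kinds_mul[]  = { 'B', 'E' };       int clocks_mul[]  = { 4, 26 };
        char kinds_move[] = { 'B', 'B', 'B' };  int clocks_move[] = { 4, 4, 4 };

        printf("multiply-like instruction: %s\n",
               is_long_instruction(kinds_mul, clocks_mul, 2) ? "long" : "short");
        printf("move-like instruction:     %s\n",
               is_long_instruction(kinds_move, clocks_move, 3) ? "long" : "short");
        return 0;
    }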

Input:
  - the number of CPU instructions, N, and the instructions inst[1], inst[2], ..., inst[N] in the stream S.
  - the number of data transfers, M, required in the cycle-stealing DMA I/O operation.
  - the execution time of each instruction, inst[i].execution_time, and its machine cycles, for i = 1, 2, ..., N.
  - BMT and DT, the two parameters of the I/O bus.

Output: WCET, the worst-case execution time of the concurrent execution of the instruction stream S and the DMA I/O operation.

Procedure:
  1. Set WCET and trans, the number of transfers completed, to zero.
  2. For i = 1 to N, compute the contribution of inst[i] to WCET and increment WCET by that amount as follows:
     a. Update WCET = WCET + inst[i].execution_time.
     b. If (trans < M):
        - compute the worst-case delay d suffered by inst[i] according to Eq. (2), and update WCET = WCET + d;
        - compute the number of transfers, m, completed during inst[i] according to Eq. (1), and update trans = trans + m.
  3. If the I/O operation is not yet completed, increment WCET by the amount of time (M − trans)·DT needed to complete the remaining M − trans data transfers alone.

Figure 4: An algorithm which gives a tighter bound on the WCET

    Name        Instructions executed   Long instructions   R in % (25% / 50% / 75% / 100% overlap)
    qsort       23,026                  0%
    bubble      65,726                  0%
    fft         249,107                 2%
    spline      209,837                 3%
    gaussian    47,272                  5%
    mtxmul      36,789                  11%
    correlate   26,543                  17%
    mtxmul2     9,391                   22%

    Table 2: The simulation results
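The procedure of Figure 4 is straightforward to implement. The following self-contained C sketch is our own rendering of it (the type names, the MAX_RUNS limit, and the example timings are illustrative, not from the paper); it folds Eqs. (1) and (2) into step 2, describing each instruction by its stand-alone execution time and the lengths of its runs of consecutive E cycles:

    #include <math.h>
    #include <stdio.h>

    #define MAX_RUNS 4                     /* arbitrary limit for this sketch */

    typedef struct {
        double exec_time;                  /* execution time when run alone (ns)     */
        int    n_runs;                     /* number of runs of consecutive E cycles */
        double run_len[MAX_RUNS];          /* T_k of each run, in ns                 */
    } instr_timing;

    /* Bound on the WCET of n instructions executed concurrently with a DMA I/O
     * operation of m_total data transfers (the procedure of Figure 4, with
     * Eqs. (1) and (2) applied to each E-cycle run in step 2b). */
    static double wcet_bound(const instr_timing *inst, int n, long m_total,
                             double dt, double bmt, double tc)
    {
        double wcet  = 0.0;
        long   trans = 0;                  /* DMA transfers completed so far */

        for (int i = 0; i < n; i++) {
            wcet += inst[i].exec_time;                     /* step 2a */
            if (trans >= m_total)                          /* step 2b guard */
                continue;
            for (int r = 0; r < inst[i].n_runs; r++) {
                double t_k = inst[i].run_len[r];
                long   m   = (long)ceil((t_k - bmt) / dt); /* Eq. (1) */
                if (m <= 0)                                /* run too short: our guard */
                    continue;
                double d_prime = (double)m * dt + 2.0 * bmt - t_k;
                wcet  += ceil(d_prime / tc) * tc;          /* Eq. (2) */
                trans += m;
            }
        }
        if (trans < m_total)               /* step 3: finish remaining transfers alone */
            wcet += (double)(m_total - trans) * dt;
        return wcet;
    }

    int main(void)
    {
        /* Two made-up instructions: the timings are illustrative only. */
        instr_timing prog[] = {
            { 660.0, 1, { 180.0 } },       /* one E-cycle run of 180 ns */
            { 240.0, 0, { 0.0 } },         /* no E-cycle run            */
        };
        printf("WCET bound = %.0f ns\n",
               wcet_bound(prog, 2, 100, 120.0, 10.0, 60.0));  /* M, DT, BMT, Tc */
        return 0;
    }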

Because the delay caused by cycle-stealing on each instruction is bounded by Eq. (2), the overhead of each DMA transfer in a long instruction is less than that in a short instruction. Consequently, when a trace contains a higher percentage of long instructions, the algorithm produces a larger reduction percentage. We tested programs with different computational requirements in the simulation. Column 3 of Table 2 gives the percentage of long instructions in each program trace. We note that the value of R increases monotonically with the percentage of long instructions. Among the tested programs, mtxmul2 is obtained by unrolling the whole innermost loop of mtxmul. The loop-unrolling procedure significantly increases the percentage of long instructions in the trace. As a result, our algorithm performs better on the loop-unrolled version: a 39% reduction from the most pessimistic WCET estimate is achieved.

6 Conclusion and Future Work

Cycle-stealing DMA operations have often been disabled in real-time systems because of the uncertainty in the amount of time such an operation may delay the completion of an executing program. We presented here an analysis method to determine this delay. Based on the method, we developed an algorithm which gives a tighter bound on the WCET of the concurrent execution of a stream of CPU instructions and a cycle-stealing DMA I/O operation. Simulation results demonstrate that the algorithm can produce more accurate predictions of the WCET than the pessimistic WCET estimates, especially when the program contains a large percentage of computation-intensive instructions.

Our analysis method is applicable only when there is no cache memory. If cache memory is used, the number of bus accesses by the CPU is significantly reduced because of cache hits, leaving more bus cycles for the DMA controller to steal without delaying the CPU; we therefore expect an even greater improvement from accurately accounting for the delay caused by the concurrent execution of cycle-stealing operations. In the future we will extend our analysis to systems in which on-chip cache memory is present.

Our work encourages the inclusion of I/O instructions in real-time programs. Because of the hardware-dependent features of I/O instructions, determining their WCET is extremely difficult. Traditionally, I/O instructions are either not allowed or are restricted to predefined areas such as the beginning and end of a program [2, 10]. By decomposing timing-related information in a table-driven manner, our work can be used to predict the WCET of a program containing DMA I/O instructions. Future work will build a tool capable of predicting the WCET of programs containing any I/O instruction in a table-driven manner.

References

[1] MC68000 Family: CPU32 Reference Manual. Motorola.

[2] Mark H. Klein and Thomas Ralya. An analysis of input/output paradigms for real-time systems. Technical Report CMU/SEI-90-TR-19, CMU Software Engineering Institute, July.

[3] Aloysius K. Mok, Prasanna Amerasinghe, Moyer Chen, and Kamtorn Tantisirivat. Evaluating tight execution time bounds of programs by annotations. In Proceedings of the Sixth IEEE Workshop on Real-Time Operating Systems and Software, pages 272-279, May.

[4] Frank Mueller, David Whalley, and Marion Harmon. Predicting instruction cache behavior. In ACM SIGPLAN Workshop on Language, Compiler, and Tool Support for Real-Time Systems, June.

[5] Douglas Niehaus. Program representation and translation for predictable real-time systems. In Proceedings of the Real-Time Systems Symposium, pages 53-63.

[6] Chang Yun Park. Predicting program execution times by analyzing static and dynamic program paths. Journal of Real-Time Systems, 5:31-62, March.

[7] Chang Yun Park and Alan C. Shaw. Experiments with a program timing tool based on source-level timing schema. IEEE Computer, pages 48-57, May.

[8] P. Puschner and C. Koza. Calculating the maximum execution time of real-time programs. Journal of Real-Time Systems, 1:159-176, September.

[9] The VMEbus Specification. Motorola.

[10] A. Vrchoticky and P. Puschner. On the feasibility of response time predictions - an experimental evaluation. Technical Report 2/91, Institut für Technische Informatik, Technische Universität Wien, March.

[11] N. Zhang, A. Burns, and M. Nicholson. Pipelined processors and worst case execution times. Journal of Real-Time Systems, 5:319-343, October 1993.
