Predicting the Worst-Case Execution Time of the Concurrent Execution of Instructions and Cycle-Stealing DMA I/O Operations
ACM SIGPLAN Workshop on Languages, Compilers and Tools for Real-Time Systems, La Jolla, California, June 1995

Predicting the Worst-Case Execution Time of the Concurrent Execution of Instructions and Cycle-Stealing DMA I/O Operations

Tai-Yi Huang and Jane W.-S. Liu
Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, IL 61801

May 3, 1995

Abstract

This paper describes an efficient algorithm which gives a bound on the worst-case execution time of the concurrent execution of CPU instructions and cycle-stealing DMA I/O operations. Simulations of several programs were conducted to evaluate this algorithm. Compared with the traditional pessimistic approach, the bound on the worst-case execution time produced by the algorithm is significantly tighter. For a sample program that multiplies two matrices while the I/O bus is fully utilized, our algorithm achieves a 39% improvement in the accuracy of the prediction.

1 Introduction

Algorithms for scheduling tasks in hard-real-time systems typically assume that the worst-case execution times of the tasks are known. Such a system is designed to ensure that all tasks can complete by their deadlines as long as no task in the system executes longer than its worst-case execution time (WCET). A task that overruns may lead to missed deadlines and the failure of the whole system. For this reason, how to bound the WCET of programs has received a great deal of attention in recent years. Mok et al. [3] developed a graphical tool to analyze the timing behavior of assembly language programs and to bound their WCET. This tool requires that the maximum iteration count of each loop structure be known. Park and Shaw [6, 7] developed a similar method for source-level programs. Their dynamic path analysis method eliminates infeasible execution paths and thus tightens the prediction of the WCET. Puschner and Koza [8] introduced several new language constructs with which programmers can describe the timing behavior of their programs.
Their experiment showed that with this valuable information, the gap between the calculated WCET and the real WCET can be reduced significantly. To predict the WCET of concurrent programs, Niehaus [5] developed a semantics-preserving transformation for concurrent programming language constructs such as critical sections and synchronous communication. Zhang, Burns and Nicholson [11] developed a mathematical model to predict the WCET of programs executed on a two-stage pipeline. Mueller, Whalley and Harmon [4] developed a static cache simulation method to predict instruction cache behavior and bound its worst-case performance.

This paper first analyzes the delay caused by cycle-stealing direct-memory access (DMA) I/O activities. It then presents an algorithm to estimate the WCET of the concurrent execution of a stream of CPU instructions and DMA activities. A DMA controller transfers data between the main memory and I/O devices with minimal CPU involvement. As a result, the CPU can execute other instructions while a DMA controller is transferring data. A DMA controller operates either in burst mode or in cycle-stealing mode. A DMA controller in cycle-stealing mode transfers data by "stealing" bus cycles from an executing program. In this way, it retards the progress of the executing program and extends its execution time. A conservative estimate of the WCET of a stream of CPU instructions and a cycle-stealing DMA I/O operation, which are ready at the same time, is the sum of their WCETs obtained by assuming that each executes alone. Obviously this estimate is pessimistic. We present here an analysis method and an algorithm which give a tighter bound on the WCET. The performance of the algorithm, in terms of the amount of reduction from the most pessimistic WCET estimate, is demonstrated by simulation results.

The rest of the paper is structured as follows. Section 2 describes the machine model that is the basis of our analysis method.
Section 3 presents the method. Section 4 presents an algorithm to implement the method.
Our simulation results are presented in Section 5. Finally, Section 6 concludes the paper and discusses future work.

2 The Machine Model

We adopt a commonly used machine model according to which an instruction executes in the manner shown in Figure 1. The sequence of fetch and execution of an instruction is called an instruction cycle. Each instruction cycle is composed of one or more machine cycles. A machine cycle requires one to several processor clock cycles to execute. Different machine cycles perform different functions. For example, the instruction cycle of ADD 1,(A0), shown in Figure 1, is composed of four machine cycles: a memory read (bus-access) cycle to fetch the instruction, a memory read (bus-access) cycle to fetch the operand, an execution (no-bus-access) cycle to carry out the addition, followed by a memory write (bus-access) cycle to write back the data. The machine cycles in this example take 4, 3, 3, and 4 processor clock cycles to execute, respectively.

[Figure 1: The instruction cycle of ADD 1,(A0). The I/O bus is BUSY during the memory read (fetch instruction), memory read (fetch operand), and memory write (write data) cycles, and IDLE during the internal operation (execution) cycle.]

Since we focus on the analysis of how the DMA controller and the CPU contend for the bus, we are concerned primarily with whether the CPU accesses the I/O bus during each machine cycle. Therefore, we classify all machine cycles into two categories: bus-access (B) cycles and execution (E) cycles. B cycles are those machine cycles during which the CPU uses the I/O bus. In contrast, during E cycles, the CPU does not need the bus. In general, there may be several consecutive E cycles in an instruction cycle. We assume that the CPU is synchronous: the beginning of each machine cycle is triggered by the processor clock. Our analysis method is applicable only to systems that have no cache memory and no pipelining.
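Under this model, an instruction is fully described by its ordered machine cycles, each tagged B or E together with its clock-cycle count. As a minimal illustration (the tuple encoding below is ours, not the paper's), ADD 1,(A0) and the extraction of its runs of consecutive E cycles can be written as:

```python
# Each machine cycle: (kind, clock_cycles), where kind is "B" (bus-access)
# or "E" (execution). ADD 1,(A0) from Figure 1: instruction fetch, operand
# fetch, internal execution, write-back.
add_1_a0 = [("B", 4), ("B", 3), ("E", 3), ("B", 4)]

def execution_time(instr, t_c):
    """Stand-alone execution time of an instruction, given clock period t_c."""
    return sum(cycles for _, cycles in instr) * t_c

def e_run_times(instr, t_c):
    """Durations of the maximal runs of consecutive E cycles, in time units."""
    runs, current = [], 0
    for kind, cycles in instr:
        if kind == "E":
            current += cycles * t_c
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs
```

With the paper's 60 ns clock period, ADD 1,(A0) takes (4+3+3+4) x 60 = 840 ns alone and contains a single E run of 180 ns.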
The DMA controller and the CPU share the same I/O bus, as shown in Figure 2. At any time, either the DMA controller or the CPU, but not both, can hold the bus (i.e., be the bus master) and transfer data. We focus on the case where the DMA controller operates in cycle-stealing mode. In this mode, it is allowed to access the bus only when the CPU is in an E cycle. The protocol we use to regulate the bus contention between the DMA controller and the CPU is based on the VMEbus specification [9]. Because this protocol is sufficiently general, the analysis method presented in Section 3 for bounding the delay caused by cycle-stealing DMA I/O activities is applicable to many other commonly used buses.

[Figure 2: The architecture of the machine model. The CPU, the memory, the DMA controller, and the I/O devices are connected by a single I/O bus.]

To become the bus master, the DMA controller first sends a bus request. If the CPU is in a B cycle, the DMA controller waits. The CPU releases the bus when it enters an E cycle. After a short delay, during which the ownership of the bus is transferred from the CPU to the DMA controller, the DMA controller gains the bus and starts its data transfer. We will refer to this delay period as the Bus Master transfer Time (BMT). After the DMA controller completes the transfer of each unit of data, the bus controller checks whether there is any pending bus request from the CPU. The DMA controller is allowed to continue with its next transfer if there is no request from the CPU. Otherwise, the DMA controller releases the bus, and the CPU gains the bus after a BMT delay.
3 Timing Analysis

Generally the DMA controller behaves in the following manner. After sending a bus request, the DMA controller waits when the CPU enters a B cycle from a B cycle, becomes the bus master when the CPU enters an E cycle from a B cycle, continues its transfers as long as the CPU remains in E cycles, and releases the bus when it finishes all the data transfers or when the CPU enters a B cycle from an E cycle. Again, whether there is any pending bus request is checked only at the end of each data transfer. The CPU does not gain the bus immediately after it sends a bus request if the DMA controller is currently transferring data. Therefore, the executing program may suffer delay, and its completion time is postponed accordingly.

Figure 3 illustrates the concurrent execution of the DMA controller and the sequence of machine cycles B_n -> E_1 -> E_2 -> ... -> E_k -> B_{n+1}. Our analysis method assumes that the number of consecutive E cycles in each instruction is known. As shown in Figure 3, the DMA controller gains the bus when the CPU enters the E_1 cycle from the B_n cycle. It keeps transferring data during the interval from the E_1 cycle to the E_k cycle. The DMA controller releases the bus when the CPU is entering the B_{n+1} cycle. The execution of the B_{n+1} cycle is delayed by (b + BMT), where b is the delay between when the CPU requests the bus and when the request is checked and the DMA controller releases the bus. Again, BMT is the delay from the time when a bus master releases the bus to the time when the next bus master gains the bus.

Let m denote the number of data transfers performed by the DMA controller during the k consecutive E cycles, and let DT be the amount of time for each transfer. To calculate m, we let T_{E_i} denote the execution time of the machine cycle E_i, and

    T_k = T_{E_1} + T_{E_2} + ... + T_{E_k}

be the total execution time of the k consecutive E cycles. Based on the facts that 0 <= b < DT and T_k + b = m * DT + BMT, we have

    (m - 1) * DT < T_k - BMT <= m * DT.

Therefore we conclude that

    m = ceil((T_k - BMT) / DT).    (1)

[Figure 3: The concurrent execution of cycle-stealing DMA I/O and a sequence of E cycles. After the B_n cycle, the DMA controller gains the bus following a BMT delay, performs m transfers of duration DT each during E_1 through E_k, and releases the bus after the delays b and BMT, postponing the start of the B_{n+1} cycle by d.]

The delay suffered by the CPU execution of the sequence of machine cycles is

    d' = m * DT + 2 * BMT - T_k

if the DMA controller holds the bus for m data transfers. On the other hand, if fewer than m data transfers are performed, the delay becomes shorter because the bus is transferred without the delay b. Because of the assumption that each machine cycle is triggered by the processor clock, the machine cycle B_{n+1} cannot start until the next clock cycle. As a result, the exact delay suffered by the CPU execution is at most equal to

    d = ceil(d' / T_c) * T_c,    (2)

where T_c is the period of a clock cycle.

4 Bounding the WCET

Given a stream of N CPU instructions, together with a DMA I/O operation that requires M data transfers and is ready at the same time as the stream of CPU instructions, we want to find the WCET of the concurrent execution, that is, the maximum amount of time required for both the instruction stream and the I/O operation to complete. We now present an algorithm which makes use of the knowledge about how the CPU instructions and the DMA operation interfere with one another. By doing so, it gives us a tighter bound on the WCET. Because each instruction begins with a B cycle, no DMA data transfer can take place across two instructions. Consequently, the effects of cycle stealing on each instruction can be analyzed independently, without considering the other instructions. The algorithm shown in Figure 4 uses Eqs. (1) and (2) to calculate, for each instruction, the worst-case delay caused by cycle stealing and the number of data transfers the DMA controller can perform. The information needed by this algorithm includes how many machine cycles each instruction is composed of, the function of each machine cycle, and the execution time of each machine cycle. This information can be obtained from the reference manual provided by the manufacturer of the processor. The algorithm also requires as inputs two parameters of the bus, BMT and DT. The rest of the algorithm is self-explanatory. Because the delay of each instruction obtained here is the worst-case delay, the value returned by the algorithm is an upper bound on the execution time. On the other hand, because the algorithm accounts for the effect of the concurrent execution of CPU instructions and DMA I/O, the WCET we get is much tighter than the pessimistic estimate.

5 Simulation Results

We now demonstrate the performance of our algorithm with several simulation results. Given a stream of CPU instructions and a DMA I/O operation, we first use the pessimistic approach to predict its WCET. According to this approach, the predicted value WCET_pessi is equal to the sum of the WCET of the instruction stream when it executes alone and the WCET of the DMA I/O operation when the operation is done alone. We next make the prediction with our algorithm. The value returned is denoted by WCET_ours. We use the percentage reduction from the pessimistic WCET,

    R = (WCET_pessi - WCET_ours) / WCET_pessi * 100%,

to measure the performance of the algorithm. Table 1 lists the C programs tested in our simulation. Because of the wide use of the CPU32 in embedded systems, we compile these programs into assembly language programs for the MC68332, one of the MC68300 family of embedded controllers. We execute these programs in a simulator to obtain their execution traces. The timing information of each instruction in the traces is given by [1].
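For concreteness, the procedure of Figure 4 together with Eqs. (1) and (2) can be sketched in Python. The per-instruction representation (its stand-alone execution time plus the durations of its maximal E-cycle runs) is our own encoding, and the sketch charges every E run its worst-case delay; it is an illustration under these assumptions, not the authors' implementation.

```python
import math

def transfers_and_delay(t_k, dt, bmt, t_c):
    """Worst-case number of transfers m (Eq. 1) and delay d (Eq. 2) for one
    run of consecutive E cycles of total duration t_k."""
    m = math.ceil((t_k - bmt) / dt)
    if m <= 0:
        return 0, 0  # run too short for the DMA controller to steal a cycle
    d_raw = m * dt + 2 * bmt - t_k           # the raw delay d'
    d = math.ceil(d_raw / t_c) * t_c         # rounded up to whole clock cycles
    return m, d

def wcet_bound(instructions, m_total, dt, bmt, t_c):
    """Bound the WCET of an instruction stream executing concurrently with a
    cycle-stealing DMA I/O operation of m_total data transfers (Figure 4).
    Each instruction is a pair (execution_time, [E-run durations])."""
    wcet = 0
    trans = 0  # data transfers completed so far
    for exec_time, runs in instructions:
        wcet += exec_time                    # step 2a: stand-alone time
        if trans < m_total:                  # step 2b: worst-case delay
            for t_k in runs:
                m, d = transfers_and_delay(t_k, dt, bmt, t_c)
                wcet += d
                trans += m
    if trans < m_total:                      # step 3: finish remaining
        wcet += (m_total - trans) * dt       # transfers alone
    return wcet
```

With the bus parameters used in Section 5 (DT = 120 ns, BMT = 10 ns, T_c = 60 ns), an E run of 180 ns yields m = ceil(170/120) = 2 transfers and a delay of ceil(80/60) x 60 = 120 ns.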
Since the clock frequency of the MC68332 microprocessor is 16.67 MHz, the period of a clock cycle T_c is 60 ns. Assuming that a 0-wait memory is used and that the size of the data in each DMA transfer is equal to the bus bandwidth, we set DT to 120 ns. Finally, BMT is 10 ns.

To investigate the relationship between the performance of our algorithm and the fraction of a trace which is overlapped with a DMA I/O operation, four simulations were conducted on each execution trace. For each trace, we generated four DMA I/O operations which carry out different numbers of data transfers. In particular, we chose the lengths of these DMA I/O operations so that the trace overlapped with each of the DMA I/O operations for either 25%, 50%, 75%, or 100% of the instructions. The case with 25% overlap means that the first quarter of the trace executed concurrently with the DMA I/O operation while the rest executed alone. Similarly, in the case of 50% (75%) overlap, the last 50% (25%) of the trace executed alone. We then computed the WCET of the concurrent execution of the trace and each of the four DMA I/O operations. Thus, four values of R were obtained.

    Name        Description
    qsort       a quicksort of 250 elements
    bubble      a bubble sort of 100 elements
    fft         a 128-node Fast Fourier Transform
    spline      a cubic spline function of 100 points
    gaussian    a 10x10 Gaussian elimination
    mtxmul      a multiplication of 2 10x10 matrices
    correlate   a correlate function of 500 tracks
    mtxmul2     a loop-unrolled version of mtxmul

    Table 1: The test set of C programs

Table 2 gives the results. Column 2 lists the number of instructions in each program trace. Columns 4, 5, 6, and 7 list the percentage reduction in the predicted WCET when the first quarter, the first half, the first three quarters, and the whole trace executed concurrently with a DMA I/O operation, respectively.
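The percentage reduction tabulated in those columns follows directly from the two predictions; as a trivial helper (the function name is ours):

```python
def reduction_percent(wcet_pessi, wcet_ours):
    """R = (WCET_pessi - WCET_ours) / WCET_pessi * 100%."""
    return (wcet_pessi - wcet_ours) / wcet_pessi * 100.0
```

For example, a bound 39% below the pessimistic one (as for mtxmul2 with full overlap) gives R = 39.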
These values of R indicate that, compared with the pessimistic approach, our algorithm produces a more accurate prediction of the WCET of a program when the program executes concurrently with a DMA I/O operation for a larger percentage of the time. This conclusion is expected, since WCET_pessi is more pessimistic when the percentage of overlap is larger.

We also investigated the relationship between the performance of our algorithm and the computational requirements of programs. We classify all instructions here into two categories: long instructions and short instructions. An instruction is a long one if, during its execution, the CPU does not need the bus for 8 processor clock cycles or more. In contrast, during the execution of a short instruction, the CPU never leaves the bus free for such a long period. Generally speaking, long instructions require intensive computation, and short instructions are those that do data movement or simple computation. For example, the instructions MULU.W D1,D2 and DIVU.W D2,D0 are long instructions, and MOVE.L (A3)+,D0 and ADD.L D0,D1 are short instructions. Because the delay caused by cycle stealing on each instruction is bounded by Eq. (2),
Input:
  - the number of CPU instructions, N, and the instructions inst[1], inst[2], ..., inst[N] in the stream S.
  - the number of data transfers, M, required in the cycle-stealing DMA I/O operation.
  - the execution time of each instruction, inst[i].execution_time, and its machine cycles, for i = 1, 2, ..., N.
  - BMT and DT, the two parameters of the I/O bus.
Output:
  WCET, the worst-case execution time of the concurrent execution of the instruction stream S and the DMA I/O operation.
Procedure:
  1. Set WCET and trans, the number of transfers completed, to zero.
  2. For i = 1 to N, compute the contribution of inst[i] to WCET and increment WCET by that amount as follows:
     a. update WCET = WCET + inst[i].execution_time.
     b. if (trans < M):
        - compute the worst-case delay d suffered by inst[i] according to Eq. (2), and update WCET = WCET + d.
        - compute the number of transfers, m, completed in inst[i] according to Eq. (1), and update trans = trans + m.
  3. If the I/O operation is not completed yet, increment WCET by the amount of time (M - trans) * DT needed to complete the last M - trans data transfers alone.

Figure 4: An algorithm which gives a tighter bound on the WCET

    Name        Instructions executed   Long instructions   R in %
    qsort       23,026                  0%
    bubble      65,726                  0%
    fft         249,107                 2%
    spline      209,837                 3%
    gaussian    47,272                  5%
    mtxmul      36,789                  11%
    correlate   26,543                  17%
    mtxmul2     9,391                   22%

    Table 2: The simulation results
the overhead of each DMA transfer in a long instruction is less than that in a short instruction. Consequently, when a trace contains a higher percentage of long instructions, the algorithm produces a larger percentage reduction. We tested programs with different computational requirements in the simulation. Column 3 of Table 2 gives the percentage of long instructions in each program trace. We note that the value of R increases monotonically with the percentage of long instructions. Among the tested programs, mtxmul2 is obtained by unrolling the whole innermost loop of mtxmul. The loop-unrolling procedure significantly increases the percentage of long instructions in the trace. As a result, our algorithm performs better on the loop-unrolled version: a 39% reduction from the most pessimistic WCET estimate is achieved.

6 Conclusion and Future Work

Cycle-stealing DMA operations have often been disabled in real-time systems because of the uncertainty in the amount of time such an operation may delay the completion of an executing program. We presented here an analysis method to determine this delay. Based on the method, we developed an algorithm which gives a tighter bound on the WCET of the concurrent execution of a stream of CPU instructions and a cycle-stealing DMA I/O operation. Simulation results demonstrate that the algorithm produces more accurate predictions of the WCET than the pessimistic estimates, especially when the program contains a large percentage of computation-intensive instructions.

Our analysis method is applicable only when there is no cache memory. If cache memory is used, the number of bus accesses by the CPU is significantly reduced because of cache hits. We therefore expect an even greater improvement from accurately accounting for the delay caused by the concurrent execution of cycle-stealing operations. In the future we will extend our analysis to systems in which on-chip cache memory is present.
Our work encourages the inclusion of I/O instructions in real-time programs. Because of the hardware-dependent features of I/O instructions, determining their WCET is extremely difficult. Traditionally, I/O instructions are not allowed, or are restricted to appear at predefined places such as the beginning and end of a program [2, 10]. By decomposing timing-related information in a table-driven manner, our work can be used to predict the WCET of a program containing DMA I/O instructions. Future work will build a tool capable of predicting the WCET of programs containing any I/O instruction in a table-driven manner.

References

[1] MC68000 Family: CPU32 Reference Manual. Motorola.
[2] Mark H. Klein and Thomas Ralya. An analysis of input/output paradigms for real-time systems. Technical Report CMU/SEI-90-TR-19, CMU Software Engineering Institute, July 1990.
[3] Aloysius K. Mok, Prasanna Amerasinghe, Moyer Chen, and Kamtorn Tantisirivat. Evaluating tight execution time bounds of programs by annotations. In Proceedings of the Sixth IEEE Workshop on Real-Time Operating Systems and Software, pages 272-279, May 1989.
[4] Frank Mueller, David Whalley, and Marion Harmon. Predicting instruction cache behavior. In ACM SIGPLAN Workshop on Language, Compiler, and Tool Support for Real-Time Systems, June 1994.
[5] Douglas Niehaus. Program representation and translation for predictable real-time systems. In Proceedings of the Real-Time Systems Symposium, pages 53-63, 1991.
[6] Chang Yun Park. Predicting program execution times by analyzing static and dynamic program paths. Journal of Real-Time Systems, 5:31-62, March 1993.
[7] Chang Yun Park and Alan C. Shaw. Experiments with a program timing tool based on source-level timing schema. IEEE Computer, pages 48-57, May 1991.
[8] P. Puschner and C. Koza. Calculating the maximum execution time of real-time programs. Journal of Real-Time Systems, 1:159-176, September 1989.
[9] The VMEbus Specification. Motorola.
[10] A. Vrchoticky and P. Puschner. On the feasibility of response time predictions - an experimental evaluation. Technical Report 2/91, Institut für Technische Informatik, Technische Universität Wien, March 1991.
[11] N. Zhang, A. Burns, and M. Nicholson. Pipelined processors and worst case execution times. Journal of Real-Time Systems, 5:319-343, October 1993.
More informationAutomatic flow analysis using symbolic execution and path enumeration
Automatic flow analysis using symbolic execution path enumeration D. Kebbal Institut de Recherche en Informatique de Toulouse 8 route de Narbonne - F-62 Toulouse Cedex 9 France Djemai.Kebbal@iut-tarbes.fr
More informationStorage System. Distributor. Network. Drive. Drive. Storage System. Controller. Controller. Disk. Disk
HRaid: a Flexible Storage-system Simulator Toni Cortes Jesus Labarta Universitat Politecnica de Catalunya - Barcelona ftoni, jesusg@ac.upc.es - http://www.ac.upc.es/hpc Abstract Clusters of workstations
More informationSCHEDULING REAL-TIME MESSAGES IN PACKET-SWITCHED NETWORKS IAN RAMSAY PHILP. B.S., University of North Carolina at Chapel Hill, 1988
SCHEDULING REAL-TIME MESSAGES IN PACKET-SWITCHED NETWORKS BY IAN RAMSAY PHILP B.S., University of North Carolina at Chapel Hill, 1988 M.S., University of Florida, 1990 THESIS Submitted in partial fulllment
More informationUtilizing Concurrency: A New Theory for Memory Wall
Utilizing Concurrency: A New Theory for Memory Wall Xian-He Sun (&) and Yu-Hang Liu Illinois Institute of Technology, Chicago, USA {sun,yuhang.liu}@iit.edu Abstract. In addition to locality, data access
More informationCOSC 6385 Computer Architecture - Memory Hierarchy Design (III)
COSC 6385 Computer Architecture - Memory Hierarchy Design (III) Fall 2006 Reducing cache miss penalty Five techniques Multilevel caches Critical word first and early restart Giving priority to read misses
More informationMemory Systems IRAM. Principle of IRAM
Memory Systems 165 other devices of the module will be in the Standby state (which is the primary state of all RDRAM devices) or another state with low-power consumption. The RDRAM devices provide several
More informationGeneral Purpose Signal Processors
General Purpose Signal Processors First announced in 1978 (AMD) for peripheral computation such as in printers, matured in early 80 s (TMS320 series). General purpose vs. dedicated architectures: Pros:
More informationMARIE: An Introduction to a Simple Computer
MARIE: An Introduction to a Simple Computer 4.2 CPU Basics The computer s CPU fetches, decodes, and executes program instructions. The two principal parts of the CPU are the datapath and the control unit.
More informationPull based Migration of Real-Time Tasks in Multi-Core Processors
Pull based Migration of Real-Time Tasks in Multi-Core Processors 1. Problem Description The complexity of uniprocessor design attempting to extract instruction level parallelism has motivated the computer
More informationCS 426 Parallel Computing. Parallel Computing Platforms
CS 426 Parallel Computing Parallel Computing Platforms Ozcan Ozturk http://www.cs.bilkent.edu.tr/~ozturk/cs426/ Slides are adapted from ``Introduction to Parallel Computing'' Topic Overview Implicit Parallelism:
More informationIntegrating MRPSOC with multigrain parallelism for improvement of performance
Integrating MRPSOC with multigrain parallelism for improvement of performance 1 Swathi S T, 2 Kavitha V 1 PG Student [VLSI], Dept. of ECE, CMRIT, Bangalore, Karnataka, India 2 Ph.D Scholar, Jain University,
More informationConsistent Logical Checkpointing. Nitin H. Vaidya. Texas A&M University. Phone: Fax:
Consistent Logical Checkpointing Nitin H. Vaidya Department of Computer Science Texas A&M University College Station, TX 77843-3112 hone: 409-845-0512 Fax: 409-847-8578 E-mail: vaidya@cs.tamu.edu Technical
More informationWhat is Pipelining? RISC remainder (our assumptions)
What is Pipelining? Is a key implementation techniques used to make fast CPUs Is an implementation techniques whereby multiple instructions are overlapped in execution It takes advantage of parallelism
More informationUNIT I (Two Marks Questions & Answers)
UNIT I (Two Marks Questions & Answers) Discuss the different ways how instruction set architecture can be classified? Stack Architecture,Accumulator Architecture, Register-Memory Architecture,Register-
More informationFrequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System
Frequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System Chi Zhang, Viktor K Prasanna University of Southern California {zhan527, prasanna}@usc.edu fpga.usc.edu ACM
More informationMotivation for Parallelism. Motivation for Parallelism. ILP Example: Loop Unrolling. Types of Parallelism
Motivation for Parallelism Motivation for Parallelism The speed of an application is determined by more than just processor speed. speed Disk speed Network speed... Multiprocessors typically improve the
More informationr[2] = M[x]; M[x] = r[2]; r[2] = M[x]; M[x] = r[2];
Using a Swap Instruction to Coalesce Loads and Stores Apan Qasem, David Whalley, Xin Yuan, and Robert van Engelen Department of Computer Science, Florida State University Tallahassee, FL 32306-4530, U.S.A.
More informationCOSC 6385 Computer Architecture. - Memory Hierarchies (II)
COSC 6385 Computer Architecture - Memory Hierarchies (II) Fall 2008 Cache Performance Avg. memory access time = Hit time + Miss rate x Miss penalty with Hit time: time to access a data item which is available
More informationSupporting the Specification and Analysis of Timing Constraints *
Supporting the Specification and Analysis of Timing Constraints * Lo Ko, Christopher Healy, Emily Ratliff Marion Harmon Robert Arnold, and David Whalley Computer and Information Systems Dept. Computer
More informationHardware Assisted Recursive Packet Classification Module for IPv6 etworks ABSTRACT
Hardware Assisted Recursive Packet Classification Module for IPv6 etworks Shivvasangari Subramani [shivva1@umbc.edu] Department of Computer Science and Electrical Engineering University of Maryland Baltimore
More informationHardware-Software Codesign. 9. Worst Case Execution Time Analysis
Hardware-Software Codesign 9. Worst Case Execution Time Analysis Lothar Thiele 9-1 System Design Specification System Synthesis Estimation SW-Compilation Intellectual Prop. Code Instruction Set HW-Synthesis
More informationStatic WCET Analysis: Methods and Tools
Static WCET Analysis: Methods and Tools Timo Lilja April 28, 2011 Timo Lilja () Static WCET Analysis: Methods and Tools April 28, 2011 1 / 23 1 Methods 2 Tools 3 Summary 4 References Timo Lilja () Static
More informationCompositional Schedulability Analysis of Hierarchical Real-Time Systems
Compositional Schedulability Analysis of Hierarchical Real-Time Systems Arvind Easwaran, Insup Lee, Insik Shin, and Oleg Sokolsky Department of Computer and Information Science University of Pennsylvania,
More informationCycle accurate transaction-driven simulation with multiple processor simulators
Cycle accurate transaction-driven simulation with multiple processor simulators Dohyung Kim 1a) and Rajesh Gupta 2 1 Engineering Center, Google Korea Ltd. 737 Yeoksam-dong, Gangnam-gu, Seoul 135 984, Korea
More informationArchitectural Issues for the 1990s. David A. Patterson. Computer Science Division EECS Department University of California Berkeley, CA 94720
Microprocessor Forum 10/90 1 Architectural Issues for the 1990s David A. Patterson Computer Science Division EECS Department University of California Berkeley, CA 94720 1990 (presented at Microprocessor
More informationFig. 1. AMBA AHB main components: Masters, slaves, arbiter and decoder. (Picture from AMBA Specification Rev 2.0)
AHRB: A High-Performance Time-Composable AMBA AHB Bus Javier Jalle,, Jaume Abella, Eduardo Quiñones, Luca Fossati, Marco Zulianello, Francisco J. Cazorla, Barcelona Supercomputing Center, Spain Universitat
More informationWorst Case Analysis of DRAM Latency in Multi-Requestor Systems. Zheng Pei Wu Yogen Krish Rodolfo Pellizzoni
orst Case Analysis of DAM Latency in Multi-equestor Systems Zheng Pei u Yogen Krish odolfo Pellizzoni Multi-equestor Systems CPU CPU CPU Inter-connect DAM DMA I/O 1/26 Multi-equestor Systems CPU CPU CPU
More informationCoarse-to-Fine Search Technique to Detect Circles in Images
Int J Adv Manuf Technol (1999) 15:96 102 1999 Springer-Verlag London Limited Coarse-to-Fine Search Technique to Detect Circles in Images M. Atiquzzaman Department of Electrical and Computer Engineering,
More informationThreshold-Based Markov Prefetchers
Threshold-Based Markov Prefetchers Carlos Marchani Tamer Mohamed Lerzan Celikkanat George AbiNader Rice University, Department of Electrical and Computer Engineering ELEC 525, Spring 26 Abstract In this
More informationCaches in Real-Time Systems. Instruction Cache vs. Data Cache
Caches in Real-Time Systems [Xavier Vera, Bjorn Lisper, Jingling Xue, Data Caches in Multitasking Hard Real- Time Systems, RTSS 2003.] Schedulability Analysis WCET Simple Platforms WCMP (memory performance)
More informationMemory hierarchy. 1. Module structure. 2. Basic cache memory. J. Daniel García Sánchez (coordinator) David Expósito Singh Javier García Blas
Memory hierarchy J. Daniel García Sánchez (coordinator) David Expósito Singh Javier García Blas Computer Architecture ARCOS Group Computer Science and Engineering Department University Carlos III of Madrid
More informationEmbedded Systems Lecture 11: Worst-Case Execution Time. Björn Franke University of Edinburgh
Embedded Systems Lecture 11: Worst-Case Execution Time Björn Franke University of Edinburgh Overview Motivation Worst-Case Execution Time Analysis Types of Execution Times Measuring vs. Analysing Flow
More informationECE 3055: Final Exam
ECE 3055: Final Exam Instructions: You have 2 hours and 50 minutes to complete this quiz. The quiz is closed book and closed notes, except for one 8.5 x 11 sheet. No calculators are allowed. Multiple Choice
More informationSoftware Pipelining for Coarse-Grained Reconfigurable Instruction Set Processors
Software Pipelining for Coarse-Grained Reconfigurable Instruction Set Processors Francisco Barat, Murali Jayapala, Pieter Op de Beeck and Geert Deconinck K.U.Leuven, Belgium. {f-barat, j4murali}@ieee.org,
More informationDemand fetching is commonly employed to bring the data
Proceedings of 2nd Annual Conference on Theoretical and Applied Computer Science, November 2010, Stillwater, OK 14 Markov Prediction Scheme for Cache Prefetching Pranav Pathak, Mehedi Sarwar, Sohum Sohoni
More informationPIPELINE AND VECTOR PROCESSING
PIPELINE AND VECTOR PROCESSING PIPELINING: Pipelining is a technique of decomposing a sequential process into sub operations, with each sub process being executed in a special dedicated segment that operates
More informationUnderstanding The Behavior of Simultaneous Multithreaded and Multiprocessor Architectures
Understanding The Behavior of Simultaneous Multithreaded and Multiprocessor Architectures Nagi N. Mekhiel Department of Electrical and Computer Engineering Ryerson University, Toronto, Ontario M5B 2K3
More informationCEC 450 Real-Time Systems
CEC 450 Real-Time Systems Lecture 6 Accounting for I/O Latency September 28, 2015 Sam Siewert A Service Release and Response C i WCET Input/Output Latency Interference Time Response Time = Time Actuation
More informationLow-Power Data Address Bus Encoding Method
Low-Power Data Address Bus Encoding Method Tsung-Hsi Weng, Wei-Hao Chiao, Jean Jyh-Jiun Shann, Chung-Ping Chung, and Jimmy Lu Dept. of Computer Science and Information Engineering, National Chao Tung University,
More informationOn the False Path Problem in Hard Real-Time Programs
On the False Path Problem in Hard Real-Time Programs Peter Altenbernd C-LAB D-33095 Paderborn, Germany peter@cadlab.de, http://www.cadlab.de/peter/ Abstract This paper addresses the important subject of
More informationWorst-Case Execution Times Analysis of MPEG-2 Decoding
Worst-Case Execution Times Analysis of MPEG-2 Decoding Peter Altenbernd Lars-Olof Burchard Friedhelm Stappert C-LAB PC 2 C-LAB 33095 Paderborn, GERMANY peter@c-lab.de baron@upb.de fst@c-lab.de Abstract
More informationThe CPU Design Kit: An Instructional Prototyping Platform. for Teaching Processor Design. Anujan Varma, Lampros Kalampoukas
The CPU Design Kit: An Instructional Prototyping Platform for Teaching Processor Design Anujan Varma, Lampros Kalampoukas Dimitrios Stiliadis, and Quinn Jacobson Computer Engineering Department University
More informationDSP/BIOS Kernel Scalable, Real-Time Kernel TM. for TMS320 DSPs. Product Bulletin
Product Bulletin TM DSP/BIOS Kernel Scalable, Real-Time Kernel TM for TMS320 DSPs Key Features: Fast, deterministic real-time kernel Scalable to very small footprint Tight integration with Code Composer
More informationDissecting Execution Traces to Understand Long Timing Effects
Dissecting Execution Traces to Understand Long Timing Effects Christine Rochange and Pascal Sainrat February 2005 Rapport IRIT-2005-6-R Contents 1. Introduction... 5 2. Long timing effects... 5 3. Methodology...
More informationELE 455/555 Computer System Engineering. Section 4 Parallel Processing Class 1 Challenges
ELE 455/555 Computer System Engineering Section 4 Class 1 Challenges Introduction Motivation Desire to provide more performance (processing) Scaling a single processor is limited Clock speeds Power concerns
More informationCHAPTER 4 AN INTEGRATED APPROACH OF PERFORMANCE PREDICTION ON NETWORKS OF WORKSTATIONS. Xiaodong Zhang and Yongsheng Song
CHAPTER 4 AN INTEGRATED APPROACH OF PERFORMANCE PREDICTION ON NETWORKS OF WORKSTATIONS Xiaodong Zhang and Yongsheng Song 1. INTRODUCTION Networks of Workstations (NOW) have become important distributed
More informationWhat is Pipelining? Time per instruction on unpipelined machine Number of pipe stages
What is Pipelining? Is a key implementation techniques used to make fast CPUs Is an implementation techniques whereby multiple instructions are overlapped in execution It takes advantage of parallelism
More informationMaintaining Temporal Consistency: Issues and Algorithms
Maintaining Temporal Consistency: Issues and Algorithms Ming Xiong, John A. Stankovic, Krithi Ramamritham, Don Towsley, Rajendran Sivasankaran Department of Computer Science University of Massachusetts
More informationEvaluation of Power Consumption of Modified Bubble, Quick and Radix Sort, Algorithm on the Dual Processor
Evaluation of Power Consumption of Modified Bubble, Quick and, Algorithm on the Dual Processor Ahmed M. Aliyu *1 Dr. P. B. Zirra *2 1 Post Graduate Student *1,2, Computer Science Department, Adamawa State
More informationREAL TIME DIGITAL SIGNAL PROCESSING
REAL TIME DIGITAL SIGNAL PROCESSING UTN - FRBA 2011 www.electron.frba.utn.edu.ar/dplab Introduction Why Digital? A brief comparison with analog. Advantages Flexibility. Easily modifiable and upgradeable.
More informationSolutions to exercises on Memory Hierarchy
Solutions to exercises on Memory Hierarchy J. Daniel García Sánchez (coordinator) David Expósito Singh Javier García Blas Computer Architecture ARCOS Group Computer Science and Engineering Department University
More informationShared Cache Aware Task Mapping for WCRT Minimization
Shared Cache Aware Task Mapping for WCRT Minimization Huping Ding & Tulika Mitra School of Computing, National University of Singapore Yun Liang Center for Energy-efficient Computing and Applications,
More information1.3 Data processing; data storage; data movement; and control.
CHAPTER 1 OVERVIEW ANSWERS TO QUESTIONS 1.1 Computer architecture refers to those attributes of a system visible to a programmer or, put another way, those attributes that have a direct impact on the logical
More informationReal Time Spectrogram
Real Time Spectrogram EDA385 Final Report Erik Karlsson, dt08ek2@student.lth.se David Winér, ael09dwi@student.lu.se Mattias Olsson, ael09mol@student.lu.se October 31, 2013 Abstract Our project is about
More informationA Time-Predictable Instruction-Cache Architecture that Uses Prefetching and Cache Locking
A Time-Predictable Instruction-Cache Architecture that Uses Prefetching and Cache Locking Bekim Cilku, Daniel Prokesch, Peter Puschner Institute of Computer Engineering Vienna University of Technology
More informationReal-Time Programming with GNAT: Specialised Kernels versus POSIX Threads
Real-Time Programming with GNAT: Specialised Kernels versus POSIX Threads Juan A. de la Puente 1, José F. Ruiz 1, and Jesús M. González-Barahona 2, 1 Universidad Politécnica de Madrid 2 Universidad Carlos
More informationHigh Performance Computer Architecture Prof. Ajit Pal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur
High Performance Computer Architecture Prof. Ajit Pal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture - 18 Dynamic Instruction Scheduling with Branch Prediction
More informationAnalytical Modeling of Parallel Systems. To accompany the text ``Introduction to Parallel Computing'', Addison Wesley, 2003.
Analytical Modeling of Parallel Systems To accompany the text ``Introduction to Parallel Computing'', Addison Wesley, 2003. Topic Overview Sources of Overhead in Parallel Programs Performance Metrics for
More information2 MARKS Q&A 1 KNREDDY UNIT-I
2 MARKS Q&A 1 KNREDDY UNIT-I 1. What is bus; list the different types of buses with its function. A group of lines that serves as a connecting path for several devices is called a bus; TYPES: ADDRESS BUS,
More informationMain Points of the Computer Organization and System Software Module
Main Points of the Computer Organization and System Software Module You can find below the topics we have covered during the COSS module. Reading the relevant parts of the textbooks is essential for a
More information