Real-time Scheduling on Multithreaded Processors


J. Kreuzinger, A. Schulz, M. Pfeffer, Th. Ungerer
Institute for Computer Design and Fault Tolerance, University of Karlsruhe, D-76128 Karlsruhe, Germany

U. Brinkschulte, C. Krakowski
Institute for Process Control, Automation and Robotics, University of Karlsruhe, Karlsruhe, Germany

ira.uka.de

Abstract

This paper investigates real-time scheduling algorithms on upcoming multithreaded processors. As evaluation testbed we introduce a multithreaded processor kernel which is specifically designed as the core processor of a microcontroller or system-on-a-chip. Handling of external real-time events is performed through multithreading. Real-time threads are used as interrupt service threads (ISTs) instead of interrupt service routines (ISRs). Our proposed microcontroller supports multiple ISTs with zero-cycle context switching overhead. We investigate the behavior of fixed priority preemptive (FPP), earliest deadline first (EDF), least laxity first (LLF) and guaranteed percentage (GP) scheduling with respect to multithreaded processors. Our finding is that the strategies GP and LLF result in a good blending of instructions of different threads, thus enabling a multithreaded processor to utilize latencies best. Assuming a zero-cycle context switch, LLF performs best; however, its implementation costs are prohibitive.

1 Introduction

The target market of our project is the widespread market of embedded systems, in particular embedded real-time systems. In this area microcontrollers are typically preferred over general-purpose processors because of their on-chip integration of RAM and peripheral controllers, resulting in smaller and cheaper hardware. Execution performance is not the main criterion for microcontrollers; support for real-time event handling, rapid context switching ability, and small memory requirements are also essential.
Rapid context switching is a basic feature of the multithreaded processor technique, which has been investigated for several years for its ability to utilize latencies. Recently, several multithreaded processors were announced by industry. A multithreaded processor is able to pursue multiple threads of control in parallel within the processor pipeline. The functional units are multiplexed between the thread contexts. Most approaches store the thread contexts in different register sets on the processor chip. Latencies that arise from cache misses, long-running operations or other pipeline hazards are masked by switching to another thread. Multithreaded processors are able to bridge these latencies efficiently if there are enough parallel executable threads as workload and if the time needed to switch threads is very small. Consequently, recent industry announcements of high-performance processors concern a 4-threaded Alpha processor of DEC/Compaq [1] and Sun's MAJC-5200 processor, which features two 4-threaded processors on a single die [2]. Both are designed as high-performance processors and will not be suitable for low-cost embedded systems.

Our Komodo project [3] explores the suitability of multithreading techniques in embedded real-time systems. We propose multithreading as an event handling mechanism that allows efficient handling of simultaneous, overlapping events with hard real-time requirements. We design a microcontroller with a multithreaded processor core that triggers so-called Interrupt-Service-Threads (ISTs) instead of Interrupt-Service-Routines (ISRs) for event handling [4]. Our Komodo microcontroller features a zero-cycle context switch overhead and hardware support for priority schemes.
Because of its application in embedded systems, the processor core of the Komodo microcontroller is kept at the hardware level of a simple microcontroller similar to the M. Our target architecture is a simple pipelined processor kernel which is able to issue one instruction per cycle.

Recently, multithreading has also been proposed for handling internal events ([5], [6], [7]) in future high-end processors, applying one or more threads for exception handling and executing these threads simultaneously with the main thread that caused the exception. However, the fast context switching ability of multithreading has rarely been explored in the context of microcontrollers for handling external hardware events. Besides our own approach, the EVENTS mechanism [8] proposes an FPGA-based processor-external hardware scheduler that triggers context switches in a single or in multiple multithreaded MSparc processors [9].

This paper investigates real-time scheduling algorithms suitable for multithreaded processors and presents performance evaluations on our evaluation testbed, a multithreaded Java microcontroller called Komodo.

2 The proposed Komodo microcontroller

The Komodo microcontroller [10] is a multithreaded Java microcontroller which supports multiple ISTs with zero-cycle context switching overhead and several priority schemes. Because of its application in embedded systems, the processor core of the Komodo microcontroller is kept at the hardware level of a simple scalar processor. As shown in Fig. 1, the four-stage pipelined processor core consists of an instruction fetch unit, a decode unit, a memory access unit (MEM) and an execution unit (ALU). Four stack register sets are provided on the processor chip. A signal unit triggers IST execution on the occurrence of external signals.

Figure 1. Block diagram of the Komodo microcontroller

The instruction fetch unit holds four program counters (PCs) with dedicated status bits (e.g. thread active/suspended); each PC is assigned to a different thread. Four-byte portions are fetched over the memory interface and put into the corresponding instruction window (IW). A fetch portion may contain several instructions, because the average bytecode length is 1.8 bytes. Instructions are fetched depending on the fill levels of the IWs, which is sufficient as an instruction fetch strategy [11]. The instruction decode unit contains the above-mentioned IWs, dedicated status bits (e.g. priority) and counters.
Based on these bits and counters, a priority manager decides from which IW the next instruction is decoded. We define several priority schemes to handle real-time requirements. In detail, we implemented the fixed priority preemptive (FPP), the earliest deadline first (EDF), the least laxity first (LLF), and the guaranteed percentage (GP) scheduling schemes. The priority manager applies one of the implemented thread priority schemes for IW selection. However, latencies may result from branches or memory accesses. To avoid pipeline stalls, instructions from threads other than the highest-priority thread can be fed into the pipeline. The decode unit predicts the latency after such an instruction and proceeds with instructions from other IWs. There is no overhead for such a context switch. No save/restore of registers or removal of instructions from the pipeline is needed, because each thread has its own stack register set.

A bytecode instruction is decoded either into a single micro-op or a sequence of micro-ops, or a trap routine is called. Each opcode is propagated through the pipeline together with its thread id. Opcodes from multiple threads can be present simultaneously in the different pipeline stages. The instructions for memory access are executed by the MEM unit and all other instructions are executed by the ALU unit. Finally, the result is written back to the stack register set of the corresponding thread.

External signals are delivered to the signal unit from the peripheral components of the microcontroller core, e.g. timer, counter, or serial interface. On the occurrence of such a signal the corresponding IST is activated. As soon as an IST activation ends, its assigned real-time thread is suspended and its status is stored. An external signal may activate the same thread again. In our current implementation, the Komodo microcontroller holds the contexts of up to four threads, which are directly mapped to hardware threads.
Three threads may be real-time threads; all remaining threads must be non real-time and are scheduled within the fourth hardware thread. To scale up to larger systems with more than three real-time threads, we propose parallel execution on several microcontrollers connected by a middleware platform called OSA+ [3].

Because of the unpredictability of cache accesses, non-cached memory access is preferred for real-time microcontrollers. The resulting load latencies are bridged by the priority manager, which schedules instructions of other threads. The Komodo processor is simulated in software and implemented in hardware on a Xilinx FPGA, yielding chip-space requirements of about gates for a four-threaded processor kernel [
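The selection step performed by the priority manager, as described in this section, might be sketched as follows. This is a minimal illustration under stated assumptions, not the Komodo hardware logic: the `HwThread` record, the `urgency` and `select_thread` functions, and all numeric values are hypothetical, and GP is left out because it needs per-interval cycle accounting.

```python
from dataclasses import dataclass


@dataclass
class HwThread:
    """State of one hardware thread slot (hypothetical model, not Komodo's RTL)."""
    tid: int
    active: bool            # thread has instructions in its instruction window
    fixed_priority: int     # smaller value = more urgent (used by FPP)
    deadline: int           # absolute deadline in cycles (used by EDF, LLF)
    remaining: int          # remaining execution cycles (used by LLF)
    stalled_until: int = 0  # first cycle in which the thread may issue again


def urgency(thread, scheme, now):
    """Return a sort key; the smallest key marks the most urgent thread."""
    if scheme == "FPP":
        return thread.fixed_priority
    if scheme == "EDF":
        return thread.deadline
    if scheme == "LLF":
        # laxity = time left until the deadline minus work still to do
        return (thread.deadline - now) - thread.remaining
    raise ValueError(f"unknown scheme: {scheme}")


def select_thread(threads, scheme, now):
    """Pick the thread whose instruction is decoded in cycle `now`.

    Inactive threads and threads waiting on a predicted latency are skipped,
    so a less urgent thread fills the slot instead of a pipeline stall.
    Returns None if no thread can issue.
    """
    ready = [t for t in threads if t.active and t.stalled_until <= now]
    if not ready:
        return None
    # Assumption: ties are broken by the lower thread id.
    return min(ready, key=lambda t: (urgency(t, scheme, now), t.tid))


threads = [
    HwThread(0, True, 0, deadline=100, remaining=40, stalled_until=12),
    HwThread(1, True, 1, deadline=60, remaining=10),
    HwThread(2, True, 2, deadline=80, remaining=70),
    HwThread(3, False, 3, deadline=10**9, remaining=0),
]
print(select_thread(threads, "FPP", now=10).tid)  # 1: thread 0 is in a latency
print(select_thread(threads, "LLF", now=10).tid)  # 2: laxity 0 beats laxity 40
print(select_thread(threads, "FPP", now=12).tid)  # 0: latency of thread 0 is over
```

In this model a context switch costs nothing because selection is just a choice among instruction windows; the hardware counterpart must make the same choice within a single cycle.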

3 Evaluation

In this section we evaluate the timing behavior and the latency slot use of real-time scheduling strategies on a multithreaded processor. We examine the four scheduling techniques Earliest Deadline First (EDF), Least Laxity First (LLF), Fixed Priority Preemptive (FPP) and Guaranteed Percentage (GP). As benchmarks we choose real application programs which are typical for real-time systems. The first program is a simple impulse counter (IC) which reads data from an interface, scales it and stores it in memory. The other two programs are a PID element (PID) and a rather costly Fast Fourier Transform (FFT). Our testbed is the Komodo microcontroller with four hardware threads and a zero-cycle context switch. Latencies from memory accesses and branches are bridged by instructions of threads other than the highest-priority thread.

In the first part of the evaluation we executed four equal programs on the processor. In this first experiment, all four threads were given the same real-time parameters (deadline = period, starting processor utilization = 0.25 for each thread). The common deadline is then shortened until the scheduler can no longer meet it. The results of the different schedulers are compared in Figure 2. The presentation is normalized to a non-multithreaded processor, i.e. a value of 1 corresponds to the performance of a processor that does not utilize latencies but also needs no additional clock cycles for a context switch.

Figure 2. Speed-up of the computation times of different schedulers with the same threads (benchmarks: IC, PID, FFT)

The multithreaded processor increases the speed-up for all benchmark programs and scheduling schemes and thus enhances the possible sample rates. All scheduling strategies provide the same speed-up for the impulse counter (IC). This is explained by the extreme shortness of the IC program, which is too short to expose the differences between the scheduling strategies.

More interesting are the PID element and the FFT, which yield different speed-ups with respect to the scheduling strategies. The differences are caused by the following behavior that is typical for multithreaded processors: the performance gain of a multithreaded processor arises from the utilization of instruction latencies by switching context to instructions of another thread. To be effective, a pool of executable instructions of different threads must be present. Techniques like FPP or EDF tend to shrink this pool, because first the most urgent thread is executed, then the second most urgent thread, and so on. Figure 3 depicts this behavior for an EDF (or FPP) scheduling of four threads with the same code. Let us assume that all four threads start at time zero. Up to time t1 the most urgent thread is executed with highest priority and the other three threads are ready for execution. To utilize the instruction latencies that arise in the execution of the most urgent thread, the processor can switch to instructions of one of the other three threads. However, after time t1 there are just two, after t2 there is just one, and after t3 there is no thread left to use the latencies arising from the last running thread.

LLF and GP perform better than FPP and EDF for the PID and FFT programs. Figure 4 shows for LLF scheduling that all threads keep executable instructions until all threads terminate simultaneously. However, a high number of context switches is induced by the equal deadlines and the permanently changing least laxities. This provides an instruction mix that keeps the threads alive for a maximum of time and so creates optimal conditions for the use of latency slots on the multithreaded processor. GP creates a similar behavior by its frequent context switches. Figure 5 shows the frequency of context switches caused by the different strategies.
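The thinning of the ready pool under EDF and its preservation under LLF can be reproduced with a small simulation. The sketch below is a simplified model under stated assumptions rather than the Komodo simulator: it issues one instruction per cycle, models no latencies, breaks ties by thread id, and uses hypothetical values for the work per thread and the common deadline. The function name `simulate` is likewise an illustration.

```python
def simulate(scheme, n_threads=4, work=5, deadline=40):
    """Schedule n equal threads with a common deadline, one cycle at a time.

    Returns (trace, ready_counts, switches):
      trace        - thread id issued in each cycle
      ready_counts - number of unfinished threads at the start of each cycle
      switches     - number of cycles in which the issuing thread changed
    """
    remaining = [work] * n_threads
    trace, ready_counts = [], []
    now = 0
    while any(remaining):
        ready = [i for i, r in enumerate(remaining) if r > 0]
        ready_counts.append(len(ready))
        if scheme == "EDF":
            # Equal deadlines: the tie is always won by the lowest thread id,
            # so each thread runs to completion before the next one starts.
            chosen = min(ready, key=lambda i: (deadline, i))
        elif scheme == "LLF":
            # laxity = deadline - now - remaining work; it changes every cycle
            chosen = min(ready, key=lambda i: (deadline - now - remaining[i], i))
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        remaining[chosen] -= 1
        trace.append(chosen)
        now += 1
    switches = sum(1 for a, b in zip(trace, trace[1:]) if a != b)
    return trace, ready_counts, switches


for scheme in ("EDF", "LLF"):
    trace, ready_counts, switches = simulate(scheme)
    full_pool = ready_counts.count(4)
    print(scheme, "switches:", switches, "cycles with all 4 threads ready:", full_pool)
# EDF switches: 3 cycles with all 4 threads ready: 5
# LLF switches: 19 cycles with all 4 threads ready: 17
```

With these parameters both schedules take 20 cycles, but under LLF a second thread is available to fill a latency slot in 19 of them, compared with 15 under EDF, at the price of many more context switches.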
Figure 3. Four equal threads with EDF scheduling (threads T1 to T4, context switches, number of threads ready)

From these considerations we conclude the following requirements for an optimal real-time scheduler on a multithreaded processor that is able to utilize instruction latencies: The scheduler must sustain each thread as long as possible, i.e. up to its deadline. Given a zero-cycle context switching overhead, it is a quality factor for a good scheduler that it causes a high number of context switches. Thereby an instruction mix is created which keeps the threads alive as long as possible.

Figure 4. Four equal threads with LLF scheduling (threads T1 to T4, context switches, number of threads ready)

Figure 5. Context switches of different schedulers (benchmarks: IC, PID, FFT)

Footnote: Processor utilization = execution time without latency utilization / deadline.

Footnote: The highest-priority thread executes as on a non-multithreaded processor, which allows the worst-case execution time to be computed as usual.

The second experiment uses all three programs and an additional non real-time thread. We assume that the deadlines equal the periods, and a starting processor utilization of 0.3 is used for each of the real-time threads. We fix the deadlines for the impulse counter and the FFT and shorten the deadline of the PID element until the first missed deadline occurs. The priorities for FPP are assigned according to rate monotonic analysis. The implementation of GP on the Komodo microcontroller defines three priority classes: exact, minimal, and non real-time. Class exact causes a thread to receive exactly the requested percentage, not more and not less. In class minimal, a thread gets at least the requested percentage, but it may get more as well. Therefore, in the case of GP, the impulse counter and the FFT belong to the class exact and the PID element is in the class minimal. The start conditions are 30% of execution time for every real-time thread and 10% for the non real-time thread.

Figure 6. Speed-ups of the workload with mixed application programs (schedulers: FPP, EDF, LLF, GP)

Figure 6 shows the results of our experiment. Again, all scheduling algorithms profit from the multithreaded processor. It is remarkable that in this experiment LLF does not perform better than FPP or EDF. This, too, can be explained by the mixture of the threads. Due to the highly differing execution times of the threads and their corresponding deadlines, the thread with the least laxity remains the same over a long period.
This leads to a similar behavior of LLF and EDF, resulting in nearly the same number of context switches for LLF and EDF.

The behavior of the GP scheduler is unexpected. In principle GP should be an ideal scheduler, because the threads in the class exact are held active until the deadline arrives. A thread that needs 10 msec for execution and has a deadline of 40 msec terminates, with a share of 25%, exactly at the given deadline. The drawback of GP in this experiment lies in the current implementation. The scheduler distributes the shares for the threads in intervals of 100 cycles. In each interval, a priority order determines the next thread. First, the threads of the class exact are scheduled in order of the cycles they need. Then the threads of the class minimal and the non real-time threads are taken for execution. Threads that are blocked or in latencies are excluded from the schedule. Figure 7 shows the typical execution sequence of the four given threads. After the termination of the two exact threads (after about 60 cycles, depending on the usage of the latencies), only the non real-time thread can utilize the latencies of the PID element. Therefore, the non real-time thread gets many more cycles than under LLF or EDF, and the performance for handling real-time events goes down. In this case, the number of executable threads always decreases at the end of each interval, and even though the number of context switches is high, the mixture of threads is poor. This observation leads to the conclusion that a high number of context switches is only an indication of a good scheduling algorithm on a multithreaded processor, not a guarantee.

Figure 7. Thread execution within an interval (legend: exact (FFT), exact (IC), minimal (PID), non real-time; x-axis: cycles, 0 to 100)

Another essential point is the overhead introduced by the various scheduling techniques. To reach a zero-cycle context switch on a multithreaded processor, the scheduler must decide within a single processor cycle which instruction to issue next. The prototype implementation of the Komodo microcontroller in an FPGA showed that FPP incurs by far the smallest implementation cost. GP and EDF follow with similar costs. The highest implementation cost is introduced by LLF. GP and LLF profit from the ability of fast context switching, yielding good performance results when a zero-cycle context switching overhead is assumed. These strategies produce a high number of context switches, which allows an excellent blending of threads and therefore an optimal latency utilization. However, the performance of these strategies deteriorates quickly when context switching costs increase.

4 Conclusions

Multithreaded processors with the ability of very fast context switching offer a new challenge to real-time scheduling policies. First, scheduling strategies like EDF, LLF, and GP may be implemented without thread switching overhead. Second, multithreaded processors may switch the context to another thread to increase performance by utilizing latencies caused by memory access or branch instructions. Latency utilization in a multithreaded processor can increase processor performance over 100% compared to a non-multithreaded processor. However, latency utilization is an additional performance gain that cannot be guaranteed for hard real-time event handling.

To utilize latencies efficiently, a pool of executable instructions of different threads is needed. Classical real-time scheduling policies like EDF or FPP tend to thin out this pool by executing instructions of a thread block-wise: the most urgent thread first, then the second most urgent, and so on. This produces a minimal number of context switches, which is a good choice on conventional processors.
On a multithreaded processor, not enough instructions of ready threads may remain to bridge occurring latencies. A real-time scheduling policy that is optimal in the sense of bridging latencies on multithreaded processors must keep a thread alive as long as possible. This means that the execution time of a thread must be extended to its deadline. But LLF and GP are still not optimal. LLF thins out the thread pool like EDF or FPP in the case of strongly differing deadlines. GP may be a candidate, but the current implementation produces the same problem. So an optimal policy still has to be found.

The work described in this paper is considered a basis for further research on real-time scheduling on envisioned future multithreaded processors and microcontrollers. By modifying the well-known real-time scheduling policies, the architectural features of such processors can be used more efficiently.

References

[1] J. Emer. Simultaneous Multithreading: Multiplying Alpha's Performance. Microprocessor Forum 1999, San Jose, Ca., Oct
[2] L. Gwennap. MAJC Gives VLIW a New Twist. Microprocessor Report, Vol. 13, No. 12, pp , September,
[3] U. Brinkschulte, C. Krakowski, J. Kreuzinger, R. Marston, and T. Ungerer. The Komodo Project: Thread-Based Event Handling Supported by a Multithreaded Java Microcontroller. 25th EUROMICRO Conference, Milano, September
[4] U. Brinkschulte, C. Krakowski, J. Kreuzinger, Th. Ungerer. Interrupt Service Threads - A New Approach to Handle Multiple Hard Real-time Events on a Multithreaded Microcontroller. The 20th IEEE Real-Time Systems Symposium, Phoenix, Arizona, December 1-3,
[5] R. S. Chappell, J. Stark, S. P. Kim, S. K. Reinhardt, Y. N. Patt. Simultaneous Subordinate Microthreading (SSMT). ISCA 26 Proceedings, Atlanta, Georgia, Vol. 27, No. 2, pp , May
[6] S. W. Keckler, A. Chang, W. S. Lee, W. J. Dally. Concurrent Event Handling through Multithreading. IEEE Transactions on Computers, Vol. 48, No. 9, pp , September 1999
[7] C. B. Zilles, J. S. Emer, G. S. Sohi. The Use of Multithreading for Exception Handling. MICRO-32, Haifa, November 1999,
[8] K. Lüth, A. Metzner, T. Peikenkamp, J. Risau. The EVENTS Approach to Rapid Prototyping for Embedded Control Systems. Zielarchitekturen eingebetteter Systeme, 14. ITG/GI Fachtagung Architektur von Rechnersystemen, Rostock,
[9] W. Damm, A. Mikschl. MSPARC: a multithreaded SPARC. Euro-Par 96 Parallel Processing: Second International Euro-Par Conference, Vol. II, LNCS 1124, Springer-Verlag
[10] U. Brinkschulte, C. Krakowski, J. Kreuzinger, T. Ungerer. A Multithreaded Java Microcontroller for Thread-oriented Real-time Event-Handling. International Conference on Parallel Architectures and Compilation Techniques (PACT 99), Newport Beach, Ca., pp , October
[11] J. Kreuzinger, M. Pfeffer, A. Schulz, T. Ungerer, U. Brinkschulte, C. Krakowski. Performance Evaluations of a Multithreaded Java Microcontroller. PDPTA'00, Las Vegas, Nevada, USA, Vol. 1, pp , June
[12] J. Kreuzinger, R. Zulauf, A. Schulz, T. Ungerer, M. Pfeffer, U. Brinkschulte, C. Krakowski. Performance Evaluations and Chip-Space Requirements of a Multithreaded Java Microcontroller. The Second Annual Workshop on Hardware Support for Objects and Microarchitectures for Java, in conjunction with ICCD 2000, Austin, Texas, September



More information

Predictive Line Buffer: A fast, Energy Efficient Cache Architecture

Predictive Line Buffer: A fast, Energy Efficient Cache Architecture Predictive Line Buffer: A fast, Energy Efficient Cache Architecture Kashif Ali MoKhtar Aboelaze SupraKash Datta Department of Computer Science and Engineering York University Toronto ON CANADA Abstract

More information

Multithreading and the Tera MTA. Multithreading for Latency Tolerance

Multithreading and the Tera MTA. Multithreading for Latency Tolerance Multithreading and the Tera MTA Krste Asanovic krste@lcs.mit.edu http://www.cag.lcs.mit.edu/6.893-f2000/ 6.893: Advanced VLSI Computer Architecture, October 31, 2000, Lecture 6, Slide 1. Krste Asanovic

More information

4. Hardware Platform: Real-Time Requirements

4. Hardware Platform: Real-Time Requirements 4. Hardware Platform: Real-Time Requirements Contents: 4.1 Evolution of Microprocessor Architecture 4.2 Performance-Increasing Concepts 4.3 Influences on System Architecture 4.4 A Real-Time Hardware Architecture

More information

DDM-CMP: Data-Driven Multithreading on a Chip Multiprocessor

DDM-CMP: Data-Driven Multithreading on a Chip Multiprocessor DDM-CMP: Data-Driven Multithreading on a Chip Multiprocessor Kyriakos Stavrou, Paraskevas Evripidou, and Pedro Trancoso Department of Computer Science, University of Cyprus, 75 Kallipoleos Ave., P.O.Box

More information

What s An OS? Cyclic Executive. Interrupts. Advantages Simple implementation Low overhead Very predictable

What s An OS? Cyclic Executive. Interrupts. Advantages Simple implementation Low overhead Very predictable What s An OS? Provides environment for executing programs Process abstraction for multitasking/concurrency scheduling Hardware abstraction layer (device drivers) File systems Communication Do we need an

More information
