Implementing Real-time Scheduling Within a Multithreaded Java Microcontroller

S. Uhrig (1), C. Liemke (2), M. Pfeffer (1), J. Becker (2), U. Brinkschulte (3), Th. Ungerer (1)

(1) Institute for Computer Science, University of Augsburg, Augsburg, Germany, {uhrig, pfeffer, ungerer}@informatik.uni-augsburg.de
(2) Institute for Information Processing Technology, University of Karlsruhe, Karlsruhe, Germany, becker@itiv.uni-karlsruhe.de
(3) Institute for Process Control, Automation and Robotics, University of Karlsruhe, Karlsruhe, Germany, brinks@ira.uka.de

Abstract

This paper presents the design, evaluation and hardware implementation of real-time scheduling schemes that are embedded in a multithreaded Java microcontroller. We show the feasibility of a hardware real-time scheduler integrated deeply into the processor pipeline with a VHDL design and its synthesis. Evaluations with a software simulator and real-time applications as benchmarks show that hardware multithreading reaches a performance increase by a factor of 1.2 to 1.6 for hard real-time applications (multithreading without latency utilization) and a speedup of 1.8 to 2.6 by latency utilization for programs without hard real-time requirements. We also show that even for the complex scheduling algorithms EDF (Earliest Deadline First), LLF (Least Laxity First), and GP (Guaranteed Percentage) a scheduling decision is possible within one processor cycle of a 327 MHz, 325 MHz, and 274 MHz processor, respectively, with four threads. With respect to real-time scheduling on a multithreaded microcontroller, the LLF scheme outperforms the FPP (Fixed Priority Preemptive), EDF, and GP schemes. However, only GP allows isolation of threads.

Keywords: real-time Java, real-time scheduling, embedded systems, multithreading

1 Introduction

The target market of our project is the widespread market of embedded systems, in particular embedded real-time systems. In this area microcontrollers are typically preferred over general-purpose processors because of their on-chip integration of RAM and peripheral interfaces, resulting in smaller and cheaper hardware. Besides execution performance, the requirements for microcontroller design concern in particular support for real-time event handling and flexible real-time scheduling strategies, rapid context switching, and small memory requirements.

Hard real-time events are never allowed to miss their deadlines. To guarantee that hard real-time events are handled in time, the runtime of the event-handling algorithm must be countable in processor cycles.

A multithreaded processor is able to pursue multiple threads of control in parallel within the processor pipeline. The functional units are multiplexed between the thread contexts. Most approaches store the thread contexts in different register sets on the processor chip. Latencies that arise from cache misses, long-running operations or other pipeline hazards are masked by switching to another thread.

Thread scheduling has been proposed to optimize the throughput of simultaneous multithreaded (SMT) processors [12, 9] and to schedule soft real-time applications on SMT processors [5]. The EVENTS project [8] introduces thread scheduling for event handling on multithreaded processors by an external hardware scheduler. A simple kind of thread scheduling for latency bridging in real-time environments is the round-robin scheduling scheme used in [4].

The Komodo project explores the suitability of hardware multithreading techniques in embedded real-time systems on the basis of a microcontroller, called the Komodo microcontroller [2]. Key features of the Komodo microcontroller are very rapid context switching and real-time scheduling algorithms integrated deeply within the pipeline [6]. We therefore propose hardware multithreading as an event handling mechanism that allows efficient handling of simultaneous overlapping events with hard real-time requirements.
We design a microcontroller with a multithreaded processor core that triggers so-called Interrupt Service Threads (ISTs) instead of Interrupt Service Routines (ISRs) for event handling [3]. The basic idea of the IST concept is that an occurring event activates an assigned thread instead of an ISR, as is done by conventional processors and microcontrollers. The IST concept activates threads directly in hardware. ISTs are mapped directly to the thread slots of the Komodo processor core. Execution of a thread is triggered by an external hardware event. The required real-time scheduling algorithms are embedded within the processor pipeline.

In the following we present the design of the Komodo microcontroller core and focus in particular on the implementation of the real-time scheduling schemes in hardware. The next section describes the implemented real-time scheduling algorithms. Section 3 presents the pipeline core of the Komodo microcontroller, section 4 the implementation of the hardware scheduling algorithms, and section 5 the evaluation. Section 6 concludes the paper.

2 Real-Time Scheduling Algorithms

Applying a multithreaded processor eliminates the latencies of IST activation and context switching and allows an additional optimization: the scheduling can be done by hardware. This avoids a software scheduler call after an IST activation and allows the immediate processing of an occurring event. However, the scheduling scheme must be implemented in hardware, and the hardware scheduler should provide a scheduling decision within one clock cycle. The following real-time scheduling schemes are adapted to the needs of a multithreaded microcontroller and implemented in the Komodo processor core:

The Fixed Priority Preemptive (FPP) scheme assigns a fixed priority to each thread. The processor always executes the thread with the highest priority among all active threads.

The Earliest Deadline First (EDF) scheme [7] executes the thread closest to its deadline. The only parameter this scheme needs is therefore the deadline. Stankovic et al. [11] show that EDF is an optimal scheme for periodic threads on a single-processor system. It guarantees all deadlines up to 100% processor utilization.

The Least Laxity First (LLF) scheme can be considered an extension of the Earliest Deadline First scheme. In addition to the deadline, the execution time of each thread is used to calculate its laxity. The laxity is the difference between the remaining time to the deadline and the remaining execution time of a thread. The thread with the least laxity gets the processor.

Guaranteed Percentage (GP) [1] is a scheme that has been newly designed for real-time scheduling on multithreaded processors. The basic idea is to statically assign percentages of the available processor time to the threads and to guarantee these percentages in short time intervals. This ensures a definite and predictable progress of the threads and isolates the real-time event-handling threads from each other: a thread cannot harm the timing behavior of any other thread. Such isolation has two advantages over conventional microcontrollers: multiple hard real-time events can be processed by a single microcontroller, and real-time threads can be removed or replaced without affecting the behavior of the remaining threads in the system.
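The selection rules of the four schemes can be illustrated with a minimal Python sketch. It is not the Komodo hardware implementation: the `Thread` record, the convention that a larger `priority` value means higher priority, the `interval` parameter, and the three example threads are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    name: str
    priority: int    # FPP: larger value = higher priority (assumed convention)
    deadline: int    # EDF/LLF: absolute deadline, in cycles
    remaining: int   # LLF: remaining execution time, in cycles
    percent: int     # GP: guaranteed share of processor time


def laxity(thread, now):
    """Remaining time to the deadline minus remaining execution time."""
    return (thread.deadline - now) - thread.remaining


def pick_fpp(threads):
    """FPP: the thread with the highest fixed priority runs."""
    return max(threads, key=lambda t: t.priority)


def pick_edf(threads):
    """EDF: the thread closest to its deadline runs."""
    return min(threads, key=lambda t: t.deadline)


def pick_llf(threads, now):
    """LLF: the thread with the least laxity runs."""
    return min(threads, key=lambda t: laxity(t, now))


def gp_budgets(threads, interval):
    """GP: cycles guaranteed to each thread within one interval."""
    assert sum(t.percent for t in threads) <= 100
    return {t.name: t.percent * interval // 100 for t in threads}


# Hypothetical example threads, all active at cycle 0.
threads = [
    Thread("A", priority=3, deadline=100, remaining=10, percent=50),
    Thread("B", priority=1, deadline=60, remaining=5, percent=20),
    Thread("C", priority=2, deadline=80, remaining=70, percent=30),
]

print(pick_fpp(threads).name)             # A: highest fixed priority
print(pick_edf(threads).name)             # B: earliest deadline (60)
print(pick_llf(threads, now=0).name)      # C: least laxity (80 - 70 = 10)
print(gp_budgets(threads, interval=100))  # {'A': 50, 'B': 20, 'C': 30}
```

The same three threads yield three different decisions, which shows why the schemes behave differently under load: thread C has a later deadline than B but almost no slack, so only LLF selects it first.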
Because GP isolates the threads from each other, real-time constraints can be kept even during dynamic reconfiguration. Due to its many context switches, the GP scheme is only suitable within a multithreaded processor core with a single-cycle context switching overhead.

3 The Komodo Microcontroller

The Komodo microcontroller consists of a processor core attached via an I/O bus to typical controllers such as a timer/counter, capture/compare, and serial and parallel interfaces [10]. In the following we focus on the processor core [2], which is a multithreaded Java processor with a four-stage pipeline. Because of its application in embedded systems, the processor core of the Komodo microcontroller is kept at a simple hardware level.

Figure 1 shows the multithreaded pipeline enhanced by the priority manager and the signal unit. The pipeline consists of four stages: instruction fetch (IF), instruction decode (ID), operand fetch (OF), and execute/memory/I/O access (EXE).

[Figure 1: The Komodo processor core. Blocks shown: memory interface (address, data), instruction fetch with PC1 to PC4, instruction windows IW1 to IW4, instruction decode, µROM, priority manager, signal unit, operand fetch, execute, memory access, I/O access, four stack register sets, and the I/O components.]

These four stages perform the following tasks:

Instruction fetch: If not all instruction windows (IWs) are full, the IF stage tries to fetch a new instruction package from the memory interface. A successfully fetched instruction package is routed to the corresponding instruction window. A fetch is not successful if a memory access occurs at the same time. Each instruction package consists of four bytes. Because bytecodes have variable length, each package contains from zero up to four bytecodes.

Instruction decode: The decoding of instructions starts after a received instruction package has been written into the corresponding IW. Each IW is organized as an 8-byte transparent FIFO buffer. Every cycle, the priority manager decides which thread will be decoded next. The decoding results in a hardware instruction, which is executed directly in the execution stage, or it starts a sequence of microcodes. Very complex instructions are executed by trap routines. After a microcode sequence terminates, a new instruction is decoded. The design of the microcode unit allows microcode instructions to be interleaved with instructions from other threads.

Operand fetch: In this pipeline stage, the operands needed by the current operation are read from the stack. Because of the stack architecture of Java, many data dependencies occur. To manage this problem without adding latencies, data forwarding has been integrated. It forwards results from the execution stage's output and from memory accesses directly to the input latches of the execution stage.

Execution, memory and I/O access: The execution stage is responsible for all instructions except load/store instructions. It uses the given operands to execute the operation submitted by the decode stage. The result is sent to the stack and to the operand fetch unit for forwarding. An explicit write-back stage is not necessary. In the case of a load/store instruction, the memory is addressed by one of the operands. Address calculation is performed by software. Because only physical addresses are used, no additional calculation by hardware is necessary.
That means the whole execution cycle is available for the memory access. An I/O access is handled in the same way as a memory access.

One of the main improvements in comparison to other simple pipelines is the context switching overhead of zero cycles. Such a fast context switch requires the ability to execute different threads within the pipeline. Therefore a thread tag is routed from one pipeline stage to the next. The tag indicates the thread to which the currently transmitted signals belong. These thread tags form a chain of tags propagated through the pipeline. The origin of this chain is the priority manager described in the next section.

4 The Priority Manager

The priority manager (PM) is responsible for hardware real-time scheduling of the Interrupt Service Threads. Up to now, we have not found any work that integrates modern real-time scheduling algorithms into a processor pipeline. Four different priority manager implementations supporting the scheduling algorithms FPP, EDF, LLF and GP were investigated. In spite of the different algorithms, the four implementations are very similar. Because a context switch is possible every clock cycle, the priority manager has to make a scheduling decision each cycle.

[Figure 2: Implementation of the priority manager. Phases: generating, determining, actualizing. The four PrioValues pass through a tree of "<" comparators that yields the thread tag.]

[Figure 3: Composition of the characteristic value. Fields: latency indicator, not (IW full), not (waiting for atom lock), not (active), PrioValue.]

The main procedure of the priority manager is split into three phases, shown in figure 2. In the first step, a characteristic value (PrioValue in figure 2) is generated for each hardware thread. In the second step, these values are compared in a comparison tree to determine which thread's instruction has to be executed. The last step updates the characteristic value of each thread depending on the scheduling decision and algorithm.

Figure 3 shows the composition of the characteristic value. The upper four bits are independent of the chosen scheduling algorithm and represent the thread's state. In particular, these bits indicate whether the thread is active, whether it is waiting for an atomic lock, and whether there are latencies due to the last executed instruction. They also indicate whether the corresponding instruction window contains enough bytes for a complete instruction. Because the comparison tree looks for the lowest value, all these bits except the latency indicator have to be inverted. Threads with latencies, inactive threads, threads waiting for an atomic lock, and threads with empty instruction windows get the lowest priority. The rest of the characteristic value, the PrioValue, depends on the chosen scheduling scheme and is defined as follows:

FPP: The PrioValue is the fixed priority of the thread, as stated by the programmer.
In the case of four threads, four priority levels are sufficient, so the PrioValue is 2 bits wide.

EDF: When a thread is activated, its deadline is stored in the PrioValue. By comparing these values, the PM determines the thread with the lowest PrioValue, that is, the thread with the nearest deadline. During the third step of the PM, all PrioValues are decremented because each thread gets closer to its deadline. The width of the PrioValue depends on the maximum deadline length.

LLF: This algorithm is very similar to EDF. The difference is an additional value, the runtime of each thread. The PrioValue is given by the difference between deadline and runtime. This difference is called laxity. The runtime has the same width as the deadline and is decremented each time the corresponding thread is decoded.

GP: On entering a new interval, the PrioValue of each loaded thread is initialized with the number of cycles given by the GP parameter. We chose an interval length of 100 cycles, which allows the PrioValue to be loaded simply with the percentage. During the third step, the PM decrements the PrioValue of the thread that was just selected. In addition, the PrioValues of all threads that have a latency greater than 0 are decremented, because in the analytical model these threads are in the state of execution.

By integrating the PM into the decode stage, the whole task of the decode stage can be divided into six steps:

1. Storing the instruction package into the IW
2. Generating the characteristic thread values
3. Determining the thread tag for decoding
4. Actualizing of PrioValue
5. Decoding the instruction
6. Actualizing of the IW

After the number of bytes in the IWs has been calculated, the execution of the second step can be overlapped with the first step. Steps five and six can be executed in parallel with step four.

5 Evaluations

We developed a software simulator of the Komodo processor core for performance estimation, as well as an FPGA and an ASIC prototype of the whole Komodo microcontroller. In the following we present results of the software evaluations and of an ASIC-directed synthesis of the different scheduling algorithms to assess signal runtimes and chip area requirements.

We use three real-time applications as benchmarks: an impulse counter (IC), a PID element (Proportional, Integral, Derivative element) and an FFT algorithm (FFT). These benchmarks are programmed in Java and compiled to Java bytecode. Latency assumptions are three cycles for branches and two cycles for memory transfers and for writes to special registers.

[Figure 4: Measurements with the DifMix benchmark. Gain (vertical axis, 0.00 to 3.00) for FPP, EDF, GP, and LLF in three configurations: baseline; multithreading without latency hiding; multithreading with latency hiding.]

Figure 4 shows the results of our measurements using one IC, one FFT and two PID threads in the four thread slots (DifMix benchmark). The deadlines of the threads were shortened until a deadline miss occurred.
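The idea of shortening deadlines until the first miss can be sketched with a toy cycle-by-cycle model. This is only an illustration under stated assumptions, not the simulator used for the measurements: it uses EDF as the example scheme, assumes periodic threads whose deadline equals their period, assumes zero context-switch cost and no latencies, shortens all deadlines by one cycle per step, and uses made-up execution times and periods.

```python
from functools import reduce
from math import gcd


def hyperperiod(periods):
    """Least common multiple of all periods (the schedule repeats after it)."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods)


def meets_all_deadlines(exec_cycles, periods):
    """Cycle-by-cycle EDF simulation with zero context-switch overhead.

    Thread i is released every periods[i] cycles, needs exec_cycles[i]
    cycles of work, and its deadline is its next release.
    """
    n = len(periods)
    remaining = [0] * n   # work left in the current job of each thread
    due = [0] * n         # absolute deadline of the current job
    for now in range(hyperperiod(periods)):
        for i in range(n):
            if now % periods[i] == 0:
                if remaining[i] > 0:      # previous job missed its deadline
                    return False
                remaining[i] = exec_cycles[i]
                due[i] = now + periods[i]
        ready = [i for i in range(n) if remaining[i] > 0]
        if ready:
            chosen = min(ready, key=lambda i: due[i])   # EDF decision
            remaining[chosen] -= 1
    return not any(remaining)             # every job done at the hyperperiod


def tighten(exec_cycles, periods):
    """Shorten every deadline by one cycle until the next step would miss."""
    if not meets_all_deadlines(exec_cycles, periods):
        raise ValueError("starting deadlines are already infeasible")
    while True:
        shorter = [p - 1 for p in periods]
        if min(shorter) < 1 or not meets_all_deadlines(exec_cycles, shorter):
            return periods
        periods = shorter


# Hypothetical workload: three threads needing 2, 3 and 4 cycles per period.
print(tighten([2, 3, 4], [8, 12, 16]))   # [6, 10, 14]
```

In this toy workload the next step, periods of 5, 9 and 13 cycles, would demand more than 100% of the processor, so a miss is unavoidable there under any scheme.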
As the baseline processor, we chose a model of the single-threaded picoJava-II, which has no ability to hide latencies and for which we assume a context switching overhead of 100 cycles. The multithreaded model with no context switching costs but without the ability to use latencies is aimed at hard real-time applications, and the model with multithreading and latency hiding speeds up soft real-time applications. All results are normalized to the baseline FPP version.

The measurements of the multithreaded version without latency bridging are important for hard real-time environments because the amount of utilizable latencies depends completely on the software. So we cannot guarantee latency utilization, but if any latencies are available, they will be bridged by executing other threads.

A performance increase of 1.2 to 1.6 is reached for hard real-time applications due to multithreading and the resulting fast context switching. A further performance gain of 1.8 to 2.6 is reached for soft or non-real-time applications by latency hiding (for further simulation results see [6]).

The next step was a synthesis of the different scheduling algorithms using the DesignCompiler from Synopsys and the UMC18 library from Virtual Silicon for a 0.18 micron ASIC technology, leading to the clock frequencies and sizes shown in table 1. These measurements were made with priority managers supporting 4, 8 or 16 threads. The frequencies reached show the feasibility of using the priority manager within state-of-the-art microcontroller systems.

[Table 1: Run times and sizes using the UMC18 technology. Columns: clock frequency (MHz) and size (mm^2) for 4, 8 and 16 threads. Rows: FPP, EDF, LLF, GP. The numeric values are not preserved in this transcription.]

6 Conclusions

This paper presents a Java-based real-time multithreaded microcontroller. We base our Interrupt Service Thread (IST) concept on the idea of handling events by threads, utilizing the fast context switching of multithreaded processors. Up to now such processors have been designed for latency hiding and throughput increase. In contrast, our Komodo microcontroller core applies hardware multithreading for fast real-time event handling. Moreover, we investigated the behavior of real-time scheduling in combination with the multithreaded processor technique. Because the Komodo microcontroller performs a context switch without any switching overhead, we implemented several well-known scheduling techniques in hardware (FPP, EDF, LLF, and GP). We showed the feasibility of a hardware real-time scheduler integrated deeply into the processor pipeline with a VHDL design and its synthesis.
Our evaluations show a performance increase of 1.2 to 1.6 for hard real-time applications due to the fast context switching of multithreading and a speedup of 1.8 to 2.6 for soft or non-real-time applications by latency hiding. We also show that even for the complex scheduling algorithms EDF, LLF, and GP a scheduling decision is possible within one processor cycle of a 327 MHz, 325 MHz, and 274 MHz processor, respectively, with four threads. With respect to real-time scheduling on a multithreaded microcontroller, the LLF (Least Laxity First) scheme outperforms the FPP (Fixed Priority Preemptive), EDF (Earliest Deadline First), and GP (Guaranteed Percentage) schemes. Only GP allows isolation of threads.

The next step is to redesign the Komodo microcontroller with the aim of reducing power consumption and to implement it as an ASIC prototype. The microcontroller will be applied to control an autonomous guided vehicle to test it in an industrial environment.

References

[1] U. Brinkschulte, J. Kreuzinger, M. Pfeffer, and Th. Ungerer. A Scheduling Technique Providing a Strict Isolation of Real-time Threads. In Seventh IEEE International Workshop on Object-oriented Real-time Dependable Systems (WORDS), San Diego, CA, January.

[2] Uwe Brinkschulte, C. Krakowski, J. Kreuzinger, and Th. Ungerer. A Multithreaded Java Microcontroller for Thread-Oriented Real-Time Event-Handling. In International Conference on Parallel Architectures and Compilation Techniques (PACT 99), Newport Beach, pages 34-39, October.

[3] Uwe Brinkschulte, C. Krakowski, J. Kreuzinger, and Th. Ungerer. Interrupt Service Threads - A New Approach to Handle Multiple Hard Real-Time Events on a Multithreaded Microcontroller. In RTSS WIP sessions, Phoenix, pages 11-15, December.

[4] Bryce Cogswell and Zary Segall. MACS: A Predictable Architecture for Real Time Systems. In IEEE Real-Time Systems Symposium.

[5] R. Jain, Ch. J. Hughes, and S. V. Adve. Soft Real-Time Scheduling on Simultaneous Multithreaded Processors. In 23rd IEEE International Real-Time Systems Symposium, December.

[6] Jochen Kreuzinger, A. Schulz, M. Pfeffer, Th. Ungerer, U. Brinkschulte, and C. Krakowski. Real-time Scheduling on Multithreaded Processors. In The 7th International Conference on Real-Time Computing Systems and Applications (RTCSA 2000), Cheju Island, South Korea, December.

[7] C. L. Liu and James W. Layland. Scheduling Algorithms for Multiprogramming in a Hard-Real-Time Environment. Journal of the ACM, 20(1):46-61.

[8] K. Lüth, A. Metzner, T. Peikenkamp, and J. Risau. The EVENTS Approach to Rapid Prototyping for Embedded Control Systems. In Zielarchitekturen eingebetteter Systeme, 14. ITG/GI Fachtagung Architektur von Rechnersystemen, Rostock, pages 45-54, September.

[9] S. Raasch and S. Reinhardt. Applications of Thread Prioritization in SMT Processors. In Proceedings of 1999 Multithreaded Execution, Architecture and Compilation Workshop (MTEAC), January.

[10] M. Pfeffer, Th. Ungerer, S. Uhrig, and U. Brinkschulte. Connecting Peripheral Interfaces to a Multithreaded Java Microcontroller. In Workshop on Java in Embedded Systems, ARCS 2002, Karlsruhe, April.

[11] J. A. Stankovic, M. Spuri, K. Ramamritham, and G. C. Buttazzo. Deadline Scheduling for Real-Time Systems: EDF and Related Algorithms. Kluwer Academic Publishers, Dordrecht Norwell.

[12] Dean M. Tullsen, Susan J. Eggers, Joel S. Emer, Henry M. Levy, Jack L. Lo, and Rebecca L. Stamm. Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor. In The 23rd International Symposium on Computer Architecture (ISCA), Philadelphia, Pennsylvania, May.


More information
