COFFEE A Core for Free


Juha Kylliäinen, Jari Nurmi and Mika Kuulusa
Tampere University of Technology, Finland
juha.p.kylliainen@tut.fi

Abstract

This paper presents the design and implementation of an open source processor core developed at Tampere University of Technology, Finland. The design guidelines of a RISC core are introduced and some of the typical design tradeoffs are presented. The architecture of the developed processor engine, the COFFEE¹ RISC core, is explained.

1. Introduction

The complexity of a processor design can vary from a multi-million gate design to a design with a few tens of kilogates. Power consumption can likewise vary from the milliwatt range to tens of watts. In order to set the scope for comparison, we need to classify processors and define what we mean by a processor core in this context.

Processors targeted to personal computers and mainframe computers form a class with similar requirements. Computing performance and hardware support for operating systems are key requirements in that class, whereas power consumption is not an issue.

Processors targeted to embedded systems belong to another class. Embedded systems are products other than general purpose computing machines. Processors in such systems are used to implement certain functionality of a product; the capabilities of the processor are not important to the user as long as the product fulfils certain requirements. Some level of performance is needed, and usually a processor which has just enough performance, but no more, is selected. Especially in battery operated mobile devices, it is essential to select a processor which is not more powerful than needed, in order to avoid excessive power consumption.

Independent of the target application domain, every processor has an execution core. Here we define the core of a processor as follows. The core is the unit responsible for interpretation and execution of the instruction set of the processor in question.
If we strip away all peripheral components, buses and cache memories, only the core is left.

In many contexts, processor cores also contain cache memories, but we prefer to leave them out. This is because the memory architecture drastically affects performance as well as power consumption and chip area. Also, there is no single optimal solution for the memory hierarchy; the design of the memory architecture is guided by the application.

We can roughly divide instruction sets into RISC (Reduced Instruction Set Computer), CISC (Complex Instruction Set Computer) and DSP (Digital Signal Processing) like instruction sets. DSP processing cores belong to the application specific group. RISC and CISC cores are suitable for general purpose processing. RISC and CISC instruction sets differ mainly because of a different design approach. We can argue that a CISC-like instruction set is designed for a human and a RISC-like instruction set is designed for a compiler. In the early stages of minicomputers, programming was carried out using symbolic machine code, that is, assembly code. It was advantageous to have instructions which were understandable by humans and which performed more work each. Nowadays compilers produce RISC-like instructions even for CISC processors. Instructions which are not used by a compiler are quite useless and only make the hardware more complex. This in turn reduces performance and increases chip area and power consumption. This is one of the reasons why the current trend is towards RISC type processors. In our COFFEE core we have adopted the RISC philosophy to derive a good processing engine for embedded computing.

¹ COFFEE RISC Core is a trademark of Tampere University of Technology, Tampere, Finland.

2. RISC Design Philosophy

We can point out a few rules which are followed by RISC designers. The abbreviation RISC focuses on reducing the number and complexity of instructions, but modern RISCs may have quite complex instructions included in their instruction set.

One instruction per cycle. This requirement does not make sense unless we refer to a pipelined design. If multiple parallel execution pipelines are used, more than one instruction can complete each cycle. The execution time of a program depends on the throughput of the pipeline, not the latency of individual instructions. Increasing the number of pipeline stages reduces the clock cycle time, but at the same time the number of stall and flush cycles (wasted cycles) is increased. The number of wasted cycles can be reduced by careful scheduling of instructions but cannot be fully eliminated. This is a consequence of the fact that software is sequential in nature and operations tend to have a strong dependency on the results of previously executed operations. The penalty of branching also becomes significant with very deep pipelining. During the evaluation of the branch condition and branch target address, several instructions may enter the pipeline. If the branch is taken, those instructions have to be flushed. There are several ways to alleviate this problem. Delayed branching is quite efficient because a good compiler almost always finds instructions to be placed in the delay slot(s) after the branch instruction. Delayed branching together with efficient hardware for address calculation and condition evaluation is enough in designs with fewer than ten pipeline stages. With deep pipelines, pre-fetching and speculative execution are most often used. The latter requires more than one execution pipeline.

Fixed Instruction Length. This requirement aims at simplicity of decoding an instruction. Also, the previous requirement of issuing one instruction per cycle cannot be achieved if multiple memory accesses are needed in the fetch stage of the pipeline. A RISC instruction word usually contains all the information needed to execute the instruction. The width of the instruction is most often 32 or 64 bits.

Only load and store instructions access memory. This requirement aims at utilizing the pipeline in an optimal way as well as minimizing memory traffic. Modern RISC processors usually exploit a pre-fetch mechanism: the address of the needed data is passed to cache memory well before that data is actually needed. A compiler can schedule pre-fetch commands in order to minimize cache misses, which cause stalls.
RISC processors usually have many general purpose registers (typically from 16 to 64), which make it possible to handle most of the processing inside the core and use load and store instructions only to move finished results to memory and new data in. This is a simplified view, though; the actual amount of memory traffic depends heavily on the application (and compiler).

Simplified addressing modes. Compilers hardly ever use the complex address arithmetic supported by CISC hardware. Complex address calculations in hardware only extend the clock cycle, so there is little reason to support them. If very complex address arithmetic is needed, it can be synthesized using a few simple RISC instructions.

Fewer, simpler instructions. Simpler operations imply shorter clock cycles. Simple instructions also fit better into RISC-like pipelines. Many simple RISC designs perform most of their operations in one execution stage, that is, once the data has been fetched from the register file, it takes one clock cycle to evaluate the result. Demanding instructions, such as multiplication, use several cycles in the execution stage and in effect stall the rest of the pipeline. This approach is adopted by, for example, ARM (Advanced RISC Machines) [3]. In the COFFEE RISC core, multiplication is also pipelined in order to increase the throughput.

3. Defining COFFEE RISC ISA

One cannot prove that a certain set of instructions is better than another. We can easily measure or compute the number of instructions executed per time unit, but we can only compare cores which execute the same instruction set. Sometimes even this is not possible, because deeply pipelined architectures usually expect the compiler to schedule instructions in an optimal way. Measurements depend on the compiler and application. To alleviate the pain of performance measurements, several benchmarks have been developed. These benchmarks are usually a set of programs which are executed on the target processor while the execution time is measured.
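
The difficulty of comparing cores with different instruction sets can be illustrated with the classic relation: execution time = instruction count × cycles per instruction / clock frequency. The sketch below is only an illustration under assumed numbers: the two cores, their instruction counts and their CPI values are hypothetical and are not measurements of COFFEE or of any real processor.

```python
def execution_time_s(instruction_count, cpi, clock_hz):
    """Execution time = instructions x cycles per instruction / clock frequency."""
    return instruction_count * cpi / clock_hz


def mips_rating(cpi, clock_hz):
    """Millions of instructions executed per second at the given CPI."""
    return clock_hz / cpi / 1e6


# Hypothetical cores running the same program at the same clock frequency.
# Core A: simple instructions, more of them, few wasted cycles.
# Core B: more work per instruction, fewer of them, more cycles each.
CLOCK_HZ = 200e6
core_a = {"instruction_count": 1_000_000, "cpi": 1.2}
core_b = {"instruction_count": 500_000, "cpi": 2.0}

time_a = execution_time_s(core_a["instruction_count"], core_a["cpi"], CLOCK_HZ)
time_b = execution_time_s(core_b["instruction_count"], core_b["cpi"], CLOCK_HZ)
mips_a = mips_rating(core_a["cpi"], CLOCK_HZ)
mips_b = mips_rating(core_b["cpi"], CLOCK_HZ)

print(f"core A: {mips_a:.1f} MIPS, {time_a * 1e3:.2f} ms")
print(f"core B: {mips_b:.1f} MIPS, {time_b * 1e3:.2f} ms")
```

With these assumed numbers core A executes more instructions per second, yet core B finishes the program sooner. This is why an instructions-per-time figure is only meaningful between cores which execute the same instruction set.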
Even though it might be justified to compare benchmark results between different processors, it is clear that they are a measure of a system composed of the compiler, the processor core and the memory architecture.

The instruction set of COFFEE RISC was designed based on the instruction sets of RISC processors currently on the market. Instructions which were available in most processors were included and rare ones were excluded. Instructions which enable coprocessor support were also added as a way to extend the instruction set if needed. This might not seem a very analytical approach, but it was a good starting point for development. The penalty of implementing a particular instruction is not known before modeling the execution of that instruction with hardware timing. This makes it extremely difficult to decide whether an instruction should be included or not prior to the implementation phase of the design. Some instructions present in some RISC architectures were easy to exclude, for example division. Division is not a deterministic process, that is, its execution time cannot be predicted. This implies iterative execution, which in practice means stalling the pipeline until the result is ready. If needed, division should be done in software, and it is best avoided in time critical algorithms.

The basic implementation of COFFEE RISC has 66 instructions. A special instruction was included for future extensions: swm (switch mode). It is used to switch to different instruction decoding hardware. It can be used to implement application specific instruction sets and to develop 'better' instruction sets in the future without giving up compatibility with old software. The execution pipeline of the COFFEE RISC (explained in section 5) together with the swm instruction makes it possible to integrate, for example, a MAC (Multiply and Accumulate) instruction in the pipeline without deteriorating performance. Currently the swm instruction is used to switch to compressed instruction mode, where each half of the 32-bit instruction word is interpreted as an individual instruction.

The data processing instructions operate on two register operands or, alternatively, one register operand and one immediate operand. Instructions which produce data can write their result to any general purpose register. Three register indexes can be specified in one instruction word. There are fourteen arithmetic instructions, ten bit field manipulation instructions (bytes, halfwords, arbitrary bit fields), six boolean operations, eight conditional branches, four other jumps (linking jumps, absolute jumps etc.) and six shift instructions. Most of the instructions can be executed conditionally, making it more efficient to implement short conditional statements of a high level language (there is no need to jump over code if the condition is false). Conditional branching is implemented using two instructions: compare and branch. Compare instructions of COFFEE RISC produce condition flags which can be saved to one of eight possible condition flag registers for later use. Branch instructions evaluate the branch condition based on those flags. A delay slot of one instruction is present after any jump or branch.

4. Overview of COFFEE RISC Features

The COFFEE RISC is a so-called load-store machine: memory operands have to be loaded to registers before performing any operation on them. Similarly, the result of an operation is written to a register, from where it can be written to memory using a special data transfer instruction. As in most RISC architectures, a large number of registers is provided to reduce memory traffic. A register bank with two register sets is provided. Each register set contains 32 registers. Both sets are available in the privileged mode of operation, but only one set is accessible in user mode. Different operating modes are provided in order to support operating systems. A memory mapped register bank, the CCB (Core Configuration Block), is provided to further support operating systems and different configurations. It contains, for example, registers defining protected memory areas. The CCB can be remapped anywhere in the address space.

COFFEE RISC is a 32-bit architecture, that is, data is manipulated in 32-bit words. The memory interface is of Harvard type, having separate interfaces for data and instruction memory. Figure 1 shows an example of interfacing COFFEE. The memory interfaces of the COFFEE core do not restrict memories to be of any particular type as long as they conform to the interface timing. Multi-cycle access is supported, which makes it possible to interface large and slow main memories directly. The number of cycles per access can be configured by software separately for both interfaces. In simple systems with only one system bus and no cache memory, sharing the data bus might be considered. This is supported directly by the COFFEE core.

COFFEE RISC supports connecting up to four coprocessors. The coprocessor interface is much like a memory interface. Coprocessor addressing is limited to 7 bits, including a field of two bits for the coprocessor ID (identification) and a 5-bit field for the coprocessor register index. COFFEE has dedicated instructions to move data and instructions to/from a coprocessor. In addition, a coprocessor can interrupt the COFFEE core by asserting an exception signal included in the interface. An important feature of the coprocessor interface is its ability to connect to a different clock domain. This is achieved by also synchronizing exception signals on the core side and allowing the data transfer time to be up to sixteen clock cycles long. As with memories, the access time of coprocessors can be configured by software. Synchronizing circuitry on the coprocessor side is needed unless the clock frequency of the COFFEE core is an even multiple of the coprocessor clock frequency.

Figure 1, Interfacing COFFEE. (Block diagram: the COFFEE core connected to INST_CACHE, INT_HANDLER, DATA_CACHE, PCB, BOOT_CNTRL, BUS_CONTROL and COPROCESSOR_0 to COPROCESSOR_3. Signals shown: clk, i_addr(31:0), i_word(31:0), i_cache_miss, cop_exc(3:0), ext_handler, ext_interrupt(7:0), offset(7:0), int_done, int_ack, cop_port(40:0), rd, wr, d_cache_miss, data(31:0), d_addr(31:0), pcb_rd, pcb_wr, stall, reset_x_out, rst_x, boot_sel, bus_ack, bus_req.)

The COFFEE core provides an internal interrupt controller which is adequate for many designs, but a possibility to extend it is provided. Connecting up to eight external interrupt sources is supported. If coprocessors are not connected, the four inputs reserved for coprocessor exception signalling can be used as interrupt request lines, making it possible to connect twelve sources. The interrupt controller has synchronization circuitry allowing asynchronous signals to be connected. If an external controller is used, synchronization is bypassed in order to reduce signalling latency. Priorities between interrupt sources can be set by software via CCB registers. Interrupt sources can be masked individually and disabled or enabled all at once using the di and ei instructions. All interrupts are vectored. Interrupt vectors reside in the CCB. The entry address of an interrupt service routine can be the corresponding vector directly or, if an external controller is used, a combination of the vector and an offset given externally.

A block called PCB (Peripheral Control Block), also seen in figure 1, requires some explanation. The interface to this block is provided to make it easy to communicate directly with peripheral devices around the COFFEE core. The memory space reserved for peripherals can be set by software. All accesses to that space assert the PCB_WR and PCB_RD signals, directing the access to the PCB, instead of the WR and RD signals that are used to access the data memory. Control and data registers of peripherals can be placed into one register bank having a single decoding logic, or they can reside inside each peripheral device, just sharing the bus. Note that the data part of the interface is shared with data memory.

The signals BOOT_SEL and STALL, which can be seen in Figure 1, also deserve a mention. BOOT_SEL can be used to select the address of the first executed instruction: if BOOT_SEL is high, the COFFEE core will read its boot address from the data bus. The address should be driven on the bus simultaneously with the reset signal. The STALL signal is provided to enable stalling the COFFEE core for whatever reason. In battery powered systems the STALL signal can be used to save power when there is nothing to be processed. Software execution resumes instantly after releasing the STALL signal because the clock of the core is not disabled; only the data in all registers is frozen.

5. Pipeline Structure

COFFEE RISC core has a single pipeline with six pipeline stages.
Figure 2 illustrates the different stages of the COFFEE RISC pipeline. Each block in the figure represents a data transformation or some other operation done during one clock cycle. At the end of each stage, intermediate or final results are clocked to the input registers of the following stage. Execution proceeds from left to right. As can be seen from the figure, it takes six clock cycles for an instruction to go through the pipeline. The datapath is fully pipelined, which means that every clock cycle a new instruction enters the first stage of the pipeline and one instruction completes at the last stage. This gives a throughput of one IPC (Instructions Per Cycle) in ideal conditions without any pipeline stalls. The design uses only one clock. In the following, each stage shown in figure 2 is described briefly.

Figure 2, COFFEE pipeline.

In the first pipeline stage, marked as FETCH in the figure, three operations are performed. A new 32-bit instruction is fetched from the location pointed to by the program counter, PC. In 16-bit mode, if the address is even, a 32-bit double instruction is fetched. The address in PC is checked and an exception is raised in case of a violation. Finally, the program counter is incremented by two or four depending on the mode.

The second pipeline stage, marked as DECODE in the figure, is the most important from the control point of view. This is the point where an instruction is identified and most of the decisions about its behavior in the next stages are made. If the COFFEE core is in 16-bit decoding mode, the 16-bit halfword is extended to an equivalent 32-bit instruction before passing it to the decode logic. The execution condition, defined by special fields inside the instruction word, is evaluated. Evaluation involves checking pre-evaluated condition flags against the specified condition. If the execution condition is false, the instruction will simply be flushed on the next rising edge of the clock.
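
The execution condition check described above can be sketched as follows. This is a simplified behavioural model, not the actual decode logic: the condition names, their mapping to flags and the assembly-like instruction string are hypothetical, since the encoding of the condition field is not given in this text. Only the flag names Z (zero), N (negative) and C (carry) come from the core description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flags:
    """Pre-evaluated condition flags held in one condition register."""
    z: bool  # zero
    n: bool  # negative
    c: bool  # carry


# Hypothetical condition names; the real encoding of the condition
# field inside the instruction word is not specified in this text.
CONDITIONS = {
    "always": lambda f: True,
    "z_set": lambda f: f.z,
    "z_clear": lambda f: not f.z,
    "n_set": lambda f: f.n,
    "n_clear": lambda f: not f.n,
    "c_set": lambda f: f.c,
    "c_clear": lambda f: not f.c,
}


def decode_stage(instruction, condition, flags):
    """Return the instruction if its execution condition holds.

    Returns None when the condition is false, modelling an instruction
    that is flushed instead of being passed on to the next stage.
    """
    if CONDITIONS[condition](flags):
        return instruction
    return None


# Flags as a compare of two equal operands might leave them (assumed).
flags_after_compare = Flags(z=True, n=False, c=False)
print(decode_stage("add r1, r2, r3", "z_set", flags_after_compare))
print(decode_stage("add r1, r2, r3", "z_clear", flags_after_compare))
```

Because the flags are evaluated before the conditional instruction reaches DECODE, the check is a simple lookup, which is what allows a false condition to be resolved by flushing a single instruction instead of branching around it.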
In parallel with the execution condition check, signals needed during the current and following stages are decoded from the instruction word. Based on signals evaluated in the DECODE stage and signals decoded from previous instructions currently in the pipeline, the control logic checks for data dependencies. COFFEE RISC resolves all data dependencies by forwarding the needed data as soon as it becomes available. If data cannot be forwarded, the FETCH and DECODE stages are stalled until the data is available. Hardware support for resolving dependencies makes programming as well as compiler construction easier. In this simple six-stage pipeline, the forwarding logic has a delay of approximately one third of the clock cycle, so it does not reduce the clock frequency but only improves performance by avoiding unnecessary stalls. As can be seen from figure 2, data can be forwarded from several points.

Other operations in the DECODE stage are extending the immediate operand, calculating the PC relative jump address and evaluating new status flags if needed. All jump instructions and conditional branches (PC relative and absolute) are executed in the DECODE stage, that is, at the end of the stage the target address is clocked into the PC register. Conditional branching is based on pre-evaluated condition flags, just as conditional execution is. To prepare for the next stage, register operands, whether forwarded or fetched from the register file, are clocked to the input registers of the EXE1 stage.

EXE1 is the first stage where data is manipulated. Integer addition, shifting, boolean and bit-field manipulating instructions are finished during this stage. All multiplication operations start in this stage, producing intermediate results for the next stage. The address for a data memory access is calculated using the adder of the ALU. At the end of the cycle, condition flags (Z = zero, N = negative, C = carry) are evaluated by compare instructions and some of the arithmetic instructions.

Execution of instructions requiring more than one cycle continues in stage EXE2. During this stage, 16-bit multiplication, producing a 32-bit result, is finished. Condition flags evaluated in the previous stage are written to the selected condition register. Note that the condition flags are available to the DECODE stage before they are written to the condition register bank. This is achieved by forwarding data inside the condition register bank from input to output if the target register is the same as the source register. The data memory address calculated in the previous stage is checked in EXE2. The address is compared against the memory limits set for the user. It is also checked whether the address points to the configuration block, CCB, in which case the memory access is not performed. Address calculation overflow is detected in this stage as well.
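
The address checks performed in EXE2 can be summarised with a small behavioural sketch. The function and parameter names, the CCB size, the example limits and the order of the checks are all assumptions made for illustration. The text only states that the address is compared against the user memory limits, that CCB addresses do not cause a memory access, and that address calculation overflow is detected.

```python
from enum import Enum

WORD_MASK = 0xFFFFFFFF  # COFFEE is a 32-bit architecture


class AccessResult(Enum):
    MEMORY = "access data memory"
    CCB = "access configuration block, no memory access"
    VIOLATION = "address outside user limits"
    OVERFLOW = "address calculation overflow"


def check_data_address(base, offset, user_mode, user_low, user_high,
                       ccb_base, ccb_size):
    """Classify a data address computed as base + offset.

    All parameters are hypothetical stand-ins for values that would
    come from registers and from the CCB in the real core.
    """
    total = base + offset
    if total > WORD_MASK or total < 0:
        return AccessResult.OVERFLOW  # result does not fit in 32 bits
    if user_mode and not (user_low <= total <= user_high):
        return AccessResult.VIOLATION  # only user mode is restricted
    if ccb_base <= total < ccb_base + ccb_size:
        return AccessResult.CCB  # handled inside the core
    return AccessResult.MEMORY


# Assumed example configuration, not taken from the COFFEE documentation.
limits = dict(user_low=0x0000_1000, user_high=0x0000_FFFF,
              ccb_base=0x0001_0000, ccb_size=0x100)
print(check_data_address(0x2000, 0x10, True, **limits))
print(check_data_address(0x10000, 0x4, False, **limits))
```

In this model a privileged access to the CCB range is served by the core itself, while the same address in user mode falls outside the assumed user limits and is reported as a violation.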
All coprocessor accesses are performed in stage EXE2. If a coprocessor access takes multiple cycles, the pipeline will be stalled during the wait cycles. This implies that if a slow coprocessor is used, performance will deteriorate unless a special interface block is used.

The stage marked EXE3 in figure 2 completes the execution of 32-bit multiplication instructions. The ld and st instructions also complete their work during this stage by accessing data memory. If multi-cycle access is used, the rest of the pipeline is stalled during the wait cycles, since the instructions coming behind cannot bypass the EXE3 stage. This points out the importance of a fast data memory or a data cache with prefetch capability.

The last stage, WB (write back), completes the execution of all instructions which produce data. Data is written to the selected destination register during this stage. The register file has internal forwarding, which makes data in this stage visible to the DECODE stage.

6. Implementation of COFFEE RISC Core

The COFFEE core is an RTL (Register Transfer Level) VHDL description, that is, a soft core. It can be ported to any technology with basic library components. The VHDL description is written in a way that minimizes variation between different technology libraries. Arithmetic operations are coded at the boolean level, which produces predictable results since a synthesis tool does not try to map operations to fixed hard implementations. The pipeline is balanced based on relative measures of the depth of the logic in each stage. This should ensure equal results between different synthesis tools. Mapping directly to technology without optimization should also produce acceptable results.

The COFFEE RISC core was designed to be a general purpose processing element suitable for most applications in either a SoC (System on Chip) environment or in more conventional embedded systems. One might think that a general purpose machine is too much of a compromise, that is, no good for anything.
While this might be true in some cases, COFFEE RISC makes an exception. COFFEE RISC was designed to be a platform which can be tailored to suit the application. In practice this means that COFFEE is not one fixed design; rather, COFFEE is many designs. The basic version of COFFEE provides adequate resources and processing power for many applications, but it can be enhanced in various ways. The designer of a system can choose the combination of modules to get the best trade-offs. Usually this means getting just enough performance while minimizing power consumption and silicon area. If none of the ready made modules results in a satisfactory design, custom modifications can be made. Customizing the COFFEE core is straightforward since it was designed to be easily modifiable. In addition to tailoring the core, external modules can be connected to construct a suitable platform for an application. The COFFEE core provides simple interfaces for expansion and communication.

The COFFEE core was designed according to the guidelines for producing reusable IP (Intellectual Property) components [7]. A good IP is more than a good design; there are several things to consider. The importance of documentation cannot be stressed enough. An IP block without proper documentation is totally useless no matter how flexible or configurable it might be. Reusability and configurability were the main postulates for the COFFEE core design. Any design which does not constrain the implementation technology, has comprehensive documentation and is moderately easy to modify is reusable. If we add scalability and extendibility to the list, we have an IP block. In fact, our core is more than IP. Since it is published as an open source component, it can be referred to as Intellectual Commons (IC), which enables innovation to be incrementally built on top of what we provide. It is the Linux of computation hardware.

Modularity gives the user the freedom to select the optimum modules or blocks for the design from a set of ready made blocks. In addition, a modular structure with well documented component interfaces allows custom blocks to be used. Module-wise synthesis allows each module to be optimized either for speed or area, resulting in an overall optimal design.

Because of its relatively simple interface, COFFEE is easy to instantiate anywhere. It was designed to be able to work as a stand-alone unit without any additional circuitry. It can, however, easily be equipped with cache memories and an unlimited number of peripheral devices. Peripherals can be connected via a direct register interface or an AMBA bus. The memory interfaces make no assumptions about the type of memories. The user can map the address space freely because there are no fixed addresses for peripherals or configuration registers. Even the boot address can be defined externally. A series of VCI [6] interface wrappers are provided which allow easy connectivity to other VCI components.

The basic COFFEE core is the starting point for developing suitable platforms for applications [4]. It provides the common resources needed by every embedded system: a built-in interrupt controller which supports up to twelve sources, a simple memory protection mechanism and two timers. The system designer selects memories and I/O peripherals as needed by the application. Up to four coprocessors can be connected to boost, for example, floating point operations [5] or DSP processing.

Preliminary synthesis results imply a clock frequency of over 200 MHz (0.18 µm CMOS) for COFFEE RISC version 1.0. Software development tools for COFFEE are currently being developed at Tampere University of Technology.
The GNU compiler collection [8] has been ported to COFFEE RISC and is currently being tested. An in-house assembler, linker and instruction set simulator have also been developed.

7. Conclusion

It is quite straightforward to design and implement a general purpose processing core by following RISC design guidelines. Here, we have presented the open source COFFEE RISC core, which can be used in SoC design or in conventional embedded systems. The core forms a good starting point for developing application-specific platforms. If performance is not at a premium, the circuit implementation work could even be automated to a certain extent. Much of the work can be left to a synthesis tool if the implementation is done using hardware description languages. The RTL VHDL description of the COFFEE core makes this possible. Development and research work on more automated processor generators is currently going on. The problem is that if we want to achieve a short time to market, we also have to be able to generate software tools for a new architecture quickly.

8. References

[1] David A. Patterson and John L. Hennessy, Computer Organization & Design, Morgan Kaufmann Publishers Inc, San Francisco.
[2] Vincent P. Heuring and Harry F. Jordan, Computer Systems Design and Architecture, Addison-Wesley, California.
[3] Steve Furber, ARM system-on-chip-architecture, second edition, Addison-Wesley.
[4] Tapani Ahonen et al, A Brunch from the COFFEE Table - Case Study in NOC Platform Design, in Jari Nurmi, Hannu Tenhunen, Jouni Isoaho, and Axel Jantsch (eds.): Interconnect-Centric Design for Advanced SoC and NoC, Kluwer Academic Publishers.
[5] Claudio Brunelli, Design of a Floating-Point Unit for a RISC Microprocessor, MSc Thesis, Tampere University of Technology.
[6] VSI Alliance, Virtual Component Interface Standard Version 2 (OCB 2 2.0), On-Chip Bus Development Working Group, April
[7] Michael Keating, Reuse Methodology Manual: for system-on-a-chip designs, Kluwer Academic.
[8]

COFFEE Core USER MANUAL

COFFEE Core USER MANUAL COFFEE Core USER MANUAL July 2007 Contents 1. Interface specification of the COFFEE RISC Core 1.1. Shared Data Bus 1.2. Interfacing coprocessors 2. Registers 2.1. General 2.2. Set 1: General Purpose Registers

More information

ARM ARCHITECTURE. Contents at a glance:

ARM ARCHITECTURE. Contents at a glance: UNIT-III ARM ARCHITECTURE Contents at a glance: RISC Design Philosophy ARM Design Philosophy Registers Current Program Status Register(CPSR) Instruction Pipeline Interrupts and Vector Table Architecture

More information

Processor Design. Introduction, part I

Processor Design. Introduction, part I Processor Design Introduction, part I Professor Jari Nurmi Institute of Digital and Computer Systems Tampere University of Technology, Finland email jari.nurmi@tut.fi Background Some trends in digital

More information

ASSEMBLY LANGUAGE MACHINE ORGANIZATION

ASSEMBLY LANGUAGE MACHINE ORGANIZATION ASSEMBLY LANGUAGE MACHINE ORGANIZATION CHAPTER 3 1 Sub-topics The topic will cover: Microprocessor architecture CPU processing methods Pipelining Superscalar RISC Multiprocessing Instruction Cycle Instruction

More information

Embedded Soc using High Performance Arm Core Processor D.sridhar raja Assistant professor, Dept. of E&I, Bharath university, Chennai

Embedded Soc using High Performance Arm Core Processor D.sridhar raja Assistant professor, Dept. of E&I, Bharath university, Chennai Embedded Soc using High Performance Arm Core Processor D.sridhar raja Assistant professor, Dept. of E&I, Bharath university, Chennai Abstract: ARM is one of the most licensed and thus widespread processor

More information

COPROCESSOR APPROACH TO ACCELERATING MULTIMEDIA APPLICATION [CLAUDIO BRUNELLI, JARI NURMI ] Processor Design

COPROCESSOR APPROACH TO ACCELERATING MULTIMEDIA APPLICATION [CLAUDIO BRUNELLI, JARI NURMI ] Processor Design COPROCESSOR APPROACH TO ACCELERATING MULTIMEDIA APPLICATION [CLAUDIO BRUNELLI, JARI NURMI ] Processor Design Lecture Objectives Background Need for Accelerator Accelerators and different type of parallelizm

More information
