EECS 151/251 FPGA Project Report

GSI: Vighnesh Iyer
Team Number: 6
Partners: Ashwinlal Sreelal and Zhixin Alice Ye
Due Date: Dec 9, 2016

Part I: Project Description

The aim of this project was to develop a 3-stage pipelined Central Processing Unit (CPU) to run on a Xilinx Virtex-6 FPGA. The CPU implements a reduced instruction set architecture that includes most of the instructions in the RV32I Base Instruction Set. The CPU had a number of requirements, including minimizing the Cycles Per Instruction (CPI) and maximizing the speed of the CPU clock (to beyond 50 MHz), while staying within the size limitations of the FPGA. To achieve this, data forwarding and hazard-handling logic were implemented to resolve control, structural, and data hazards. Optimizations were also made to reduce the core CPU critical path, and a global branch predictor was added to recover CPI. In addition to the core processor, several peripheral units were added to allow the CPU to communicate with its surrounding environment. A memory controller was implemented to communicate between Data Memory, Instruction Memory, the UART, a cycle counter, the switches and LEDs on the FPGA, and an AC97 interface. Because programs could be loaded into instruction or data memory and run on the FPGA, this allowed a number of relatively complex and interesting programs to be run on the CPU, such as a piano program using the computer keyboard as input.

Part II: High Level Organization

The above diagram provides a high-level overview of our design. ml505top is the top-level block, and the outputs of this block interface directly with ports on the FPGA. The tone generator block, the RiscV151 CPU, and the AC97 periphery (including the FIFOs for the microphone and speaker outputs) are located here. Within the CPU, the main 3-stage datapath is instantiated, which communicates with the memory controller to read from and write to the various inputs and outputs. Other features within the CPU are the FIFO for the GPIO LEDs and switches, the UART for communicating with external computers, a cycle counter, and a branch predictor. Three types of memory are also included: the Instruction memory, the BIOS memory, and the Data memory.

Our datapath consists of three combinational stages, each separated by a set of registers. Each stage takes in registered inputs, including a set of control signals, and uses them to compute the values that are registered into the following stage (a sketch of such a stage register is shown below). A diagram on the next page illustrates the entire datapath.

The first stage, the Instruction Fetch stage, decodes the instruction and converts it into a set of control signals for the later stages. It determines the instruction's inputs to the Execute stage, including which values are fed into the ALU. It also determines the address of the next instruction to fetch, stored in the next_pc value. The decoded signals also drive the read addresses of the RegFile so that the register values are available in the Execute stage. The stage's outputs (next_pc, regout1, regout2, and the control signals) are registered.

The second stage, the Execute stage, takes the inputs from the first stage and computes the ALU output to be written to either memory or a register. Data forwarding logic is needed to resolve hazards, which is discussed further in Part III. To reduce the fanout of the design, parallel arithmetic units compute the ALU output and the branch comparison output. The data, address, and write enable bits are then passed to the next stage.

The third stage, the Writeback stage, decodes the memory output and chooses what value to write back to the register file. It also kills instructions following branch mispredictions or JALRs.
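To make the stage boundaries concrete, below is a minimal Verilog sketch of a pipeline register between the Instruction Fetch and Execute stages. The port names, the control-bundle width, and the omission of reset and kill handling are simplifying assumptions, not our exact implementation.

    // Minimal sketch of an IF/EX stage-boundary register. The real design
    // carries more signals (kill bits, forwarded values, etc.); names and
    // widths here are illustrative assumptions.
    module if_ex_reg (
        input             clk,
        input      [31:0] if_next_pc,          // next_pc computed in the IF stage
        input      [31:0] regout1, regout2,    // register-file read data
        input      [15:0] if_ctrl,             // decoded control-signal bundle (width is an assumption)
        output reg [31:0] ex_pc,
        output reg [31:0] ex_regout1, ex_regout2,
        output reg [15:0] ex_ctrl
    );
        always @(posedge clk) begin
            ex_pc      <= if_next_pc;
            ex_regout1 <= regout1;
            ex_regout2 <= regout2;
            ex_ctrl    <= if_ctrl;
        end
    endmodule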

Datapath (diagram)

Part III: Component Description

A. Control Signal Decoding

The IFController block provides the decode logic in the first stage of the CPU. It takes in a 32-bit instruction from instruction memory or BIOS memory and outputs the relevant signals to the rest of the CPU. These include the appropriately bit-shifted and sign-extended immediate, as well as signals that tell other components of the CPU what type of instruction it is (JAL, branch, register-file related) and what the instruction opcode is.

B. Branch/Jump Implementation

The pc update logic for our CPU is the following:

    if (3rd stage instruction killed):
        if ((1st stage is JAL OR we predict a taken branch) AND 1st stage not killed):
            next_pc = curr_pc + imm
        else:
            next_pc = curr_pc + 4
    else:
        if (3rd stage is JALR):
            next_pc = 3rd stage JALR pc
        else if (3rd stage is a mispredicted branch):
            next_pc = 3rd stage branch-not-taken pc
        else if ((1st stage is JAL OR we predict a taken branch) AND 1st stage not killed):
            next_pc = curr_pc + imm
        else:
            next_pc = curr_pc + 4
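A Verilog sketch of this selection logic is shown below; the signal names and the module boundary are illustrative assumptions rather than the exact ones used in our datapath.

    // Illustrative sketch of the next-PC selection described above.
    module next_pc_sel (
        input      [31:0] curr_pc,                 // PC of the instruction in stage 1
        input      [31:0] if_imm,                  // immediate decoded in stage 1
        input             if_is_jal,               // stage-1 instruction is a JAL
        input             if_predict_taken,        // predictor says take the stage-1 branch
        input             if_killed,               // stage-1 instruction has been killed
        input             wb_killed,               // stage-3 instruction has been killed
        input             wb_is_jalr,              // stage-3 instruction is a JALR
        input             wb_mispredict,           // stage-3 branch was mispredicted
        input      [31:0] wb_jalr_pc,              // JALR target computed by stage 3
        input      [31:0] wb_branch_not_taken_pc,  // correction PC for a mispredicted branch
        output reg [31:0] next_pc
    );
        always @(*) begin
            if (wb_killed) begin
                // A killed stage-3 instruction may not redirect the PC.
                if ((if_is_jal || if_predict_taken) && !if_killed)
                    next_pc = curr_pc + if_imm;
                else
                    next_pc = curr_pc + 32'd4;
            end else begin
                if (wb_is_jalr)
                    next_pc = wb_jalr_pc;
                else if (wb_mispredict)
                    next_pc = wb_branch_not_taken_pc;
                else if ((if_is_jal || if_predict_taken) && !if_killed)
                    next_pc = curr_pc + if_imm;
                else
                    next_pc = curr_pc + 32'd4;
            end
        end
    endmodule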

This updates the pc (program counter) with either a correction for a valid JALR (jump and link register) instruction or mispredicted branch, a valid JAL (jump and link) instruction, or the default address. This implementation does not kill instructions on a JAL, but kills two instructions for a JALR or mispredicted branch. The kills are implemented using a kill signal that is sent by the final stage and propagated through the pipeline. When an instruction is killed, its ability to affect the next address is removed (see the pc update logic above), and its write ports are disabled.

We chose to implement the branch logic in this manner for several reasons:

1. Since our implementation needs to calculate the immediate during instruction fetch and add the two values by the second stage in order to do branch prediction, most of the hardware for calculating and taking a JAL in the first stage already needed to exist. We believed the cost of one additional mux would be worth the gain in CPI (cycles per instruction) from not needing to kill instructions on JALs.

2. While branches and JALRs were originally taken in the second stage, we found that this design created a large critical path (approximately 50% longer than the next largest critical path) when combined with data forwarding into the ALU. Moving this out of the second stage would therefore improve performance.

3. Since JALRs are relatively rare, because it is uncommon to take large jumps, the only penalty for moving these operations into the third stage would be an additional cycle on mispredicted branches. Therefore, if we could properly predict branches, there would be minimal loss in CPI.

C. Branch Prediction (Extra Credit)

We implemented two different versions of the branch predictor.

1. The first version we implemented was a global branch predictor. It used a 2-bit saturating counter that would increment whenever a branch was taken and decrement whenever a branch was not taken. The state would not change on instructions that were not branch instructions. If the counter value was 2'b10 or 2'b11, it would tell the IF stage to predict that the branch was taken (see the sketch below). This global branch predictor gave a significant improvement, reducing CPI from about 1.2 to 1.1 with little impact on maximum clock frequency.

2. The branch history table (BHT) worked very similarly to the global branch predictor. Instead of a single register, there were 32 different registers that formed a history table, and branches would update and receive predictions based on bits [6:2] of their PC. This worked only marginally better than the global branch predictor, as it saved only several hundred cycles out of tens of millions of instructions, or about a 0.001% reduction in runtime. It did not impact the clock frequency, but used about 10% more LUTs and slice registers.
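Below is a minimal Verilog sketch of the 2-bit saturating counter used for global branch prediction; the interface, the reset value, and the module boundary are illustrative assumptions rather than our exact code.

    // Sketch of a 2-bit saturating counter global branch predictor.
    // 2'b00/2'b01 predict not taken; 2'b10/2'b11 predict taken.
    module global_branch_predictor (
        input  clk,
        input  rst,
        input  branch_resolved,   // a branch resolved this cycle (stage 3)
        input  branch_taken,      // that branch was actually taken
        output predict_taken      // prediction for the branch in the IF stage
    );
        reg [1:0] counter;

        always @(posedge clk) begin
            if (rst)
                counter <= 2'b01;                     // reset value is an assumption (weakly not taken)
            else if (branch_resolved) begin
                if (branch_taken && counter != 2'b11)
                    counter <= counter + 2'b01;       // saturate at 2'b11
                else if (!branch_taken && counter != 2'b00)
                    counter <= counter - 2'b01;       // saturate at 2'b00
            end
            // Non-branch instructions leave the counter unchanged.
        end

        assign predict_taken = counter[1];            // taken when counter is 2'b10 or 2'b11
    endmodule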

D. Register File

The register file was implemented using a 2D array to store values. It has one synchronous write port and two synchronous read ports. r0 cannot be written to and always outputs a value of 32'b0 (see the sketch below, after Section F).

E. Arithmetic Logic Unit

The Arithmetic Logic Unit (ALU) is the main component of the Execute stage. It was written in Verilog to be easy to read and debug, using a case statement to select the internal operation from a list of ALU operations. It performs operations on alu_in1 and alu_in2 such as addition, greater-than-or-equal comparison, and bitwise shifting. Inputs are sign-extended when the ALU operation is a signed one. Later, for optimization purposes, we worked to reduce fanout through the ALU logic stage and consequently removed a few unneeded operations from this block. This is discussed in more detail in Part IV.

F. Data Forwarding

In order to resolve data hazards, data forwarding was implemented to save on CPI. The data forwarding hazard logic is as follows:

1. If there was a register write, the register was not r0, and the next instruction used the same register, the value needed to be forwarded in front of the ALU in the Execute stage.

2. If there was a register write and the second instruction after it used the same register, the data would instead be forwarded at the end of the Instruction Fetch stage.

3. Lastly, if there was a load from memory followed by a store to memory, the data needed to be forwarded in the Execute stage, prior to the Data Memory block, with a mux.

We did not need to check whether either register equals r0 because the register file is designed such that an r0 read/write will always output the correct value.
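As an illustration of the register file described in Section D, here is a minimal Verilog sketch with one synchronous write port, two synchronous read ports, and r0 hardwired to zero; the port names are assumptions.

    // Register-file sketch: 32 x 32-bit 2D array, synchronous write,
    // two synchronous read ports, r0 hardwired to zero.
    module regfile (
        input             clk,
        input             we,          // write enable
        input      [4:0]  wa,          // write address
        input      [31:0] wd,          // write data
        input      [4:0]  ra1, ra2,    // read addresses (driven from the IF stage)
        output reg [31:0] rd1, rd2     // registered read data, available in the EX stage
    );
        reg [31:0] regs [31:0];        // 2D array of 32 registers

        always @(posedge clk) begin
            if (we && wa != 5'd0)
                regs[wa] <= wd;                          // r0 is never written
            rd1 <= (ra1 == 5'd0) ? 32'b0 : regs[ra1];    // r0 always reads as zero
            rd2 <= (ra2 == 5'd0) ? 32'b0 : regs[ra2];
        end
    endmodule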

G. Memory Controller

The memory controller is a block that interfaces between the CPU and the various forms of memory and peripheral circuits, listed below.

mem_*
  Address: N/A
  Description: The interface with the CPU, indicating whether the CPU wants to write or read something from memory.

dmem_*
  Address: Addr[31:28] = 4'b00x1
  Description: Data memory, a data storage block for the CPU.

imem_*
  Address: Write: Addr[31:28] = 4'b001x; Read: Addr[31:28] = 4'b0001
  Description: Instruction memory, a read/write memory that holds the instructions read by the CPU core.

bios_*
  Address: Addr[31:28] = 4'b0100
  Description: BIOS memory, a pre-loaded memory block that ensures the CPU always boots to a known state.

uart_*
  Address: Control: 32'h8000_0000; Receive: 32'h8000_0004; Transmit: 32'h8000_0008
  Description: UART interface, for communication between the CPU on the FPGA and other computers.

counter_*, cycle_*
  Address: Cycle counter: 32'h8000_0010; Instruction counter: 32'h8000_0014; Reset counters to 0: 32'h8000_0018
  Description: Counter interface for determining the number of cycles/instructions executed.

GPIO_*
  Address: FIFO empty: 32'h8000_0020; FIFO read data: 32'h8000_0024; DIP switches: 32'h8000_0028; GPIO and compass LEDs: 32'h8000_0030
  Description: GPIO/DIP interface, for reading the DIP switches and writing to the LEDs on the FPGA board.

TG_*, ac97_fifo_*, ac97_mic_*
  Address: TG enable: 32'h8000_0034; TG switch period: 32'h8000_0038; AC97 FIFO full: 32'h8000_0040; AC97 FIFO sample: 32'h8000_0044; AC97 volume: 32'h8000_0048; MIC FIFO empty: 32'h8000_0050; MIC FIFO sample: 32'h8000_0054
  Description: The AC97 interface, which is responsible for coordinating the tone generator, headphones, and microphone.
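The high-address-bit decode implied by this map can be sketched in Verilog as follows. The signal names are illustrative, and the memory-mapped I/O region is assumed to be selected by the top nibble 4'b1000 (addresses of the form 32'h8000_00xx).

    // Hedged sketch of the memory-map decode from the table above.
    module mem_decode (
        input  [31:0] mem_addr,
        output        dmem_sel,
        output        imem_wr_sel,
        output        imem_rd_sel,
        output        bios_sel,
        output        mmio_sel
    );
        wire [3:0] addr_hi = mem_addr[31:28];

        assign dmem_sel    = (addr_hi[3:2] == 2'b00) && addr_hi[0]; // 4'b00x1: data memory
        assign imem_wr_sel = (addr_hi[3:1] == 3'b001);              // 4'b001x: instruction memory writes
        assign imem_rd_sel = (addr_hi == 4'b0001);                  // 4'b0001: instruction memory reads
        assign bios_sel    = (addr_hi == 4'b0100);                  // 4'b0100: BIOS memory
        assign mmio_sel    = (addr_hi == 4'b1000);                  // 32'h8000_xxxx: memory-mapped I/O (assumed)
    endmodule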

This block reads the input address and then maps the relevant data and address to the corresponding output port. Since the timing between signals can be important, some signals had to be delayed. This also prevents potential combinational loops by ensuring that all inputs into memory go through a register. Some small blocks (mem_data_from_reg, mem_data_to_reg) were implemented in Stage 2 and Stage 3, respectively, to ensure that data written to or read from data memory is appropriately shifted for half-word or byte accesses. The inputs to the memory controller are therefore already adjusted to be correct.

H. Cycle Counter

The cycle counter is a separate block that stores the number of clock cycles that pass as well as the number of valid instructions that were executed. The cycle counter increments every clock cycle, and the instruction counter increments whenever the inst_ena signal is high. These counts can be used to calculate the CPI of the CPU.

I. FIFO

Two types of FIFOs were implemented for this project: a synchronous FIFO and an asynchronous FIFO.

The synchronous FIFO is implemented with a read pointer and a write pointer that are incremented each time a read or write is performed, respectively. When the two addresses match after a read, the FIFO's empty signal goes high; when they match after a write, the FIFO's full signal goes high. This block was used to buffer inputs/outputs for the GPIO buttons.

The asynchronous FIFO is used to buffer signals that cross separate clock domains, for example the AC97 microphone and headphone interface. Gray-coded pointers and synchronizers were used for signals that cross clock domains, ensuring that only one bit changes at a time. An additional bit (which is also gray-coded) is used as a wrap-around bit to determine whether the FIFO is full or empty. To determine the value of the wrap-around bit, a gray-code to binary converter was used to compare the read and write pointers. This gray-code to binary converter works for up to 8 bits (an address depth of 2^7). For a write to the FIFO, the full flag is set by checking that a write fired and then comparing the non-gray-coded addresses: if the addresses match but the wrap-around bits differ, the full bit is set. Similarly, for a read from the FIFO, the empty bit is set if a read occurred and both the wrap-around bits and the addresses match.
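A small Verilog sketch of the gray-code to binary conversion mentioned above is shown below; the parameterization is an assumption (the text only states that the converter supports up to 8 bits).

    // Gray-code to binary converter: each binary bit is the XOR of all
    // gray-code bits at or above that position.
    module gray2bin #(parameter WIDTH = 8) (
        input  [WIDTH-1:0] gray,
        output [WIDTH-1:0] bin
    );
        genvar i;
        generate
            for (i = 0; i < WIDTH; i = i + 1) begin : g2b
                assign bin[i] = ^(gray >> i);   // reduction XOR of gray[WIDTH-1:i]
            end
        endgenerate
    endmodule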

J. AC97 Controller

The AC97_controller block implements an interface between the CPU and the AC97 codec. It takes in serial data from the AC97 codec, as well as data the CPU writes to an audio sample FIFO, and interfaces these with the speaker output and the microphone FIFO. Since both the speaker and the extra-credit microphone were implemented, the AC97 controller writes to several different codec registers through Slot 1 in a repeating loop, including the microphone volume, record select, and record gain registers, as well as the speaker registers from Lab 6.

When the CPU wants to send data to the codec, it sends the samples out by writing them to memory. The memory controller decodes the memory mapping and sends the data to the AC97 FIFO, so long as the FIFO is not full. The AC97 controller reads values from the FIFO so long as the FIFO is not empty and then sends them to the codec as required. In other words, the controller sets all the relevant registers and then sends out the speaker data via Slot 3 and Slot 4.

For the graduate student requirement of the project, the microphone input from the AC97 codec was also implemented. When the AC97 codec is sending data to the CPU (from the microphone), values are first shifted in over the serial data interface from the codec. The controller reads the bits in Slot 0 to ensure that the frame is valid and that the Slot 3 input is ready, then shifts in the 20-bit value from Slot 3 and writes it to the microphone FIFO.
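A heavily simplified Verilog sketch of the Slot 3 shift-in just described is given below. The slot-timing inputs, names, and handshaking are assumptions, and real AC97 frame and bit-clock timing is omitted.

    // Simplified sketch: shift a 20-bit Slot 3 microphone sample in serially
    // and push it into the microphone FIFO when the frame is valid.
    module mic_slot3_shift (
        input             bit_clk,       // AC97 bit clock
        input             sdata_in,      // serial data from the codec
        input             slot3_active,  // high during the 20 Slot 3 bit times (assumed timing input)
        input             slot3_done,    // pulses after the last Slot 3 bit (assumed timing input)
        input             frame_valid,   // Slot 0 tag: frame and Slot 3 are valid
        output reg        mic_fifo_wr,   // write strobe into the microphone FIFO
        output reg [19:0] mic_sample
    );
        reg [19:0] shift;

        always @(posedge bit_clk) begin
            mic_fifo_wr <= 1'b0;
            if (slot3_active)
                shift <= {shift[18:0], sdata_in};   // MSB-first shift-in
            if (slot3_done && frame_valid) begin
                mic_sample  <= shift;
                mic_fifo_wr <= 1'b1;                // push one sample into the FIFO
            end
        end
    endmodule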

Part IV: Status and Results

Currently, we have implemented all parts of the project, as well as the AC97 microphone and a branch predictor. Our design runs at up to 75 MHz, with all components fully functional.

The design area usage was 2341 slice LUTs and 1030 slice registers. This grew over time from our initial design, since we added several units to reduce the critical path by parallelizing certain operations. To optimize performance, a number of improvements over our original design were implemented, and experimental data was gathered on clock frequency and CPI. For example, we modified some logic on our timing-critical path to improve the maximum clock frequency, and we implemented two flavors of branch prediction. Each design iteration (base design; ALU with changed forwarding order; branch logic moved to Stage 3; reduced ALU fanout with additional arithmetic units; global branch predictor; branch history table) was compared in terms of CPI for mmult, minimum clock period (ns), maximum clock frequency (MHz), and time to complete (s). The trade-offs of each design change are described below.

1. Base Design

Our base design was unable to run at a clock frequency beyond 50 MHz. This is because of a long critical path where data is forwarded from the memory output, muxed in with the data from the instruction fetch stage, and then sent into the EX stage through the ALU. While the base design optimized CPI by maximizing data forwarding and trying to reduce the number of instructions that had to be killed, overall performance suffered from the long critical path.

2. ALU Forwarding Change

We then attempted to reduce the critical path without increasing CPI. We noticed that the late arrival of the forwarded signal was unnecessarily preventing the clock speed from increasing, forcing other signals to wait. To remedy this, we changed the order of the data forwarding logic so that it would use the forwarded value as late as possible, such that the setup time could be relaxed slightly.

While we were still unable to close timing with a 60 MHz CPU clock, this change cut more than 2 ns from the critical path.

3. Moving Branches to Stage 3

Another critical path that now became important was the path from the ALU through the branch logic. To further increase the clock frequency, we were forced to move the branch logic into the third stage. This implementation is discussed in Section III.B. The CPI degradation from this change derives from the fact that after a mispredicted branch or a JALR, the following two instructions were killed instead of just one, so every mispredicted branch added an additional cycle. The benefit is that the data forwarding path no longer needed to go through the branch logic and into instruction memory, since a register now breaks up this path. With these changes, the clock frequency could be increased to 60 MHz, which more than offset the increase in CPI. We found that the critical path still went through the ALU, notably into the addr and write_enable bits of the memory.

4. Reducing ALU Fanout / Adding Additional Arithmetic Units

One reason the ALU was part of the critical path is that the ALU inputs needed to drive a large number of computational units, each of which needed to drive multiple outputs (at least in the old version of the design). This multiplicative fanout may have increased the timing delay of this path. Another reason is that memory can forward into the ALU, whose outputs drive the address and the write enable bits. By adding a specialized arithmetic unit for branch comparison, we reduced the fanout by decreasing the number of outputs driven by the ALU, and made the ALU output feed only into an aluout register. Branch comparison logic is now driven by a separate unit, called branch_compare. Another change made in this step was adding units to calculate the memory address and the write_enable bits. Since the memory address is always IF_imm + aluin1 (forwarded), we removed this calculation from the ALU and further reduced the fanout. An even more substantial improvement came from removing aluout as the path for the write enable bits. Since only the two least-significant bits carry information, we do not need the full ALU output for this calculation: a simple 2-bit add of addr[1:0] + offset[1:0] suffices. The combination of all these changes boosted our achievable clock speed to 75 MHz.

5. Global Branch Prediction

After increasing the clock speed substantially, we sought to recover some of the lost CPI by adding branch prediction. Our initial approach was a 2-bit saturating counter for global branch prediction, hoping to exploit temporal locality in the sense that if we had recently taken a lot of branches, we are likely to take the next branch, and vice versa.

With that, we were able to reduce CPI from 1.21 to 1.1, which represents a reduction of more than 7 million cycles.

6. Branch History Table

We then implemented a branch history table, using pc[6:2] to index into 32 different entries. When the mmult test program was run, we found only an 8-cycle reduction in execution time out of more than 60 million instructions in comparison with the global branch predictor, so we found this change insignificant. However, the trade-off of the branch history table was an approximately 10% increase in area, making the global branch predictor the overall better option.

Part V: Concluding Remarks

The design of a CPU on an FPGA proved to be challenging and exciting: it showed the need to balance the trade-off between the clock speed and the cycles per instruction of the CPU. In addition, we found that interfaces and codecs are extremely useful for interconnecting computing components, and we were able to implement a variety of features for our CPU, including a branch predictor and communication via the UART as well as the AC97 codec. To successfully complete our project, it was important to work as a team to think through challenges and debug, as well as to keep good project timelines and organization. One aspect that could have been improved is keeping our documentation updated throughout the project, instead of going back and re-reading code to understand how each component functioned. In addition, it may have been possible to further increase the speed of the processor by sacrificing some CPI, either by removing some of the hazard control logic or by moving to a 4- or 5-stage pipeline. In the end, we achieved a fully functional 3-stage CPU at an operating speed of greater than 75 MHz on the Virtex-6 FPGA, with a CPI of around 1.1 on the mmult test program.
