EXERCISE 3: DLX II - Control

TAMPERE UNIVERSITY OF TECHNOLOGY
Institute of Digital and Computer Systems
TKT-3200 Computer Architectures I

Group:
Name:
Student nr.:

DLX-CONTROL

The purpose of this exercise

The purpose of this exercise is to help you understand how the control logic of a pipelined processor is realized. It is highly recommended to finish the exercise DLX-ARCHITECTURE before starting this one.

Procedure calls in a pipelined processor

A pipelined processor such as DLX supports procedure calls differently from an unpipelined processor such as the M68000 (Motorola 68000). The M68000 always stores the return address of a call on the system stack, and one of the processor's registers is reserved to point to this stack. Since the entire stack resides in memory, a procedure call causes a memory write and a return from a procedure causes a memory read.

If a procedure does not call another procedure, fetching the return address from memory can be avoided by storing the return address in a register. DLX and many other pipelined processors use this approach. A dedicated register holds the return address; DLX uses register R31 for this purpose. A procedure is called with the instruction JAL offset. An example:

    JAL SUBR

Execution jumps to address PC+4+offset, and the address of the jump instruction (PC) is stored in register R31. A procedure call is treated as a jump instruction, so if delayed branching is used, the instruction following the procedure call is also executed. Thus the return address cannot point to the instruction immediately following the procedure call; it must point to the instruction after that. An example:

    0020  JAL  ...
          ADD  R1,R2,R3
          SUB  R1,R2,R...
          ...
          ADDI R31,R31,#8
          JR   R31

Notice that each instruction is four bytes long, which is an important property of a pipelined processor. In the program above, the instruction ADD R1,R2,R3 is executed before program execution continues at address 34. Instruction JAL stores address 20 in register R31.

After the procedure has been executed, program execution should continue at address 28. This means that the programmer must correct the value in register R31. In the example program the correction is made by the operation ADDI R31,R31,#8. The return jump is performed by the operation JR, which jumps to the address held in register R31.
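The return-address arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch: the function name and its parameters are hypothetical, and it assumes, as the text states, that JAL saves its own address in R31 and that one delay-slot instruction follows the jump.

```python
INSTR_BYTES = 4  # every DLX instruction is four bytes long


def corrected_return_address(jal_address, delay_slots=1):
    """Address at which execution should resume after the procedure.

    Assumes JAL stores its own address in R31 (as described in the text),
    so the procedure must skip the JAL itself plus the delay-slot
    instruction(s) before returning.
    """
    saved_r31 = jal_address                        # value JAL writes to R31
    correction = INSTR_BYTES * (1 + delay_slots)   # the #8 in ADDI R31,R31,#8
    return saved_r31 + correction


print(corrected_return_address(20))  # prints 28
```

With the JAL at address 20 and one delay slot, the correction is 8 and the procedure returns to address 28, matching the example.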

To support procedure calls and returns, the following operations are needed:

- The value of the program counter must be stored in register R31.
- The program counter must be able to load a value from a register.

The datapath model of DLX supports neither of these operations because the necessary buses are missing. The SUPER DLX model includes these buses. The following figure illustrates the structure of the SUPER DLX model.

How are the operations mentioned above performed in the SUPER DLX model for the instructions JAL and JR Rx?

Now we know how first-level procedure calls are realized. What happens if a procedure calls another procedure? To support procedure calls on several levels, we need a stack in which the return addresses are stored. As our processor does not have a system stack, we must build and maintain such a stack ourselves using the available instructions. We can choose one of the registers as the stack pointer, for example register R30. The example program below calls procedure KALLE, which calls procedure NISSE. The procedures only handle the return addresses and the stack pointer R30.

            JAL  KALLE
            ...
    KALLE:  ADDI R31,R31,#8
            SUBI R30,R30,#4
            SW   0(R30),R31
            JAL  NISSE
            NOP
            LW   R31,0(R30)
            ADDI R30,R30,#4
            JR   R31
    NISSE:  ADDI R31,R31,#8
            JR   R31

Explain what the program does.

Modify procedure NISSE so that it calls procedure OLLI:

Investigate the following program:

            ADD  R1,R0,R0
            JAL  SUBR
            ADDI R1,R1,#2
            ADD  R1,R1,R1
            ...
    SUBR:   ADDI R1,R1,#1
            ADDI R31,R31,#8
            JR   R31
            NOP

What does register R1 contain just before instruction ADDI R1,R1,#1 is executed? (Assume R0=0.)

What does register R1 contain just before instruction JR R31 is executed?

What does register R1 contain when instruction ADD R1,R1,R1 has just been executed?
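The push and pop sequences that KALLE wraps around its call to NISSE can be modelled in Python as follows. This is a minimal sketch, not part of the simulator: the class name, the dictionary used as memory and the initial stack-pointer value are assumptions made only for illustration.

```python
class ReturnStack:
    """Software stack for return addresses, modelled on the KALLE listing."""

    def __init__(self, top):
        self.r30 = top       # stack pointer (register R30 in the listing)
        self.memory = {}     # hypothetical data memory, keyed by byte address

    def push(self, r31):
        self.r30 -= 4                  # SUBI R30,R30,#4
        self.memory[self.r30] = r31    # SW   0(R30),R31

    def pop(self):
        r31 = self.memory[self.r30]    # LW   R31,0(R30)
        self.r30 += 4                  # ADDI R30,R30,#4
        return r31


stack = ReturnStack(top=100)   # arbitrary starting value for R30
stack.push(28)                 # outer return address
stack.push(60)                 # inner return address
print(stack.pop(), stack.pop(), stack.r30)  # prints 60 28 100
```

Return addresses come back in the reverse order of the calls, and the stack pointer ends where it started, which is what nested calls require.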

Start the simulator and choose the datapath model SUPER DLX. The program is in file exempel11.s. Load it and reset the counter. Click on the register file; a window showing the contents of the register file should pop up.

Give clock pulses until the instruction JAL is being fetched from memory and is at the IF-stage. What is the address of the JAL instruction (at which address is it located in memory)?

Give clock pulses until instruction JAL reaches the MEM-stage. What does register R31 contain?

Give a clock pulse. What does register R31 contain now? Why?

The first instruction of the procedure SUBR has already been fetched. At which stage is it?

Give clock pulses until instruction JR R31 is at the ID-stage. From where in the datapath is the value of register R31 obtained? Why?

What is the return address?

Give a clock pulse. What is the value of the program counter now?

Execute the program to the end with the command RUN. What is the value of register R1? Is it the same as the value you calculated previously?

Now we are going to investigate how procedure calls on several levels are realized. Read the following example program through carefully:

            ADDI R1,R0,#1
            ADDI R20,R0,#78
            ADDI R2,R0,#2
            ADD  R3,R1,R1
            JAL  SUBR
            NOP
            ADD  R4,R3,R0
            ...
    SUBR:   SUBI R2,R2,#1
            ADDI R3,R3,#1
            BEQZ R2,BACK
            ADDI R31,R31,#8
            SUBI R20,R20,#4
            SW   0(R20),R31
            JAL  SUBR
            NOP
            LW   R31,0(R20)
            ADDI R20,R20,#4
    BACK:   JR   R31

Which register acts as the stack pointer?

How many times is procedure SUBR called?

What does register R4 contain right after instruction ADD R4,R3,R0 has been executed?

The program is in file exempel12.s. Load it and reset the counter.

Give clock pulses until instruction JAL is being fetched from memory and is at the IF-stage. Determine the return address on the basis of the value of the program counter.

Give clock pulses until instruction ADDI R31,R31,#8 reaches the EX-stage. What is the value on the output of the ALU? Compare it to the return address you determined.

What is the value of the stack pointer now?

Give clock pulses until the procedure's instruction JAL is being fetched from memory and is at the IF-stage. What should the return address be?

Give clock pulses until the procedure's instruction JAL reaches the EX-stage. What is the value of the stack pointer, and why?

Give clock pulses until instruction JR R31 reaches the ID-stage. What is the value of register R31 now? To which address is the program going to return, and why? From where is the return address obtained?

Give clock pulses until JR R31 reaches the ID-stage again. To which address are we returning now, and why? From where is the return address obtained now?

Execute the program to the end with the command RUN. What is the value of register R4 now?

Confirm that it is the same as the value you calculated.

The reason why a stack mechanism is not supported in hardware is that stack operations may cause hazards in the pipeline, which would decrease performance.

Controlling a pipelined processor

We have now studied which operations the datapath of a pipelined processor performs. Next we investigate the principles of controlling the datapath. Controlling the datapath is fairly simple. However, you should be aware of a few basic concepts on which the control is based: the instruction format and the control signals.

Instruction format

The instruction format defines how the instructions of a processor are encoded. The encoding contains all the information needed to execute an instruction. Open the link Computer Architecture Tutorial on the web pages of this exercise and read the parts that describe the instruction set of the DLX processor and its instruction formats.

For a pipelined processor it is very important that all instructions have the same length. If they do not, there will be bubbles in the pipeline for the following reason. Assume that instruction i is one word long and instruction i+1 is four words long. After instruction i has been fetched, four clock cycles are needed to fetch instruction i+1. While instruction i+1 is being fetched, some stages of the pipeline do nothing useful. Thus the parallel execution of the pipeline stages is not fully utilized in this case.

Instruction decoding and control signals

The control could be built by adding to each stage a 32-bit register in which the instruction is stored. An instruction decoder at each pipeline stage could then form the control signals for the functional units of that stage. Take the EX-stage as an example: both the ALU and the multiplexers MX4 and MX5 need control signals.

We could use a PLA (Programmable Logic Array) or a ROM to form the control signals for the ALU and the multiplexers on the basis of the input signal. However, this is not a good idea. The DLX processor has approximately 100 instructions, and the delay through a PLA depends heavily on the number of minterms in the switching function. Let us call this delay T_PLA. If the delay of the ALU is called T_ALU, it takes at least T_PLA + T_ALU before the ALU produces stable outputs. The clock cycle time of the pipeline is determined by its slowest stage, and the ALU often turns out to be the slowest, so instruction decoding cannot be done at this stage.

Instruction decoding must be done at the ID-stage for all the stages of the pipeline. Thanks to the clever design of the instruction formats, decoding and register reading can be done in parallel at the ID-stage. Why is this possible? (Investigate the instruction formats!)

Instruction decoding is usually arranged so that all the control signals are created at the ID-stage. The control signals are carried along the pipeline in control signal registers, which exist at every stage of the pipeline.

The SUPER DLX processor, which we are studying in this exercise, uses 36 control signals. We investigate only a few of these.
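The timing argument above can be illustrated with a small calculation. The sketch below is not taken from the handout: the stage delays are hypothetical numbers chosen only to show the effect, and the ID-stage delay is assumed to already cover register reading with decoding in parallel.

```python
def clock_period(stage_delays):
    """The slowest stage sets the clock period of the whole pipeline."""
    return max(stage_delays.values())


# Hypothetical delays in nanoseconds, chosen only for illustration.
T_PLA = 3.0
T_ALU = 8.0
OTHER_STAGES = {"IF": 6.0, "ID": 5.0, "MEM": 7.0, "WB": 4.0}

decode_in_ex = dict(OTHER_STAGES, EX=T_PLA + T_ALU)  # decoder in series with the ALU
decode_in_id = dict(OTHER_STAGES, EX=T_ALU)          # decoder moved to the ID-stage

print(clock_period(decode_in_ex))  # prints 11.0
print(clock_period(decode_in_id))  # prints 8.0
```

With these numbers the EX-stage is the slowest stage in both cases, so placing the decoder in front of the ALU lengthens every clock cycle by T_PLA.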

The following figure depicts the instruction decoder of the SUPER DLX processor, the control signal registers and some selected control signals.

The instruction decoder is located at the ID-stage and is realized using a PLA. It decodes the Opcode and Func fields of the instruction. The 36 control signals are formed from these fields and they follow the instruction through the pipeline. This is possible because there is a 36-bit wide control register at each clock line of the pipeline.

At the EX-stage three groups of control signals are used. They are called control groups because they can contain more than one control signal. The ALU control group contains four control signals, which are defined in Table 1. The control groups MX4 and MX5 control the multiplexers of the EX-stage. These control signals are defined in Table 2.

At the MEM-stage the memory is controlled with the signals WRITE and READ. In addition, the multiplexer MX6 is controlled with the signal MX6.

At the final WB-stage a control signal is needed for the instructions that write to the register file (the REG_WRITE signal). However, this is not enough. We must also make sure that the target register is carried along the pipeline. For this reason there is a 5-bit register at each clock line that holds the address of the target register. Unfortunately, the target register is not always explicitly specified in the instruction. A good example is the procedure call JAL, which always writes the return address to register R31. For this reason there is a multiplexer at the ID-stage that is controlled by the decoder. With this multiplexer the address of register R31 can be loaded into the 5-bit register that holds the address of the target register. The multiplexer also has another important function: it selects the target register address for instructions that write a register, since the target register field is located in different places in the I and R instruction formats.

The multiplexer is controlled by the control group RD (see Table 3).

Important: You might wonder why register R0 is needed as a target register. This will become clear when you investigate how hazards are detected in the pipeline. Instructions that do not perform a register write need to specify R0 as the target register. Keep this in mind when you define the control signals.

Table 1

    ALU operation   Control signals 1-4
    ADD             0000
    SUB             0001
    OR              0010
    AND             0011
    XOR             0100
    LHI             0101
    SLL             0110
    SRL             0111
    SRA             1000
    EQ              1001
    NE              1010
    LT              1011
    GT              1100
    LE              1101
    GE              1110
    PASS_A          1111

Table 2

    Control group   Control signal number   Value  Function        Value  Function
    MX4             ...                     0      Register file   1      PC
    MX5             6                       0      Register file   1      Immediate
    MX6             ...                     0      ALU             1      Memory
    WRITE           10                      0      Don't write     1      Write
    READ            11                      0      Don't read      1      Read
    REG_WRITE       17                      0      Don't write     1      Write

Table 3

    RD-operation   Control signals
    R0             00
    R31            01
    I-FORMAT       10
    R-FORMAT       11

At the end of this exercise you will implement decoding for a couple of instructions. For this purpose the control signals of the functional units are defined in Tables 1-3. Table 4 illustrates what the values of the control signals should be for the ADD operation. Complete Table 4 by assigning the correct control signals to the rest of the operations with the help of Tables 1-3.

Table 4

    Stage           ID    EX                MEM                  WB
    Control group   RD    ALU   MX5   MX4   WRITE   READ   MX6   REG_WRITE
    ADD
    ADDI
    SUB
    SUBI
    OR
    LW
    SW
    BNEZ
    BEQZ

Your task is to program the contents of the DECODE PLA according to the values you filled in Table 4. You will do this using the specification language labalaba, which defines the input/output functions for all the operations. The file decode.ipf contains a template of labalaba code in which the function for the ADD operation has already been written.

All the input vectors are defined after the labalaba directive input. An instruction is identified by comparing it to these vectors. Not all bits are necessarily needed to identify an instruction; the bits that are not needed are marked with a dash (-).

All the output signals are named and grouped after the directive output, which makes the functions easier to write.

All the functions are defined after the labalaba directive function. Every defined input vector needs a function that defines the corresponding output signals using the signals or signal groups defined in the output section.
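As a cross-check of how Tables 1-3 combine into one row of Table 4, the sketch below builds the row for ADD in Python. It is an illustration only and not labalaba code: the helper name control_row and the dictionary layout are hypothetical, and the multiplexer values assume that 0 selects the register file (MX4, MX5) or the ALU result (MX6), as Table 2 is read here.

```python
# Encodings copied from Table 1 (ALU) and Table 3 (RD).
ALU_OP = {
    "ADD": "0000", "SUB": "0001", "OR": "0010", "AND": "0011",
    "XOR": "0100", "LHI": "0101", "SLL": "0110", "SRL": "0111",
    "SRA": "1000", "EQ": "1001", "NE": "1010", "LT": "1011",
    "GT": "1100", "LE": "1101", "GE": "1110", "PASS_A": "1111",
}
RD_SEL = {"R0": "00", "R31": "01", "I-FORMAT": "10", "R-FORMAT": "11"}


def control_row(rd, alu, mx5, mx4, write, read, mx6, reg_write):
    """Build one row of Table 4 from symbolic names (hypothetical helper)."""
    return {
        "RD": RD_SEL[rd], "ALU": ALU_OP[alu],
        "MX5": mx5, "MX4": mx4,
        "WRITE": write, "READ": read, "MX6": mx6,
        "REG_WRITE": reg_write,
    }


# ADD Rd,Rs1,Rs2: R-format target, both operands from the register file,
# no memory access, ALU result written back to the register file.
ADD_ROW = control_row("R-FORMAT", "ADD", mx5=0, mx4=0,
                      write=0, read=0, mx6=0, reg_write=1)
print(ADD_ROW)
```

The remaining rows follow the same pattern; only the choices that differ from ADD (operand source, memory access, target register) need to be reconsidered for each operation.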

A template of the labalaba specification is given in file decode.ipf. Complete the functions for all the other operations by editing this file. After finishing the specification, compile it with the following command:

    ./labalaba decode.ipf

Fix any syntax errors. When labalaba no longer reports errors, it creates a file called decode.opf.

Choose View Control from the menu. This shows the datapath model of the control logic of SUPER DLX. Identify the DECODE PLA, all the control signal registers and the control signals that you will be using. The instruction register is divided into parts according to the R instruction format. All the visible control signals are shown in binary format.

Clicking on the DECODE PLA opens a load window. Load your compiled labalaba specification (decode.opf).

Each instruction will now be tested separately, starting with the instruction ADDI R2,R0,#4. The instruction is in file addi.s. Load it and reset the counter.

Give clock pulses until the instruction is at the ID-stage. One register is read at the ID-stage; which one is it?

Give a clock pulse. Where is the address of the target register?

What are the states of MX4 and MX5?

What should the values of the MX4 and MX5 control groups be according to Table 2?

Is the value on the output of the ALU correct? If it is not, what is the value of the ALU control group?

Give a clock pulse to take the instruction to the MEM-stage. What memory operation is performed?

Is MX6 in the right state? If not, what is the value of the MX6 control group?

Give a clock pulse. Is the control signal REG_WRITE active?

Is the address of the target register correct?

Will register R2 obtain the correct value?

If the execution did not work correctly, search for errors in your labalaba specification and recompile.

The rest of the operations are tested in the same way. Use the following test files:

    Operation SUB  R1,R0,R0     file sub.s
    Operation SUBI R1,R0,#2     file subi.s
    Operation OR   R20,R0,R0    file or.s
    Operation LW   R1,C(R2)     file lw.s
    Operation SW   C(R2),R1     file sw.s
    Operation BNEZ R0,4         file bnez.s
    Operation BEQZ R0,4         file beqz.s

Fill in the following table as you proceed with your tests:

    Stage           ID    EX                MEM                  WB
    Control group   RD    ALU   MX5   MX4   WRITE   READ   MX6   REG_WRITE
    ADD
    ADDI
    SUB
    SUBI
    OR
    LW
    SW
    BNEZ
    BEQZ

Compare your results to Table 4.

We have now realized the control for some of the DLX instructions. Realizing the other instructions should not cause any particular problems. So far we have ignored an important question: how are hazards in the pipeline identified? Furthermore, we have not investigated the control of MX2 and MX3. These multiplexers control the bypassing. Moreover, how is the pipeline stopped when a hazard occurs? These issues are covered next.

Hazard-logic

The bypass technique must be used, or the pipeline must be halted, if the target register of an instruction at the EX- or MEM-stage is a source register of the instruction at the ID-stage. We start with the bypass technique. Because the address of the target register is carried along the pipeline, comparators can compare it to the source registers of the instruction at the ID-stage. One source operand is specified in the field RS1. The figure below shows how a data hazard on this source operand of the instruction at the ID-stage is identified using comparisons.

The hazard-logic has five input signals: USE_RS1, EQ_EX, EQ_MEM, EQ_R0 and LOAD. There are three output signals: MX2 (2 signals) and STALL. The outputs are used to select the correct control for the multiplexer MX2 (according to Table 5) or, if needed, to halt the pipeline.

The pipeline is halted as follows: the D flip-flops between the IF- and ID-stages are disabled (no new value is loaded) and a harmless operation, STALL, is fed into the pipeline. Harmless means that nothing is written at the MEM- or WB-stages.

Table 5

    MX2-operation            Value
    Register file            00
    Bypass from EX-stage     01
    Bypass from MEM-stage    10

We will now investigate whether all simultaneously occurring hazards can be detected, using the following test program:

    ADDI R1,R0,#1    ; R1 <- 1
    ADD  R2,R1,R0    ; R2 <- R1+R0
    ADD  R3,R1,R0    ; R3 <- R1+R0
    ADD  R4,R1,R0    ; R4 <- R1+R0

Load the program (exempel13.s).
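One plausible reading of the hazard-logic for the RS1 operand can be written as a small Python function. This is a sketch under stated assumptions and not the contents of the actual HAZARD-PLA: it assumes that EQ_EX and EQ_MEM signal a match between RS1 and the target register at the EX- or MEM-stage, that EQ_R0 signals that RS1 is R0 (so no hazard exists), that LOAD signals a load instruction at the EX-stage, and that a match at the EX-stage takes priority because it holds the newer value.

```python
REGISTER_FILE = 0b00    # MX2 encodings from Table 5
BYPASS_EX = 0b01
BYPASS_MEM = 0b10


def hazard_logic(use_rs1, eq_ex, eq_mem, eq_r0, load):
    """Return (mx2, stall) for the RS1 operand of the instruction at ID."""
    if not use_rs1 or eq_r0:
        return REGISTER_FILE, False   # operand unused, or R0: no hazard
    if eq_ex:
        if load:
            return REGISTER_FILE, True    # loaded data not ready yet: halt
        return BYPASS_EX, False           # newest value comes from the EX-stage
    if eq_mem:
        return BYPASS_MEM, False
    return REGISTER_FILE, False


# ADD R2,R1,R0 at ID directly behind ADDI R1,R0,#1 at EX.
print(hazard_logic(True, True, False, False, False))  # prints (1, False)
```

In this model a load at the EX-stage with a matching target register asserts STALL for one cycle; on the next cycle the load is at the MEM-stage and the value can be bypassed from there.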

Give clock pulses until instruction ADD R2,R1,R0 is at the ID-stage. Will that instruction get the correct value of register R1?

What are the values of the MX2 control group? (Compare them to the values in Table 5.)

Give a clock pulse. From where will the instruction at the ID-stage get the correct value of register R1?

Next we investigate halting the pipeline with the following test program:

    LW   R1,18(R0)   ; R1 <- M[18]
    ADD  R2,R1,R0    ; R2 <- R1+R0
    ADD  R3,R1,R0    ; R3 <- R1+R0
    ADD  R4,R1,R0    ; R4 <- R1+R0

At what stage should the pipeline halt?

Load the program (exempel14.s). At what stage is the LOAD operation when the STALL output of the HAZARD-PLA gets the value 1?

Give a clock pulse. Is the STALL signal still active?

Make sure the program is executed correctly.

Investigate the following program:

    LW   R1,18(R0)   ; R1 <- M[18]
    ADD  R1,R1,R1    ; R1 <- R1+R1
    ADD  R1,R1,R1    ; R1 <- R1+R1
    ADD  R4,R1,R1    ; R4 <- R1+R1

As far as identifying hazards is concerned, what is the essential difference between this program and the previous one (exempel14.s)?

Load the program (exempel14b.s). Observe whether the program is executed correctly.

Next we investigate the specification of the DECODE-PLA. Why does the ADD operation activate both the USE_RS1 and USE_RS2 control signals?

Why does the ADDI operation activate only the USE_RS1 control signal?

Name an instruction that activates neither the USE_RS1 nor the USE_RS2 control signal.

Finally, your task is to realize the control for the instructions that enable procedure calls and returns from procedures. In a procedure call, the address at which the JAL (or JALR) instruction is located has to be stored in register R31. This value is obtained from the program counter. To be able to write to register R31, the target register ID needs to be 31, which can be formed by specifying the correct values for the signals of the RD control group (see Table 3). In addition, the value of the program counter has to pass through the ALU without any ALU operation being performed on it. This ALU operation is denoted PASS_A.

Fill in the following table:

    Stage           ID    EX                MEM                  WB
    Control group   RD    ALU   MX5   MX4   WRITE   READ   MX6   REG_WRITE
    JAL
    JR

Add the functions for the JAL and JR operations to the file decode.ipf. Compile the file. Test the operations by running the test program exempel11.s. If necessary, fix the errors in the specification.

Give clock pulses until instruction ADDI R1,R1,#2 is at the MEM-stage. What does register R31 contain? Is its value correct? If it is not, fix the errors in the labalaba specification.

Execute the program to the end. Make sure register R1 contains the right value!

You have now completed the DLX-CONTROL exercise! Return this exercise paper together with the cover sheet (BOX 518, Tietotalo building, 4th floor, corridor G).

TAMPERE UNIVERSITY OF TECHNOLOGY Institute of Digital and Computer Systems. Exercise 2: DLX I - Architecture

TAMPERE UNIVERSITY OF TECHNOLOGY Institute of Digital and Computer Systems. Exercise 2: DLX I - Architecture TAMPERE UNIVERSITY OF TECHNOLOGY Institute of Digital and Computer Systems TKT-3206 Computer Architecture I Exercise 2: DLX I - Architecture.. 2007 Group Name stud. num. General info about the exercise

More information

CS3350B Computer Architecture Quiz 3 March 15, 2018

CS3350B Computer Architecture Quiz 3 March 15, 2018 CS3350B Computer Architecture Quiz 3 March 15, 2018 Student ID number: Student Last Name: Question 1.1 1.2 1.3 2.1 2.2 2.3 Total Marks The quiz consists of two exercises. The expected duration is 30 minutes.

More information

6.823 Computer System Architecture Datapath for DLX Problem Set #2

6.823 Computer System Architecture Datapath for DLX Problem Set #2 6.823 Computer System Architecture Datapath for DLX Problem Set #2 Spring 2002 Students are allowed to collaborate in groups of up to 3 people. A group hands in only one copy of the solution to a problem

More information

R-type Instructions. Experiment Introduction. 4.2 Instruction Set Architecture Types of Instructions

R-type Instructions. Experiment Introduction. 4.2 Instruction Set Architecture Types of Instructions Experiment 4 R-type Instructions 4.1 Introduction This part is dedicated to the design of a processor based on a simplified version of the DLX architecture. The DLX is a RISC processor architecture designed

More information

Design for a simplified DLX (SDLX) processor Rajat Moona

Design for a simplified DLX (SDLX) processor Rajat Moona Design for a simplified DLX (SDLX) processor Rajat Moona moona@iitk.ac.in In this handout we shall see the design of a simplified DLX (SDLX) processor. We shall assume that the readers are familiar with

More information

6.004 Tutorial Problems L22 Branch Prediction

6.004 Tutorial Problems L22 Branch Prediction 6.004 Tutorial Problems L22 Branch Prediction Branch target buffer (BTB): Direct-mapped cache (can also be set-associative) that stores the target address of jumps and taken branches. The BTB is searched

More information

CS3350B Computer Architecture Winter 2015

CS3350B Computer Architecture Winter 2015 CS3350B Computer Architecture Winter 2015 Lecture 5.5: Single-Cycle CPU Datapath Design Marc Moreno Maza www.csd.uwo.ca/courses/cs3350b [Adapted from lectures on Computer Organization and Design, Patterson

More information

EE557--FALL 1999 MAKE-UP MIDTERM 1. Closed books, closed notes

EE557--FALL 1999 MAKE-UP MIDTERM 1. Closed books, closed notes NAME: STUDENT NUMBER: EE557--FALL 1999 MAKE-UP MIDTERM 1 Closed books, closed notes Q1: /1 Q2: /1 Q3: /1 Q4: /1 Q5: /15 Q6: /1 TOTAL: /65 Grade: /25 1 QUESTION 1(Performance evaluation) 1 points We are

More information

DLX computer. Electronic Computers M

DLX computer. Electronic Computers M DLX computer Electronic Computers 1 RISC architectures RISC vs CISC (Reduced Instruction Set Computer vs Complex Instruction Set Computer In CISC architectures the 10% of the instructions are used in 90%

More information

The Evolution of Microprocessors. Per Stenström

The Evolution of Microprocessors. Per Stenström The Evolution of Microprocessors Per Stenström Processor (Core) Processor (Core) Processor (Core) L1 Cache L1 Cache L1 Cache L2 Cache Microprocessor Chip Memory Evolution of Microprocessors Multicycle

More information

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor. COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor The Processor - Introduction

More information

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition The Processor - Introduction

More information

Laboratory Exercise 6 Pipelined Processors 0.0

Laboratory Exercise 6 Pipelined Processors 0.0 Laboratory Exercise 6 Pipelined Processors 0.0 Goals After this laboratory exercise, you should understand the basic principles of how pipelining works, including the problems of data and branch hazards

More information

Pipelining. CSC Friday, November 6, 2015

Pipelining. CSC Friday, November 6, 2015 Pipelining CSC 211.01 Friday, November 6, 2015 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory register file ALU data memory register file Not

More information

ECE260: Fundamentals of Computer Engineering

ECE260: Fundamentals of Computer Engineering ECE260: Fundamentals of Computer Engineering Pipelined Datapath and Control James Moscola Dept. of Engineering & Computer Science York College of Pennsylvania ECE260: Fundamentals of Computer Engineering

More information

Computer System Architecture Midterm Examination Spring 2002

Computer System Architecture Midterm Examination Spring 2002 Computer System Architecture 6.823 Midterm Examination Spring 2002 Name: This is an open book, open notes exam. 110 Minutes 1 Pages Notes: Not all questions are of equal difficulty, so look over the entire

More information

Processor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Processor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University Processor Architecture Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Moore s Law Gordon Moore @ Intel (1965) 2 Computer Architecture Trends (1)

More information

Microprogrammed Control Approach

Microprogrammed Control Approach Microprogrammed Control Approach Considering the FSM for our MIPS subset has 10 states, the complete MIPS instruction set, which contains more than 100 instructions, and considering that these instructions

More information

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

Processor Architecture

Processor Architecture Processor Architecture Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu SSE2030: Introduction to Computer Systems, Spring 2018, Jinkyu Jeong (jinkyu@skku.edu)

More information

Advanced Parallel Architecture Lessons 5 and 6. Annalisa Massini /2017

Advanced Parallel Architecture Lessons 5 and 6. Annalisa Massini /2017 Advanced Parallel Architecture Lessons 5 and 6 Annalisa Massini - Pipelining Hennessy, Patterson Computer architecture A quantitive approach Appendix C Sections C.1, C.2 Pipelining Pipelining is an implementation

More information

Very Simple MIPS Implementation

Very Simple MIPS Implementation 06 1 MIPS Pipelined Implementation 06 1 line: (In this set.) Unpipelined Implementation. (Diagram only.) Pipelined MIPS Implementations: Hardware, notation, hazards. Dependency Definitions. Hazards: Definitions,

More information

CSE 378 Midterm 2/12/10 Sample Solution

CSE 378 Midterm 2/12/10 Sample Solution Question 1. (6 points) (a) Rewrite the instruction sub $v0,$t8,$a2 using absolute register numbers instead of symbolic names (i.e., if the instruction contained $at, you would rewrite that as $1.) sub

More information

The Processor Pipeline. Chapter 4, Patterson and Hennessy, 4ed. Section 5.3, 5.4: J P Hayes.

The Processor Pipeline. Chapter 4, Patterson and Hennessy, 4ed. Section 5.3, 5.4: J P Hayes. The Processor Pipeline Chapter 4, Patterson and Hennessy, 4ed. Section 5.3, 5.4: J P Hayes. Pipeline A Basic MIPS Implementation Memory-reference instructions Load Word (lw) and Store Word (sw) ALU instructions

More information
