EE457 Lab 4 Part 4 Seven Questions From Previous Midterm Exams and Final Exams ee457_lab4_part4.fm 10/6/04

Similar documents
EE457. Note: Parts of the solutions are extracted from the solutions manual accompanying the text book.

Chapter 5 Solutions: For More Practice

Processor: Multi- Cycle Datapath & Control

CPE 335. Basic MIPS Architecture Part II

Points available Your marks Total 100

CSE 2021 COMPUTER ORGANIZATION

Control Unit for Multiple Cycle Implementation

Major CPU Design Steps

RISC Processor Design

LECTURE 6. Multi-Cycle Datapath and Control

CSE 2021 COMPUTER ORGANIZATION

Topic #6. Processor Design

CC 311- Computer Architecture. The Processor - Control

Lecture 5 and 6. ICS 152 Computer Systems Architecture. Prof. Juan Luis Aragón

Inf2C - Computer Systems Lecture 12 Processor Design Multi-Cycle

THE HONG KONG UNIVERSITY OF SCIENCE & TECHNOLOGY Computer Organization (COMP 2611) Spring Semester, 2014 Final Examination

Computer Science 324 Computer Architecture Mount Holyoke College Fall Topic Notes: Data Paths and Microprogramming

ECE369. Chapter 5 ECE369

Alternative to single cycle. Drawbacks of single cycle implementation. Multiple cycle implementation. Instruction fetch

CSE 2021: Computer Organization Fall 2010 Solution to Assignment # 3: Multicycle Implementation

ECE 313 Computer Organization FINAL EXAM December 14, This exam is open book and open notes. You have 2 hours.

RISC Architecture: Multi-Cycle Implementation

ALUOut. Registers A. I + D Memory IR. combinatorial block. combinatorial block. combinatorial block MDR

ﻪﺘﻓﺮﺸﻴﭘ ﺮﺗﻮﻴﭙﻣﺎﻛ يرﺎﻤﻌﻣ MIPS يرﺎﻤﻌﻣ data path and ontrol control

Processor (I) - datapath & control. Hwansoo Han

Mapping Control to Hardware

EECE 417 Computer Systems Architecture

The overall datapath for RT, lw,sw beq instrucution

The Processor (1) Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

CS232 Final Exam May 5, 2001

RISC Architecture: Multi-Cycle Implementation

Chapter 5: The Processor: Datapath and Control

Systems Architecture I

COMPUTER ORGANIZATION AND DESIGN. The Hardware/Software Interface. Chapter 4. The Processor: A Based on P&H

Using a Hardware Description Language to Design and Simulate a Processor 5.8

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 4: Datapath and Control

Chapter 4. The Processor

Design of the MIPS Processor

Laboratory 5 Processor Datapath

Multiple Cycle Data Path

RISC Design: Multi-Cycle Implementation

ECE 30, Lab #8 Spring 2014

Multi-cycle Approach. Single cycle CPU. Multi-cycle CPU. Requires state elements to hold intermediate values. one clock cycle or instruction

The Processor: Datapath and Control. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Note- E~ S. \3 \S U\e. ~ ~s ~. 4. \\ o~ (fw' \i<.t. (~e., 3\0)

CO Computer Architecture and Programming Languages CAPL. Lecture 18 & 19

Digital Design & Computer Architecture (E85) D. Money Harris Fall 2007

ECE 3056: Architecture, Concurrency and Energy of Computation. Single and Multi-Cycle Datapaths: Practice Problems

Chapter 4. The Processor

COMP303 - Computer Architecture Lecture 10. Multi-Cycle Design & Exceptions

The Processor: Datapath & Control

ENE 334 Microprocessors

CPU Design Steps. EECC550 - Shaaban

CS/COE0447: Computer Organization

CS/COE0447: Computer Organization

Lecture 5: The Processor

ECE232: Hardware Organization and Design

Processor (multi-cycle)

Final Project: MIPS-like Microprocessor

CS232 Final Exam May 5, 2001

Lecture Topics. Announcements. Today: Single-Cycle Processors (P&H ) Next: continued. Milestone #3 (due 2/9) Milestone #4 (due 2/23)

COMP303 - Computer Architecture Lecture 8. Designing a Single Cycle Datapath

Systems Architecture

Chapter 4. The Processor. Instruction count Determined by ISA and compiler. We will examine two MIPS implementations

Chapter 4. The Processor Designing the datapath

Chapter 4. The Processor. Computer Architecture and IC Design Lab

CSEN 601: Computer System Architecture Summer 2014

Design of the MIPS Processor (contd)

Multicycle conclusion

Lecture 8: Control COS / ELE 375. Computer Architecture and Organization. Princeton University Fall Prof. David August

Initial Representation Finite State Diagram. Logic Representation Logic Equations

Lets Build a Processor

EE 457 Midterm Summer 14 Redekopp Name: Closed Book / 105 minutes No CALCULATORS Score: / 100

The Big Picture: Where are We Now? EEM 486: Computer Architecture. Lecture 3. Designing a Single Cycle Datapath

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

Single Cycle Data Path

EE 357 Project Multicycle CPU 1

Chapter 4 The Processor (Part 2)

Chapter 4. The Processor

ECE260: Fundamentals of Computer Engineering

Multicycle Approach. Designing MIPS Processor

EE 457 Midterm Summer 14 Redekopp Name: Closed Book / 105 minutes No CALCULATORS Score: / 100

Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

Winter 2002 FINAL EXAMINATION

Initial Representation Finite State Diagram Microprogram. Sequencing Control Explicit Next State Microprogram counter

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

COMPUTER ORGANIZATION AND DESIGN

Mark Redekopp and Gandhi Puvvada, All rights reserved. EE 357 Unit 15. Single-Cycle CPU Datapath and Control

Microprogramming. Microprogramming

CS152 Computer Architecture and Engineering. Lecture 8 Multicycle Design and Microcode John Lazzaro (

Midterm I March 12, 2003 CS152 Computer Architecture and Engineering

Midterm I October 6, 1999 CS152 Computer Architecture and Engineering

ECE 313 Computer Organization FINAL EXAM December 11, Multicycle Processor Design 30 Points

Winter 2006 FINAL EXAMINATION Auxiliary Gymnasium Tuesday, April 18 7:00pm to 10:00pm

Computer Architecture Chapter 5. Fall 2005 Department of Computer Science Kent State University

Grading Results Total 100

The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture

CS61C : Machine Structures

CSE Computer Architecture I Fall 2009 Lecture 13 In Class Notes and Problems October 6, 2009

Transcription:

EE457 Lab 4 Part 4 Seven Questions From Previous Midterm Exams and Final Exams ee457_lab4_part4.fm 10/6/04 1 [Based on Question #7 of Summer 1993 Midterm] Remove TARGET register, add ZERO FF: Please refer to figures 5.39 (page 323) and 5.47 (page 332) in your book (1st edition). To reduce the cost of hardware even at the loss of some performance, the target register has been removed, the mux following the target register has been converted to a 2-to-1 mux as shown below and a status flip-flop has been added to capture the ZERO output from the ALU. Note: To distinguish from another question (#6) in this assignment (Q#7 of Summer 1996 Midterm), here you are NOT allowed to take the ZERO output of the ALU to the control unit as an input. Z_FF_WRITE Q_ZERO ZERO 0 1 1 D Q ZERO FF CLK Z_FF_WRITE EE457 Lab 4 Part 4 1 / 23

The consequent modifications to the state diagram of fig. 5.47 is only partly carried out below. Please carry out the state diagram modification completely to handle the beq instruction. Add additional states as needed. Activate Z_FF_WRITE control signal in an appropriate state. Revise the PCSource signal values in view of the fact that it is a 1-bit control line now. DUMMY STATE IorD=1 Your TA asks you to think critically and decide whether is possible to avoid using the Z_FF_WRITE control signal and let the ZERO_FF write at the end of every clock (like the ALUout and MDR registers of the 2 nd /3 rd edition design). Your answer: EE457 Lab 4 Part 4 2 / 23

2 [Based on Question #4 of Summer 2002 Midterm] New Register file with one READ PORT: Please consider a modified register file which has only 1 read port. It requires you to read one register at a time. 2.1 Scenario 1 : Given on the next page is the completely modified datapath. Given below is a modified incomplete state diagram. We assume that the clock has extra slack to account for the reading time of the other register and immediately using it as an operand for ALU operation. Please complete the incomplete state diagram. Simply fill the values for rs/rt and R_Write. Use "X" for "don t care" whenever possible to allow logic minimization. Scenario 1 State Diagram 1 IorD=1 EE457 Lab 4 Part 4 3 / 23

Scenario 1 Datapath (Complete) R_WRITE RW rs/rt RD RA Reg_File WA WD rs rt rd TEMP REG 0 1 EE457 Lab 4 Part 4 4 / 23

2.2 Scenario 2 : In a more realistic design, we do not expect so much extra slack in clock. So perhaps an extra state called state 1a (after state 1) can be included to allow fetching of the other register. Please read the discussion below. Miss Bruin : Every instruction should pass through state 1a Mr. Bruin : Not all instructions need two source registers. For example, jump does not need any source registers. sw needs 2 source registers but lw needs only one source register. Miss Trojan: Well sw does not need the second source register until state 5 where memory is written. Activity in state 2 (effective address computation) does not depend on the second register. So let us combine state 2 and the proposed state 1a into the new state 2n (2 new). Mr. Trojan : Do you want an extra state for jump also? Show your modified state diagram for scenario 2 of this question to suit the modified datapath of the 1st edition (scenario 2) of this question. Note: Here we use the temp register to capture (rs) (not rt). 2.3 In scenario 1, we have provided a temporary register to capture contents of "rt" where as in scenario 2, we have provided a temporary register to capture contents of "rs". Is this option to capture (rs) in temporary register rather than (rt) an important choice to this (scenario 2) design or is it an insignificant random choice? Explain. EE457 Lab 4 Part 4 5 / 23

Scenario 2 - DATAPATH (Complete) R_WRITE rs/rt TEMP REG REG RW RD RA Reg_File WA WD rs rt rd 0 1 EE457 Lab 4 Part 4 6 / 23

Complete the state diagram. When you fill the values for rs/rt and R_Write use "X" for "don t care" whenever possible to allow logic minimization. Scenario 2 - STATE DIAGRAM 1 State to read the other register IorD=1 RA1/RA2 = ALUSelA= 1 ALUSelB= 00 ALUOp = 10 RA1/RA2 = ALUSelA= 1 ALUSelB= 00 ALUOp= 01 PCWriteCond PCSource= 01 RA1/RA2 = PCWrite PCSource= 10 EE457 Lab 4 Part 4 7 / 23

3 [Question #2 of Spring 1994 Midterm] JR rs implementation: Given below is the JR rs (jump register) instruction format. It causes an unconditional jump to the address contained in rs register. The RTL (Register Transfer Language) statement for the same is (PC) <== (rs) Also given below is the instruction format for ADD rd, rs, rt instruction. SPECIAL 31 26 SPECIAL 31 26 Table 1: Jump Register Format rs 25 21 0 20 6 JR 5 0 000000 rs 000000000000000000 001000 rs 25 21 Table 2: ADD Format rt 20 16 rd 15 11 0 10 6 ADD 5 0 000000 rs rt rd 00000 100000 Reproduced on the next page is the datapath for the multicycle CPU implementation from your textbook (Fig. 5.39/P323-1st edition) modified to include the function-filed connection to the control unit. Please note that the control unit shall now distinguish JR instruction from R-Type instructions by looking at the function-filed. 3.1 Show how the JR rs instruction can be implemented without any modification to the datapath (except for the function-filed connection to the control unit). Hint: How about (PC) <== (rs) + ($0)? In the case of the JR rs instruction format, what do you have in the place of rt field position of the ADD instruction? Does it help? Note: If you can not think of a way to implement JR rs instruction without any modification to the datapath, then you are free to suggest and show the modifications to datapath desired by you. Briefly explain your modifications to the datapath only if you modify the datapath. EE457 Lab 4 Part 4 8 / 23

function-filed EE457 Lab 4 Part 4 9 / 23

3.2 Complete the modification to the state diagram for the control unit of the multicycle CPU (Fig. 5.47/P332 of the 1st ed. reproduced below) to accommodate JR rs instruction. Note: Your modification to the state diagram will be judged based on your datapath. You need to decide if the ALU Control block needs to be modified or not and explain why it needs to be modified if it needs to be modified or why it does not need to be modified. If you need to modify it, you do not need to show the details of the proposed modification. Simply narrate the proposed modification in words. OpCode = 000000, ff =/= 001000 IorD=1 OpCode = 000000, ff == 001000 10 JR completion EE457 Lab 4 Part 4 10 / 23

4 [Question #2 of Summer 1995 midterm] SWAP instruction implementation: In this question we are trying to implement the SWAP instruction on the multi-cycle CPU. Instead of using one of the general purpose registers in the Register File as a temporary register, Mr. Trojan suggests that the TARGET register is used as a temporary register. Though the TARGET register holds the Branch Target Address by the end of the state 1, once you decode and recognize that the current instruction is a SWAP instruction, you may safely overwrite the contents of the target register. Proposed format of the SWAP instruction: Note that the rs field is repeated twice in the format. opcode swap rs rt rs xxxxx xxxxxx Bits 31 26 25 21 20 16 15 11 10 6 5 0 4.1 A modified data path is provided in Fig 4.1. Please notice the muxes I, II, and III. State the use/ purpose of these muxes so far as the execution of the SWAP instruction is concerned. 4.2 Add a set of states to the state diagram in Fig. 4.2 for the execution of the SWAP instruction. Label the states as State 10, 11,... etc. State what is achieved in each of the states in the space provided below. You do NOT need to find the actual values of the control signals. It is enough EE457 Lab 4 Part 4 11 / 23

to state something like, "access ($rs) and ($0) from the register file", "compute ($rs) + ($0) using ALU", etc. Note that, in the original implementation of an R-type instruction, we used state 1 to access registers, state 6 to perform the operation by the ALU, and state 7 to place the result in the destination register. We did not perform all the three operations in one clock cycle/state as one clock would not be long enough at the high frequency of our CPU clock. Please follow the same philosophy here also. State 1: Decode and identify the swap instruction State10: State 11: State 12: State 13: State 14: State : 4.3 Consider the SWAP instruction and the single-cycle CPU of chapter 5. Explain why it is not possible to implement the swap instruction without a modification of the Register file. EE457 Lab 4 Part 4 12 / 23

0 1 2 III SR1 0 00000 II I 1 SR2 0 1 Fig 4.1 EE457 Lab 4 Part 4 13 / 23

IorD=1 (Op= SWAP ) 10 10 11 11 12 12 12 13 Add more states if required 13 13 14 EE457 Lab 4 Part 4 14 / 23

5 [Based on Question 4 from Summer 2004 Midterm] Slow ALU needs two clocks: Reproduced below is the 10-state state diagram from your textbook (1st edition). IorD=1 Assume that the ALU is SLOW and needs two clock cycles to produce its result. Accordingly on the next page, we have added extra states called 0A, 1A, 2A, 6A, and 8A after states 0, 1, 2, 6, 8. Instruction decoding can now be done in 0A state. So the anticipatory BTA (Branch Target Address) calculation is no more anticipatory. It is done only if the instruction is a BEQ instruction. Complete the state diagram on the next page by completing the states 0A, 1A, 2A, 6A, and 8A. You can write in the state circles, "same as in state ABC", "all signals of state XYZ except signal PQR", etc. If you need to modify any of the original states (0, 1, 2, 3, 4, 5, 6, 8, 9 ), feel free to do so. EE457 Lab 4 Part 4 15 / 23

0A Instruction Decode Register Fetch PC incre. completion TargetWrite (Op= LW ) or (Op= SW ) (Op=R-type) 1A 2A 6A 8A ALUSrcA=1 ALUSrcA=1 ALUSrcA=1 ALUSrcB=10 ALUOp=00 ALUSrcB=10 ALUOp=00 ALUSrcB=00 ALUOp=10 MemRead ALUSrcA=1 IorD=1 ALUSrcB=10 ALUOp=00 EE457 Lab 4 Part 4 16 / 23

5.1 Complete the "Clocks Taken" columns in the table on the side. 5.2 Target Write Control signal: Can we remove the write control signal on the Target register and write to it at the end of every clock? Answer for each of the two cases separately. Provide brief explanation. (a) in the case of the original design Instruction LW SW R-type BEQ Jump Clocks Taken In the original design In the new design (b) in the case of this revised design. _ 5.3 Now suggest optimizing the BEQ execution. EE457 Lab 4 Part 4 17 / 23

6 [Based on Question #7 from Summer 1996 Midterm] Anticipatory "EA" calculation for lw/sw in state 1 instead of anticipatory "BTA" calculation for beq in state 1: The CPIs under the current CPI column are based on our current multi-cycle CPU implementation. Mr. Trojan noted that, gaining one clock in load/store instruction is worth even at the cost of losing one clock in branch instructions as load/store occur more often compared to branch. He recommended not to perform the anticipatory calculation of BRANCH ADDRESS (and storing it in the TARGET register) in state 1. Instead, he wanted that we compute the memory address (normally done in state 2) early in state 1 thereby gaining a clock in load/store execution. To complete branch instruction, you need an extra state to calculate the target address. 6.1 Implementation of Trojan s design: Instruction i Frequency fi Current CPIi New CPIi Load 20% 5 Store 10% 4 R-type 50% 4 Branch 15% 3 Jump 5% 3 The datapath on the next page has been revised. The TARGET register has been removed, hence the 3-to-1 mux after the target register was reduced to a 2-to-1 mux. The ZERO inference from the ALU has been conveyed to the control unit and the PCWrite control logic has been modified. Complete the changes to the state diagram. State 2 has been cancelled. New states, 1 (New) and 8 (New), were added in the place of the original states 1 and 8. An additional state 8A has been added. Fill up the control signals as appropriate to these new states 1 (New), 8 (New), and 8A. 6.2 Is it possible that the apparent advantage in reducing the number of clocks needed to perform lw and sw instructions is offset by need for slowing the clock? Did we pack too much stuff in state 1? Can the memory address computation start at the beginning of state 1 or does it need to wait for anything? Explain. 6.3 Assuming that Mr. Trojan s design finally works out without having to slow down the clock, what is the factor of improvement/speed-up it achieves? Fill-up the new CPI column and calculate. If needed, assume that 50% of the branches are taken branches (successful branches). EE457 Lab 4 Part 4 18 / 23

No PCWriteCond signal anymore ZERO 1 EE457 Lab 4 Part 4 19 / 23

1 (NEW) IorD=1 8A ZERO 8 (NEW) ZERO EE457 Lab 4 Part 4 20 / 23

7 [Based on Question #2 Summer 2003 Final Exam]: Separate Data Memory: Reproduced on the next two pages are the datapath and state diagram from the first edition of the text book used in your current lab #4 design. You are aware that the so called memory is actually an on-chip cache for high frequency operation. Your boss wants to experiment with different parameters for the instruction cache as compared to the data cache. For example, he wants to use direct mapping for the instruction cache whereas set-associative mapping for the data cache. So your boss asked you to replace the single unified memory in the datapath with separate instruction memory and data memory. Please modify both the datapath and the state diagram as needed. If you wish to use two sets of MemRead and MemWrite control signals use signal names with suffix "_I" for instruction memory and suffix "_D" for data memory (MemRead_I, MemWrite_I, and MemRead_D, MemWrite_D). Your assistants, Bruin #1 and Bruin #2, are arguing about the need for IR (Instruction Register) in the modified datapath and they are "duly" confused! You decided to (remove/retain) IR because EE457 Lab 4 Part 4 21 / 23

To be integrated by you MW Address Read Data Data Memory Write Data MR EE457 Lab 4 Part 4 22 / 23

IorD=1 EE457 Lab 4 Part 4 23 / 23