CSEN 601: Computer System Architecture Summer 2014

Similar documents
Processor (I) - datapath & control. Hwansoo Han

LECTURE 5. Single-Cycle Datapath and Control

COMPUTER ORGANIZATION AND DESIGN. The Hardware/Software Interface. Chapter 4. The Processor: A Based on P&H

Chapter 4. The Processor. Computer Architecture and IC Design Lab

Adding Support for jal to Single Cycle Datapath (For More Practice Exercise 5.20)

CS/COE0447: Computer Organization

CS/COE0447: Computer Organization

Chapter 5 Solutions: For More Practice

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 4: Datapath and Control

CS3350B Computer Architecture Quiz 3 March 15, 2018

CS 351 Exam 2 Mon. 11/2/2015

Systems Architecture

EECS150 - Digital Design Lecture 10- CPU Microarchitecture. Processor Microarchitecture Introduction

Processor Design Pipelined Processor (II) Hung-Wei Tseng

CC 311- Computer Architecture. The Processor - Control

The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture

The MIPS Processor Datapath

Computer Architecture, IFE CS and T&CS, 4 th sem. Single-Cycle Architecture

EE557--FALL 1999 MAKE-UP MIDTERM 1. Closed books, closed notes

CPE 335 Computer Organization. Basic MIPS Architecture Part I

EECS150 - Digital Design Lecture 9- CPU Microarchitecture. Watson: Jeopardy-playing Computer

CSE 2021 COMPUTER ORGANIZATION

CPU Organization (Design)

COMP2611: Computer Organization. The Pipelined Processor

CPE 335. Basic MIPS Architecture Part II

Processor: Multi- Cycle Datapath & Control

ECE232: Hardware Organization and Design

CENG 3420 Lecture 06: Datapath

Design of the MIPS Processor

EC 413 Computer Organization - Fall 2017 Problem Set 3 Problem Set 3 Solution

Single Cycle CPU Design. Mehran Rezaei

COMP303 - Computer Architecture Lecture 8. Designing a Single Cycle Datapath

Inf2C - Computer Systems Lecture 12 Processor Design Multi-Cycle

Computer Organization and Structure

MIPS-Lite Single-Cycle Control

Single Cycle Data Path

CENG 3420 Computer Organization and Design. Lecture 06: MIPS Processor - I. Bei Yu

The Processor (1) Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

RISC Processor Design

Topic #6. Processor Design

CS 351 Exam 2, Fall 2012

Problem 4 The order, from easiest to hardest, is Bit Equal, Split Register, Replace Under Mask.

Design of Digital Circuits 2017 Srdjan Capkun Onur Mutlu (Guest starring: Frank K. Gürkaynak and Aanjhan Ranganathan)

Grading: 3 pts each part. If answer is correct but uses more instructions, 1 pt off. Wrong answer 3pts off.

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

The Processor: Datapath & Control

Lecture 8: Control COS / ELE 375. Computer Architecture and Organization. Princeton University Fall Prof. David August

For More Practice FMP

ECE Sample Final Examination

ENGN1640: Design of Computing Systems Topic 04: Single-Cycle Processor Design

Design of the MIPS Processor (contd)

Chapter 5: The Processor: Datapath and Control

CSE Computer Architecture I Fall 2009 Lecture 13 In Class Notes and Problems October 6, 2009

ECS 154B Computer Architecture II Spring 2009

Laboratory Exercise 6 Pipelined Processors 0.0

ECE Exam I - Solutions February 19 th, :00 pm 4:25pm

King Fahd University of Petroleum and Minerals College of Computer Science and Engineering Computer Engineering Department

CO Computer Architecture and Programming Languages CAPL. Lecture 18 & 19

Laboratory 5 Processor Datapath

Chapter 4. The Processor

CS 2506 Computer Organization II Test 1. Do not start the test until instructed to do so! printed

CSE 378 Midterm 2/12/10 Sample Solution

NATIONAL UNIVERSITY OF SINGAPORE

Full Datapath. CSCI 402: Computer Architectures. The Processor (2) 3/21/19. Fengguang Song Department of Computer & Information Science IUPUI

Chapter 4. The Processor

CSE Lecture In Class Example Handout

Lecture 4: Review of MIPS. Instruction formats, impl. of control and datapath, pipelined impl.

CS Computer Architecture Spring Week 10: Chapter

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

Lecture 5 and 6. ICS 152 Computer Systems Architecture. Prof. Juan Luis Aragón

4. What is the average CPI of a 1.4 GHz machine that executes 12.5 million instructions in 12 seconds?

Multi-cycle Approach. Single cycle CPU. Multi-cycle CPU. Requires state elements to hold intermediate values. one clock cycle or instruction

The Evolution of Microprocessors. Per Stenström

THE HONG KONG UNIVERSITY OF SCIENCE & TECHNOLOGY Computer Organization (COMP 2611) Spring Semester, 2014 Final Examination

Project Description EEC 483 Computer Organization, Spring 2019

Machine Organization & Assembly Language

COMP303 Computer Architecture Lecture 9. Single Cycle Control

Review: Abstract Implementation View

Lecture 12: Single-Cycle Control Unit. Spring 2018 Jason Tang

The Processor: Datapath and Control. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Inf2C - Computer Systems Lecture Processor Design Single Cycle

Major CPU Design Steps

LECTURE 6. Multi-Cycle Datapath and Control

University of Jordan Computer Engineering Department CPE439: Computer Design Lab

Mark Redekopp and Gandhi Puvvada, All rights reserved. EE 357 Unit 15. Single-Cycle CPU Datapath and Control

The overall datapath for RT, lw,sw beq instrucution

Chapter 4. The Processor. Instruction count Determined by ISA and compiler. We will examine two MIPS implementations

CSc 256 Midterm 2 Fall 2011

ENE 334 Microprocessors

ﻪﺘﻓﺮﺸﻴﭘ ﺮﺗﻮﻴﭙﻣﺎﻛ يرﺎﻤﻌﻣ MIPS يرﺎﻤﻌﻣ data path and ontrol control

Perfect Student CS 343 Final Exam May 19, 2011 Student ID: 9999 Exam ID: 9636 Instructions Use pencil, if you have one. For multiple choice

CpE242 Computer Architecture and Engineering Designing a Single Cycle Datapath

ECE 313 Computer Organization FINAL EXAM December 14, This exam is open book and open notes. You have 2 hours.

CSCI 402: Computer Architectures. Fengguang Song Department of Computer & Information Science IUPUI. Today s Content

EE457. Note: Parts of the solutions are extracted from the solutions manual accompanying the text book.

EECS 151/251A Fall 2017 Digital Design and Integrated Circuits. Instructor: John Wawrzynek and Nicholas Weaver. Lecture 13 EE141

Chapter 4 The Processor 1. Chapter 4A. The Processor

A Simplified MIPS Processor in Verilog

Instruction word R0 R1 R2 R3 R4 R5 R6 R8 R12 R31

Transcription:

CSEN 601: Computer System Architecture Summer 2014 Practice Assignment 5 Solutions Exercise 5-1: (Midterm Spring 2013) a. What are the values of the control signals (except ALUOp) for each of the following instructions: add, addi, lw, sw, beq? RegDest Branch MemRead MemtoReg MemWrite ALUSrc RegWrite add addi lw sw beq b. Assuming the latencies shown in the table below, what will be the latency of each of the instructions discussed in part a? What is the maximum clock speed? I- Mem Adder Mux ALU Regs D- Mem Sign- extend Shift- left- 2 600ps 100ps 50ps 150ps 185ps 500ps 10ps 5ps c. Can we replace the adders by cheaper (slower) ones without affecting the clock speed of the processor? In case your answer is no, explain why this is the case. In case your answer is yes, compute the maximum latency for the adder that does not affect the clock speed? a. Control Signals: RegDest Branch MemRead MemtoReg MemWrite ALUSrc RegWrite add 1 0 0 0 0 0 1 addi 0 0 0 0 0 1 1 lw 0 0 1 1 0 1 1 sw X 0 0 X 1 1 0 beq X 1 0 X 0 0 0 b. Latency of instructions: add latency = 600 + 185 + 50 + 150 + 50 = 1035 ps addi latency = 600 + 185 + 150 + 50 = 985 ps lw latency = 600 + 185 + 150 + 500 + 50 = 1485 ps sw latency = 600 + 185 + 150 + 500 = 1435 ps beq latency = 600 + 185 + 50 + 150 +50 = 1035 ps The maximum clock speed is a 1/1485x10-12 = 673.4 MHz 1

c. There are two adders: one to compute the PC+4 and the other one to compute the branch address. Both are not on the critical path of the instructions and can be replaced by slower adders. Note that the second adder exists on 2 non- critical paths. If we call the adder latency X, we have the following constraints Path1 latency: 2 X + 50 1485 implying that X 717.5 ps Path2 latency: 600 + 10 + 5 + X + 50 1485 implying that X 820 ps Therefore the maximum adder latency of the adders that will not affect the processor speed is 717.5 ps. Exercise 5-2: In the basic single- cycle implementation, different instructions utilize different hardware blocks. For each of the following instructions: 1. add rd, rs, rt 2. lw rt, offset(rs) a. What are the values of control signals generated by the control in Figure 1? b. Which resources (blocks) perform a useful function for each instruction? Which resources (blocks) produce outputs, but these outputs are not used for the instruction? Which resources produce no outputs for the instruction? a. The values of the control signals are as follows: RegWrite MemRead ALUSrc MemWrite MemtoReg Branch 1. 1 0 0(Reg) 0 0(ALU) 0 2. 1 1 1(lmm) 0 1(Mem) 0 ALUSrc is the control signal that controls the Mux at the ALU input, 0 (Reg) selects the output of the register file and 1 (Imm) selects the immediate from the instruction word as the second input to the ALU. MemtoReg is the control signal that controls the Mux at the Data input to the register file, 0 (ALU) selects the output of the ALU and 1 (Mem) selects the output of memory. A value of X is a don t care (does not matter if the signal is 0 or 1). b. Resources performing a useful function for the instruction are: 1. All except Data Memory and Branch Add unit 2. All except Branch Add unit and second read port of the Registers The Resources are as follows: Outputs that are not used No outputs 1. Branch Add Data Memory 2. Branch Add, second read port of Registers None (all units produce outputs) Exercise 5-3: 2

In the basic single- cycle implementation, the latencies of individual components of the data path affect the clock cycle time of the entire data path. Thus, for the following latencies: a. What is the clock cycle time if the only type of instructions we need to support are ALU instructions (add, and, etc )? b. What is the clock cycle time if we only had to support lw instructions? c. What is the clock cycle time if we need to support add, beq, lw, and sw instructions? d. If we can improve the latency of one of the given data path components by 10%, which component should it be? What is the speed- up from this improvement? I- Mem Add Mux ALU Regs D- Mem Sign- extend Shift- left- 2 1. 400ps 100ps 30ps 120ps 200ps 350ps 20ps 0ps 2. 500ps 150ps 100ps 180ps 220ps 1000ps 90ps 20ps a. The longest- latency path for ALU operations (add, sub) is through I- Mem, Regs, Mux (to select ALU operand), ALU, and Mux (to select value for register write). Note that the only other path of interest is the PC- increment path through Add (PC+4), Mux and Mux, which is much shorter. So for the I- Mem, Regs, Mux (to select ALU input), ALU, Mux path the latencies are: 1. 400ps+200ps+30ps+120ps+30ps=780ps 2. 500ps+220ps+100ps+180ps+100ps=1100ps b. The longest- latency path for lw is through I- Mem, Regs, ALU, D- Dem, and Mux (to select what is written to register). The only other interesting paths are the PC- increment path (which is much shorter) and the path through Sign- extend unit in address computation. So for I- Mem, Regs, ALU, D- Mem, and Mux path, the latencies are: 1. 400ps+200ps+120ps+350ps+30ps=1100ps 2. 500ps+220ps+180ps+1000ps+100ps=2000ps c. In case of add, beq, lw, and sw instructions, the answer is the same as the above instructions, because the lw instruction has the longest critical path. The longest path for sw is shorter by one Mux latency (no write to register), and the longest path for add or beq is shorter by one D- Mem latency and one MUX latency. d. The clock cycle time is determined by the critical path for the instruction that has the longest critical path. This is the lw instruction, and its critical path goes through I- Mem, Regs, Mux, ALU, D- Mem, and Mux. 1. I- Mem has the longest latency, so we can reduce its latency from the 400ps to 360ps, making the clock cycle 40ps shorter. The speed- up achieved by reducing the clock cycle time is 1100ps/1060ps=1.037 3

2. D- Mem has the longest latency, so we can reduce its latency from 1000ps to 900ps, making the clock cycle 100ps shorter. The speed- up achieved by reducing the clock cycle time is then 2000ps/1900ps=1.050 Exercise 5-4: a. For the following logic blocks latencies, if the only thing we need to do in a processor is fetch consecutive instructions as shown in Figure 2, what would be the cycle time? b. Consider a data path similar to the one in Figure 1, but for a processor that only has one type of instruction: unconditional PC- relative branch. What would the cycle time be for this data path? c. What if conditional PC- relative branch is the only instruction supported? I- Mem Add Mux Alu Regs D- Mem Sign- extend Shift- left- 2 1. 400ps 100ps 30ps 120ps 200ps 350ps 20ps 2ps 2. 500ps 150ps 100ps 180ps 220ps 1000ps 90ps 20ps a. If the processor only does the fetch instruction; then the resources used are the I- Mem and the Add unit. I- Mem takes longer than the Add unit, so the clock cycle time is equal to the latency of the I- Mem: 1. 400ps 2. 500ps b. The path for the unconditional PC- relative branch instruction is through the instruction memory, Sign- extend and Shift- left- 2 to get the offset, Add unit to compute the new PC, and Mux to select that value instead of PC+4. Note that the path through the other Add unit is shorter, because the latency of I- Mem is longer that the latency of the Add unit. Thus, the cycle time will be as follows: 1. 400ps+20ps+2ps+100ps+30ps = 552ps 2. 500ps+90ps+20ps+150ps+100ps = 860ps c. Conditional branches have the same long- latency path that computes the branch address as unconditional branches. Additionally, they have a long- latency path that goes through Registers, Mux, and ALU to compute the PCSrc condition. The critical path is the longer of the two, and the path through PCSrc is longer for these latencies: 1. 400ps+200ps+30ps+120ps+30ps = 780ps 2. 500ps+220ps+100ps+180ps+100ps = 1100ps 4

Exercise 5-5: a. Which kind of instructions requires each of the following resources? b. Assuming that we only support bne and add instructions, discuss how changes in the latencies of this resource affect the cycle time of the processor. Assume that the latencies of other resources do not change. 1. Add 4 adder (The adder adding 4 to the PC) 2. Data Memory a. The instructions that require the above resources: 1. All instructions except jumps that are not PC- relative (jal, j, jr). 2. Loads and stores. b. Supporting the two instructions (beq and add), add has a longer critical path so it determines the clock cycle time. As a result, we focus on how the unit s latency affects the critical path of add: 1. As the critical path is the one going through the I- Mem and the ALU to compare the first two arguments of the add instruction, this unit is not on the critical path, so changes to its latency do not affect the clock cycle time unless the latency of the unit becomes so large to create a new critical path through this unit, the branch add, and the PC Mux. 2. This unit is not used by beq nor add instructions, so it does not affect the critical path for either instruction. Exercise 5-6: What is the value of the following control signals for the jump instruction (Figure 3)? a. RegDst b. MemRead c. RegWrite d. Jump e. Branch The values of the control signals are: a. RegDst = X b. MemRead = 0 c. RegWrite = 0 d. Jump = 1 e. Branch = X 5

Exercise 5-7: For the following single- cycle data path MIPS instruction, what is the value of the instruction word? What is the register number supplied to the register file s Read register 1 input? Is this register actually read? How about Read register 2? How about Write register? Instruction 1. lw $1, 40($6) 2. Label: bne $1, $2, Label The value of the instruction word: Binary Hexadecimal 1. 100011 00110 00001 0000000000101000 8CC10028 2. 000101 00001 00010 1111111111111111 1422FFFF For read register 1 and read register 2, the registers numbers are as follows: Read register 1 Actually read? Read register 2 Actually read? a. 6 (00110) Yes 1 (00001) Yes (but not used) b. 1 (00001) Yes 2 (00010) Yes For write register : Write register Register actually written? a. 1 (00001) Yes b. Either 2 (00010) or 31 (11111) (RegDst is X) No 6

Figures: 3. Figure 1 Figure 2 7

Figure 3 8