CS232 Final Exam May 5, 2001

Similar documents
CS232 Final Exam May 5, 2001

THE HONG KONG UNIVERSITY OF SCIENCE & TECHNOLOGY Computer Organization (COMP 2611) Spring Semester, 2014 Final Examination

ECE 313 Computer Organization FINAL EXAM December 14, This exam is open book and open notes. You have 2 hours.

Computer Architecture CS372 Exam 3

Perfect Student CS 343 Final Exam May 19, 2011 Student ID: 9999 Exam ID: 9636 Instructions Use pencil, if you have one. For multiple choice

Chapter 5 Solutions: For More Practice

CSE 2021 COMPUTER ORGANIZATION

ECE369. Chapter 5 ECE369

RISC Processor Design

LECTURE 6. Multi-Cycle Datapath and Control

Points available Your marks Total 100

Winter 2006 FINAL EXAMINATION Auxiliary Gymnasium Tuesday, April 18 7:00pm to 10:00pm

Processor: Multi- Cycle Datapath & Control

CPE 335. Basic MIPS Architecture Part II

COMPUTER ORGANIZATION AND DESIGN

Digital Design & Computer Architecture (E85) D. Money Harris Fall 2007

Chapter 4. The Processor

CC 311- Computer Architecture. The Processor - Control

Lecture 5 and 6. ICS 152 Computer Systems Architecture. Prof. Juan Luis Aragón

Pipelined datapath Staging data. CS2504, Spring'2007 Dimitris Nikolopoulos

EE557--FALL 1999 MAKE-UP MIDTERM 1. Closed books, closed notes

CSE 2021: Computer Organization Fall 2010 Solution to Assignment # 3: Multicycle Implementation

CoE - ECE 0142 Computer Organization. Instructions: Language of the Computer

CS3350B Computer Architecture Quiz 3 March 15, 2018

Systems Architecture I

COMPUTER ORGANIZATION AND DESIGN. The Hardware/Software Interface. Chapter 4. The Processor: A Based on P&H

CS 351 Exam 2 Mon. 11/2/2015

Lecture 4: Review of MIPS. Instruction formats, impl. of control and datapath, pipelined impl.

Chapter 4 The Processor 1. Chapter 4A. The Processor

Processor (I) - datapath & control. Hwansoo Han

ENE 334 Microprocessors

4. What is the average CPI of a 1.4 GHz machine that executes 12.5 million instructions in 12 seconds?

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1

ECE Sample Final Examination

COMPUTER ORGANIZATION AND DESIGN

ECE 3056: Architecture, Concurrency and Energy of Computation. Single and Multi-Cycle Datapaths: Practice Problems

Pipelining Analogy. Pipelined laundry: overlapping execution. Parallelism improves performance. Four loads: Non-stop: Speedup = 8/3.5 = 2.3.

CSE 2021 COMPUTER ORGANIZATION

ECS 154B Computer Architecture II Spring 2009

The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture

Systems Architecture

Topic #6. Processor Design

EECS150 - Digital Design Lecture 10- CPU Microarchitecture. Processor Microarchitecture Introduction

The University of Alabama in Huntsville Electrical & Computer Engineering Department CPE Test II November 14, 2000

Inf2C - Computer Systems Lecture 12 Processor Design Multi-Cycle

Chapter 4. The Processor. Computer Architecture and IC Design Lab

Chapter 5: The Processor: Datapath and Control

ALUOut. Registers A. I + D Memory IR. combinatorial block. combinatorial block. combinatorial block MDR

CS 251, Winter 2018, Assignment % of course mark

Multi-cycle Approach. Single cycle CPU. Multi-cycle CPU. Requires state elements to hold intermediate values. one clock cycle or instruction

Chapter 4 The Processor 1. Chapter 4B. The Processor

ECE331: Hardware Organization and Design

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor

Processor (multi-cycle)

Winter 2002 FINAL EXAMINATION

Final Exam Spring 2017

EE557--FALL 1999 MIDTERM 1. Closed books, closed notes

The Processor: Datapath & Control

Multiple Cycle Data Path

RISC Architecture: Multi-Cycle Implementation

RISC Architecture: Multi-Cycle Implementation

ECE260: Fundamentals of Computer Engineering

Multicycle conclusion

Microprogramming. Microprogramming

ECE Exam I - Solutions February 19 th, :00 pm 4:25pm

EECS150 - Digital Design Lecture 9- CPU Microarchitecture. Watson: Jeopardy-playing Computer

CO Computer Architecture and Programming Languages CAPL. Lecture 18 & 19

CS 251, Winter 2019, Assignment % of course mark

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

CSEN 601: Computer System Architecture Summer 2014

Major CPU Design Steps

CS/COE1541: Introduction to Computer Architecture

LECTURE 3: THE PROCESSOR

EE457. Note: Parts of the solutions are extracted from the solutions manual accompanying the text book.

Processor Design Pipelined Processor (II) Hung-Wei Tseng

COMP2611: Computer Organization. The Pipelined Processor

comp 180 Lecture 10 Outline of Lecture Procedure calls Saving and restoring registers Summary of MIPS instructions

Department of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri

CS 2506 Computer Organization II Test 2. Do not start the test until instructed to do so! printed

Computer Organization and Structure. Bing-Yu Chen National Taiwan University

ﻪﺘﻓﺮﺸﻴﭘ ﺮﺗﻮﻴﭙﻣﺎﻛ يرﺎﻤﻌﻣ MIPS يرﺎﻤﻌﻣ data path and ontrol control

LECTURE 5. Single-Cycle Datapath and Control

ECE Exam I February 19 th, :00 pm 4:25pm

ECE232: Hardware Organization and Design

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

Chapter 4. The Processor

Full Datapath. Chapter 4 The Processor 2

RISC Design: Multi-Cycle Implementation

Lecture Topics. Announcements. Today: Data and Control Hazards (P&H ) Next: continued. Exam #1 returned. Milestone #5 (due 2/27)

Single vs. Multi-cycle Implementation

Final Project: MIPS-like Microprocessor

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 4: Datapath and Control

Processor (II) - pipelining. Hwansoo Han

Beyond Pipelining. CP-226: Computer Architecture. Lecture 23 (19 April 2013) CADSL

Pipeline Hazards. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Instructions: Language of the Computer

EE457 Lab 4 Part 4 Seven Questions From Previous Midterm Exams and Final Exams ee457_lab4_part4.fm 10/6/04

Pipelining. Ideal speedup is number of stages in the pipeline. Do we achieve this? 2. Improve performance by increasing instruction throughput ...

CS 351 Exam 2, Fall 2012

Transcription:

CS232 Final Exam May 5, 2 Name: This exam has 4 pages, including this cover. There are six questions, worth a total of 5 points. You have 3 hours. Budget your time! Write clearly and show your work. State any assumptions you make. No written references or calculators are allowed. Question Maximum Your Score 25 2 25 3 25 4 25 5 3 6 2 Total 5

MIPS Instructions These are some of the most common MIPS instructions and pseudo-instructions, and should be all you need. However, you are free to use any valid MIPS instructions or pseudo-instruction in your programs. Category Example Meaning add rd, rs, rt rd = rs + rt sub rd, rs, rt rd = rs - rt Arithmetic addi rd, rs, const rd = rs + const mul rd, rs, rt rd = rs * rt Data Transfer Branch Set Jump move rd, rs rd = rs li rd, const rd = const lw rt, const(rs) rt = Mem[const + rs] (one word) sw rt, const (rs) Mem[const + rs] = rt (one word) lb rt, const (rs) rt = Mem[const + rs] (one byte) sb rt, const (rs) Mem[const + rs] = rt (one byte) beq rs, rt, Label if (rs == rt) go to Label bne rs, rt, Label if (rs!= rt) go to Label bge rs, rt, Label if (rs >= rt) go to Label bgt rs, rt, Label if (rs > rt) go to Label ble rs, rt, Label if (rs <= rt) go to Label blt rs, rt, Label if (rs < rt) go to Label slt rd, rs, rt if (rs < rt) then rd = ; else rd = slti rd, rs, const if (rs < const) then rd = ; else rd = j Label go to Label jr $ra gotoaddressin$ra jal Label $ra = PC + 4; go to Label The second source operand of sub, mul, and all the branch instructions may be a constant. Register Conventions The caller is responsible for saving any of the following registers that it needs, before invoking a function: $t-$t9 $a-$a3 $v-$v The callee is responsible for saving and restoring any of the following registers that it uses: $s-$s7 $ra Performance Formula for computing the CPU time of a program P running on a machine X: CPU time X,P = Number of instructions executed P CPI X,P Clock cycle time X CPI is the average number of clock cycles per instruction, or: CPI = Number of cycles needed / Number of instructions executed 2

Question : MIPS programming The goal of this problem is to write a MIPS function flipimage which flips an image horizontally. For example, a simple image is shown on the left, and its flip is shown on the right. A picture is composed of individual dots, or pixels, each of which will be represented by a single byte. The entire two-dimensional image is then stored in memory row by row. For example, we can store a 4 x 6 picture in memory starting at address as follows: The first row (consisting of 6 pixels) is stored at addresses -5. The second row is stored at addresses 6-. The third row is stored at 2-7 The last row is stored at addresses 8-23. Part (a) Write a MIPS function fliprow to flip a single rowofpixels.thefunctionhastwoarguments,passedin$a and $a: the address of the row and the number of pixels in that row. There is no return value. Be sure to follow all MIPS calling conventions. ( points) 3

Question continued Part (b) Using the fliprow function, you should now be able to write flipimage. The arguments will be: The memory address where the image is stored The number of rows in the image The number of columns in the image Again, there is no return value, and you should follow normal MIPS calling conventions. (5 points) 4

Question 2: Multicycle CPU implementation MIPS is a register-register architecture, where arithmetic source and destinations must be registers. But let s say we wanted to add a register-memory instruction: addm rd, rs, rt # rd = rs + Mem[rt] Here is the instruction format, for your reference (shamt and func are not used): Field op rs rt rd shamt func Bits 3-26 25-2 2-6 5- -6 5- The multicycle datapath from lecture is shown below. You may assume that ALUOp = performs an addition. Part (a) On the next page, show what changes are needed to support addm in the multicycle datapath. ( points) 5

6 Question 2 continued Result Zero ALU ALUOp M u x ALUSrcA 2 3 ALUSrcB Read register Read register 2 Write register Write data Read data 2 Read data Registers RegWrite Address Memory Mem Data Write data Sign extend Shift left 2 M u x PCSource PC A B ALU Out 4 [3-26] [25-2] [2-6] [5-] [5-] Instruction register Memory data register IRWrite M u x RegDst M u x MemToReg M u x IorD MemRead MemWrite PCWrite

Question 2 continued Part (b) Complete this finite state machine diagram for the addm instruction. Be sure to include any new control signals you may have added. (5 points) Instruction fetch and PC increment IorD = MemRead = IRWrite = ALUSrcA = ALUSrcB = ALUOp = PCSource = PCWrite = Register fetch and branch computation ALUSrcA = ALUSrcB = ALUOp = Op = BEQ Op = R-type Branch completion ALUSrcA = ALUSrcB = ALUOp = PCSource = PCWrite = Zero ALUSrcA = ALUSrcB = ALUOp = func R-type execution Writeback RegDst = MemToReg = RegWrite = Op = LW/SW Effective address computation ALUSrcA = ALUSrcB = ALUOp = Op = SW Memory write IorD = MemWrite = Op = LW IorD = MemRead = Memory read lw register write RegDst = MemToReg = RegWrite = 7

Question 3: Pipelining and forwarding The next page shows a diagram of the pipelined datapath with forwarding, but no hazard detection unit. Part (a) Both of the code fragments below have dependencies, but only one of them will execute correctly with the forwarding datapath. Tell us which one, and why. If you like, you can draw a pipeline diagram to aid in your explanation. (5 points) add $t, $a, $a lw $t, ($a) add $v, $t, $t add $v, $t, $t Part (b) Here is one more code fragment. How is this one different from the previous ones? (5 points) lw $t, ($a) sw $t, ($a) Part (c) It is possible to modify the datapath so the code in Part (c) executes without any stalls. Explain how this could be done, and show your changes on the next page. (5 points) 8

Question 3 continued PC Instruction memory IF/ID ID/EX EX/MEM MEM/WB Registers 2 2 ForwardA ALU Data memory Rt Rd Rs ForwardB ID/EX. RegisterRt EX/MEM.RegisterRd ID/EX. RegisterRs Forwarding Unit MEM/WB.RegisterRd 9

Question 4: Pipelining and performance Here is the toupper example function from lecture. It converts any lowercase characters (with ASCII codes between 97 and 22) in the null-terminated argument string to uppercase. toupper: lb $t2, ($a) beq $t2, $, exit # Stop at end of string blt $t2, 97, next # Not lowercase bgt $t2, 22, next # Not lowercase sub $t2, $t2, 32 # Convert to uppercase sb $t2, ($a) # Store back in string next: addi $a, $a, j toupper exit: jr $ra Assume that this function is called with a string that contains exactly lowercase letters, followed by the null terminator. Part (a) How many instructions would be executed for this function call? (5 points) Part (b) Assume that we implement a single-cycle processor, with a cycle time of 8ns. How much time would be needed to execute the function call? What is the CPI? (5 points)

Question 4 continued Part (c) Now assume that our processor uses a 5-stage pipeline, with the following characteristics: Each stage takes one clock cycle. The register file can be read and written in the same cycle. Assume forwarding is done whenever possible, and stalls are inserted otherwise. Branches are resolved in the ID stage and are predicted correctly 9% of the time. Jump instructions are fully pipelined, so no stalls or flushes are needed. How many total cycles are needed for the call to toupper with these assumptions? ( points) Part (d) If the cycle time of the pipelined machine is 2ns, how would its performance compare to that of the singlecycle processor from Part (b)? (5 points)

Question 5: Cache computations AMAT = Hit time + (Miss rate Miss penalty) Memory stall cycles = Memory accesses miss rate miss penalty The Junkium processor has a 6KB, 4-way set-associative (i.e., each set consists of 4 blocks) data cache with 32-byte blocks. Here, a KB is 2 bytes. Part (a) How many total blocks are in the Level cache? How many sets are there? (5 points) Part (b) Assuming that memory is byte addressable and addresses are 35-bits long, give the number of bits required for each of the following fields: (5 points) Tag Set index Block offset Part (c) What is the total size of the cache, including the valid, tag and data fields? Give an exact answer, in either bits or bytes. (5 points) 2

Question 5 continued Assume that the Junkium cache communicates with main memory via a 64-bit bus that can perform one transfer every cycles. Main memory itself is 64-bits wide and has a -cycle access time. Memory accesses and bus transfers may be overlapped. Part (d) What is the miss penalty for the cache? In other words, how long does it take to send a request to main memory and to receive an entire cache block? (5 points) Part (e) If the cache has a 95% hit rate and a one-cycle hit time, what is the average memory access time? (5 points) Part (f) If we run a program which consists of 3% load/store instructions, what is the average number of memory stall cycles per instruction? (5 points) 3

Question 6: Input/Output Little Howie is setting up a web site. He bought a fancy new hard disk which advertises: an 8ms average seek time, RPM, or roughly 6ms per rotation a 2ms overhead for each disk operation a transfer speed of,, bytes per second Howie had enough money left for a,, byte per second network connection. His system bus has a maximum bandwidth of 33 megabytes per second, and his HTML files have an average size of 8, bytes. Part (a) How much time will it take, on average, to read a random HTML file from the disk? Include the seek time, rotational delay, transfer time and controller overhead. (5 points) Part (b) With Howie s disk and network configuration, how many pages can be served per second? Round your answer to the nearest integer. (5 points) Part (c) Times are good, and Little Howie upgrades his web server with three additional hard disks, each with the specifications listed above. Now what is the maximum number of pages that can be served per second? Again, round to the nearest integer. (5 points) Part (e) Times are bad, and Howie has to downgrade his network to one with a,, byte per second bandwidth. How will this affect the number of pages per second that can be served? (5 points) 4