CSEE W3827 Fundamentals of Computer Systems Homework Assignment 5 Solutions


Profs. Stephen A. Edwards & Martha Kim
Columbia University
Due December 10th, 2012 at 5:00 PM

Turn in at CSB 469. Write your name and UNI on your solutions.
Show your work for each problem; we are more interested in how you get the answer than whether you get the right answer.

1. (25 points) Pipelined software execution.

(a) Assuming a fully bypassed (i.e., both W-E and M-E operand forwarding), 5-stage MIPS pipeline with early branch resolution, indicate all of the stalls and operand forwarding that are used to execute the following ten instructions:

    bnez $a0, tree_sum_recurse
    addi $sp, $sp, -12          stall for branch
    sw   $ra, 0($sp)            forward $sp M to E
    sw   $s0, 4($sp)            forward $sp W to E
    sw   $s1, 8($sp)
    move $s0, $a0
    lw   $s1, 0($s0)            forward $s0 M to E
    lw   $a0, 4($s0)            forward $s0 W to E

(b) How many cycles will it take for this code to execute completely on the pipelined processor?

8 instructions + 1 stall + 4 cycles to drain pipe = 13 cycles

(c) Will this code run faster on a 100 MHz single-cycle processor or a 400 MHz pipelined processor?

8 cycles × 10 ns/cycle = 80 ns vs. 13 cycles × 2.5 ns/cycle = 32.5 ns; it will be faster on the pipelined design.
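The arithmetic in parts (b) and (c) can be checked with a minimal sketch. The function names `pipeline_cycles` and `exec_time_ns` are illustrative, not part of the assignment, and the sketch assumes the single-cycle design spends exactly one cycle per instruction.

```python
def pipeline_cycles(instructions, stalls, stages=5):
    """Cycles to run a sequence to completion on an in-order pipeline.

    One instruction issues per cycle, each stall adds a cycle, and the last
    instruction needs (stages - 1) extra cycles to drain.
    """
    return instructions + stalls + (stages - 1)


def exec_time_ns(cycles, clock_mhz):
    """Execution time in nanoseconds for a cycle count at a clock rate in MHz."""
    return cycles * 1000.0 / clock_mhz


pipelined_cycles = pipeline_cycles(instructions=8, stalls=1)  # 13 cycles
single_cycle_cycles = 8  # assumption: one cycle per instruction

single_time = exec_time_ns(single_cycle_cycles, 100)   # 80.0 ns
pipelined_time = exec_time_ns(pipelined_cycles, 400)   # 32.5 ns

print(pipelined_cycles, single_time, pipelined_time)
```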

2. (25 points) Assuming a pipeline with no bypassing, reorder the instructions such that they compute the same function, but require no pipeline stalls due to data hazards. Branch stalls are fine. You should reorder instructions as effectively as you can before inserting as few nops as possible. Of the nops you inserted, indicate which could be avoided with bypassing.

FYI: This function takes a pointer to a string and a character, and counts how many times the character appears in the string.

    count:
        addi $sp, $sp, -4
        li   $v0, 0                  # hoisted
        nop                          # cannot hoist anything out of loop
                                     # body -- bypassing would eliminate
        sw   $ra, 0($sp)
    count_top:
        lb   $t0, 0($a1)
        nop                          # no choice but to wait, addi must
        nop                          # come after both branches fall thru
                                     # bypassing would eliminate one
        beq  $t0, $zero, count_done
        bne  $t0, $a0, count_advance
        addi $v0, $v0, 1
    count_advance:
        addi $a1, $a1, 1
        nop                          # bypassing would eliminate (addi to lb at count_top)
        j    count_top
    count_done:
        lw   $ra, 0($sp)
        addi $sp, $sp, 4
        nop                          # no useful work left to put here --
                                     # bypassing would eliminate
        jr   $ra
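A schedule like this can be sanity-checked mechanically. The sketch below is a hypothetical helper, not part of the assignment: it scans a straight-line instruction sequence and reports any consumer that sits too close to its producer. It assumes a 5-stage pipeline without bypassing in which the register file is written in the first half of a cycle and read in the second half, so a consumer must be at least three slots after its producer. The example input is the `count_done` epilogue, with and without one padding nop.

```python
# Assumption: no bypassing, and the register file writes before it reads
# within a cycle, so two instructions must separate producer and consumer.
MIN_DISTANCE = 3


def find_hazards(instrs, min_distance=MIN_DISTANCE):
    """Return (producer_index, consumer_index) pairs that would stall.

    Each instruction is a tuple: (text, destination register or None,
    list of source registers). Only straight-line code is handled.
    """
    hazards = []
    for i, (_, dest, _) in enumerate(instrs):
        if dest is None:
            continue
        for j in range(i + 1, min(i + min_distance, len(instrs))):
            _, dest_j, srcs_j = instrs[j]
            if dest in srcs_j:
                hazards.append((i, j))
            if dest_j == dest:
                break  # a later write supersedes this producer
    return hazards


epilogue_unpadded = [
    ("lw $ra, 0($sp)", "$ra", ["$sp"]),
    ("addi $sp, $sp, 4", "$sp", ["$sp"]),
    ("jr $ra", None, ["$ra"]),
]
epilogue_padded = epilogue_unpadded[:2] + [("nop", None, [])] + epilogue_unpadded[2:]

print(find_hazards(epilogue_unpadded))  # lw -> jr is too close
print(find_hazards(epilogue_padded))    # no hazards remain
```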

3. (25 pts.) Suppose the MIPS processor were divided into 10 stages, s1 through s10. The key operations are distributed across the stages as follows:

[Table: stages s1 through s10, with the operations branch resolution, register decode, ALU, data memory, and register writeback assigned to individual stages. Per part (b), registers are read in s5 and written in s10.]

Hint: You can ignore conditional branches for this problem.

(a) What is the ideal CPI of this pipeline?

1 cycle per instruction

(b) Assuming no operand forwarding, what is the CPI of producer and consumer instructions?

Without forwarding, operands must be passed through the register file, written in s10 and read in s5. Such dependent instructions can be processed at a rate of one instruction every six cycles, for a CPI of 6.

(c) Consider an instruction mix of 25% loads, 10% stores, 2% jumps, and 52% R-type instructions. Assuming that 50% of the loads and R-type instructions are followed immediately by their consumers, and that there is no data forwarding, what is the average CPI of this workload?

The CPI will be 1 for all of the instructions except for the half of loads (12.5%) and R-types (26%) that have a CPI of 6. So, the overall CPI is 6 × 0.385 + 1 × (1 − 0.385) = 2.925.

(d) What is the average CPI on the standard 5-stage pipeline, also without operand forwarding?

The CPI remains 1 for everything except the same bubble-requiring instructions. These require two bubbles, resulting in a CPI of 3 for those instructions. So the overall CPI is 3 × 0.385 + 1 × (1 − 0.385) = 1.77.
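The weighted averages in parts (c) and (d) can be reproduced with a short sketch. The helper name `average_cpi` is illustrative; the fractions and per-instruction CPIs are the ones stated in the solution.

```python
def average_cpi(stall_fraction, stalled_cpi, base_cpi=1.0):
    """Average CPI when a fraction of instructions runs at a higher CPI."""
    return stalled_cpi * stall_fraction + base_cpi * (1.0 - stall_fraction)


# Half of the loads (25%) and half of the R-types (52%) are followed
# immediately by a consumer.
stall_fraction = 0.5 * 0.25 + 0.5 * 0.52   # 0.385

cpi_10_stage = average_cpi(stall_fraction, stalled_cpi=6)  # about 2.925
cpi_5_stage = average_cpi(stall_fraction, stalled_cpi=3)   # about 1.77

print(round(cpi_10_stage, 3), round(cpi_5_stage, 3))
```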

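(e) How much faster does the 10-stage pipe's clock need to be than the 5-stage pipe's in order for the two processors to have the same performance on this workload?

Let f_10 and f_5 be the clock frequencies of the two pipelines. Equal performance means equal time per instruction:

    CPI_10 × (1 / f_10) = CPI_5 × (1 / f_5)
    2.925 × (1 / f_10)  = 1.77 × (1 / f_5)
    2.925 × f_5         = 1.77 × f_10
    f_10 / f_5          = 2.925 / 1.77 = 1.65

So the 10-stage clock must be 1.65 times faster.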
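As a quick numeric check of part (e), the sketch below computes the required clock ratio from the two average CPIs given in the solution. The variable names are illustrative only.

```python
# Average CPIs taken from parts (c) and (d) of the solution.
cpi_10_stage = 2.925
cpi_5_stage = 1.77

# Equal time per instruction: CPI_10 / f_10 == CPI_5 / f_5,
# so f_10 / f_5 == CPI_10 / CPI_5.
clock_ratio = cpi_10_stage / cpi_5_stage

print(round(clock_ratio, 2))
```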

4. (25 pts.) Consider the direct-mapped cache that interprets an 8-bit address according to the {tag:setidx:byteoffset} format specified.

(a) Complete the table on the following page, indicating whether each access in the reference stream will hit (H) or miss (M) in each cache.

(b) If the L1 access time is 7ps, and memory has a 100ps access time, what is the expected access time for the references above on the two-level hierarchy?

    EAT = AT_L1 + MR_L1 × AT_M = 7 + 0.66 × 100 = 73ps

(c) Will the addition of an L2 cache with a 50% miss rate and 20ps access time improve or degrade the hierarchy?

The time to resolve an L1 miss is now 20 + 0.5 × 100 = 70ps (to access the L2 and memory), which is less than the 100ps it was before, so this change will improve the hierarchy.
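The expected access time formula extends naturally to more levels. The sketch below is one way to evaluate it; the function name `expected_access_time` is an assumption for illustration. It uses the exact L1 miss rate of 4/6 from the reference stream, so the result is slightly above the 73ps obtained with the truncated value 0.66.

```python
def expected_access_time(levels, memory_time):
    """Expected access time for a cache hierarchy.

    levels: list of (access_time, miss_rate) tuples ordered from L1 outward.
    memory_time: access time of main memory, in the same unit.
    """
    total = memory_time
    # Work from the level closest to memory back toward L1.
    for access_time, miss_rate in reversed(levels):
        total = access_time + miss_rate * total
    return total


l1_miss_rate = 4 / 6  # four misses out of six references

eat_l1_only = expected_access_time([(7, l1_miss_rate)], 100)
eat_with_l2 = expected_access_time([(7, l1_miss_rate), (20, 0.5)], 100)

print(round(eat_l1_only, 2), round(eat_with_l2, 2))
```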

                              L1
    Address Format            {2:2:4}
    Block Size (Bytes)        16
    Cache Size (Blocks)       4

    Address References        L1
    (in binary)
    01.00.0000                M
    00.10.0000                M
    00.01.0000                M
    00.00.1000                M
    00.00.0100                H
    00.00.0010                H
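The hit/miss column can be reproduced with a small simulation. This is a minimal sketch of a direct-mapped cache using the {2:2:4} address split; the function name `simulate_direct_mapped` is an assumption for illustration, and the cache is assumed to start empty.

```python
def simulate_direct_mapped(addresses, set_bits=2, offset_bits=4):
    """Return 'H' or 'M' for each address in a direct-mapped cache.

    The cache starts empty. Each set holds one block, identified by its tag.
    """
    stored_tags = {}  # set index -> tag currently held in that set
    results = []
    for addr in addresses:
        set_idx = (addr >> offset_bits) & ((1 << set_bits) - 1)
        tag = addr >> (offset_bits + set_bits)
        if stored_tags.get(set_idx) == tag:
            results.append("H")
        else:
            results.append("M")
            stored_tags[set_idx] = tag  # replace the block in this set
    return results


# Reference stream from the table, written as tag.set.offset in binary.
refs = [
    0b01_00_0000,
    0b00_10_0000,
    0b00_01_0000,
    0b00_00_1000,
    0b00_00_0100,
    0b00_00_0010,
]

outcomes = simulate_direct_mapped(refs)
miss_rate = outcomes.count("M") / len(outcomes)
print(outcomes, miss_rate)
```

The fourth reference misses even though it maps to set 0, because that set still holds the block with tag 01 from the first reference.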