Assignment 1 solutions


1. The jal instruction does a jump identical to the j instruction (i.e., replacing the low-order 28 bits of the PC with the address in the instruction) and also writes the value of PC + 4 in register 31 ($ra). The hardware required to do this is similar to that for the single-cycle implementation; the value must be passed along the pipeline until it can be written into the register file, and the value 31 must be made available as an address input to the register file. (See the red components in the attached diagram.) The main question is when the return address (PC + 4) can be written into the register file. Consider that there may already be up to 4 instructions waiting to write values in the register file (say, R-type instructions); for those to complete successfully, the value cannot be written into the register file until all of them have completed. Therefore, this value would be written in pipeline stage 5.

2. From the earlier question, the jump part of jal is completed in the first stage, and the link part (writing the return address in register $ra) in the fifth stage. Therefore, any operation requiring the return address must wait until the correct value is written there. This means that, for example, a return from a subprogram (jr) cannot happen immediately. The MIPS function call required saving some information on the stack and restoring it, so this would not normally happen anyway. Register 31 ($ra) was normally pushed on the stack, however, and this also should not happen until the correct value was saved in $ra. Therefore, other instructions should be scheduled before this is done.

3. The jr instruction does not alter any of the values in the register file, only the PC, so it can complete after the register value is available, which happens in the second pipeline stage. The additional hardware and control logic required is again similar to that for the single-cycle implementation; namely, a path from the register file to the PC, and a control signal for the required MUX. Since jr is an R-type instruction, at least part of the function field must also be decoded in this stage. (See the blue components in the attached diagram.)

4. If the jr completes in the second cycle, there would also be a delay slot following this instruction, as was the case for the branch instructions. This could again be filled with a useful instruction that was always executed, or a nop instruction otherwise. Forwarding here would be similar to forwarding for the branch instructions: if an R-type instruction, say, immediately preceded the jr instruction, then forwarding would not work and a stall would be required. If it were two instructions before the jr instruction, forwarding could help. The logic would be similar to that for the ALU stage. It would, of course, have to be in the ID stage.

Comment: In the original MIPS architecture, all the jump instructions had a delay slot following the instruction. This meant that for the jal instruction, it was actually PC + 8 that should be saved as the return address. There was also a jump and link register (jalr) instruction, similar to the jal instruction, that jumped to the address stored in a register and also saved the return address in register 31.

5. Each instruction following the initial add depends on the value in register 3, written by that instruction. Therefore, there are three hazards from this instruction, all of which can be resolved by forwarding. There is also a hazard between the lw instruction, which writes register 6, and the instruction following, which reads register 6. This hazard cannot be resolved by forwarding alone, because the loaded value is not available until after the memory stage.

6.
    loop: addi $3, $3, 4
          beq  $3, $4, loop
          lw   $2, 96($3)       # subtract the 4 added by addi

7. The original code was an alternating sequence of add and lw instructions, where the lw instruction depends only on the preceding add instruction. In the non-pipelined machine, [...] and the generated code would be [...]. This code would be executed [...] times. Neglecting the final 4 cycles (to complete the last instruction in the loop), the total time would be 5[...] cycles, or 5 cycles for 2 instructions. The CPI would therefore be 5/2 = 2.5. For the pipelined machine with forwarding but no hazard detection, nop instructions would not be required, and the generated code would be

... The CPI would therefore be 1. Note that if the add instruction depended on the preceding lw instruction, one nop instruction would still be required, and the effective CPI would be 1.5.

8. The following shows the predictions for the four predictors and the given branch patterns, for the 25 branch instances. In the Behavior column, T means taken and N means not taken; in the predictor columns, T marks a correct prediction and F an incorrect one.

       Behavior         always taken     always not taken   1-bit            2-bit (weak taken)
    1  T-T-T            T-T-T            F-F-F              T-T-T            T-T-T
    2  N-N-N-N          F-F-F-F          T-T-T-T            F-T-T-T          F-T-T-T
    3  T-N-T-N-T-N      T-F-T-F-T-F      F-T-F-T-F-T        T-F-F-F-F-F      T-F-T-F-T-F
    4  T-T-T-N-T        T-T-T-F-T        F-F-F-T-F          T-T-T-F-F        T-T-T-F-T
    5  T-T-N-T-T-N-T    T-T-F-T-T-F-T    F-F-T-F-F-T-F      T-T-F-F-T-F-F    T-T-F-T-T-F-T
       Total            15T, 10F         10T, 15F           13T, 12F         18T, 7F
       Accuracy         15/25 = 0.6      10/25 = 0.4        13/25 = 0.52     18/25 = 0.72

9. The original loop, which is to be unrolled, was:

    loop: lw   $2, 0($10)
          sub  $4, $2, $3
          sw   $4, 0($10)
          addi $10, $10, 4
          bne  $10, $3, loop

Almost every instruction depends on the instruction preceding it. (The purpose of this loop is to subtract the value in register $3 from the array in memory pointed to by register $10. Note that a better programmer would have reused register $2 rather than introducing register $4.) The target machine is the standard MIPS, which had no forwarding and a single branch delay slot. Although not part of the answer, consider the following rescheduling of the original loop, for both the original MIPS and a MIPS with forwarding:

    standard MIPS                  MIPS with forwarding
    loop: lw   $2, 0($10)          loop: lw   $2, 0($10)
          addi $10, $10, 4               addi $10, $10, 4
          nop                            sub  $4, $2, $3
          nop                            bne  $10, $3, loop
          sub  $4, $2, $3                sw   $4, -4($10)
          nop
          nop
          bne  $10, $3, loop
          sw   $4, -4($10)         /* register $10 was incremented */

Note that the MIPS with forwarding does not need loop unrolling for this example. The previous schedule can be used as a pattern for the unrolled loop (it may not be optimal):

    loop: lw   $2, 0($10)
          lw   $5, 4($10)
          addi $10, $10, 8
          nop
          sub  $4, $2, $3
          sub  $6, $5, $3
          nop
          nop
          sw   $4, -8($10)
          bne  $10, $3, loop
          sw   $6, -4($10)

The original rescheduled loop for the standard MIPS required 9 instruction fetches for a single loop iteration, or 18 for two loop iterations. The unrolled loop required 11 instruction fetches for a single iteration. The unrolled loop should then complete in 11/18, or about 0.61, of the time required for the original loop. The loop could be implemented without nop instructions if it was unrolled 4 times.
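The fetch counts in question 9 can be checked with a small sketch. It is only an illustration under stated assumptions: the hazard rule (a result can be read only when at least 3 other instructions sit between producer and consumer) is assumed because it reproduces the 9 fetches quoted above, the register names (`r2`, `ptr`, `end`, and so on) are hypothetical labels, and hazards that cross loop iterations are ignored. The branch and its delay-slot instruction are kept adjacent, so any padding goes in front of the branch.

```python
GAP = 3  # assumed: instructions required between a producer and its consumer
NOP = ("nop", None, ())


def ins(op, dest, *srcs):
    """Describe one instruction as (mnemonic, destination, source registers)."""
    return (op, dest, srcs)


def nops_needed(instr, pos, last_write, gap):
    """Nops that must precede `instr` if it would otherwise sit at index `pos`."""
    need = 0
    for reg in instr[2]:
        if reg in last_write:
            between = pos - last_write[reg] - 1
            need = max(need, gap - between)
    return need


def pad_with_nops(code, gap=GAP):
    """Insert nops so every consumer is at least `gap` instructions after its producer."""
    out, last_write = [], {}
    i = 0
    while i < len(code):
        group = [code[i]]
        if code[i][0] == "bne" and i + 1 < len(code):
            group.append(code[i + 1])  # keep the delay-slot instruction attached
        pad = max(nops_needed(instr, len(out) + k, last_write, gap)
                  for k, instr in enumerate(group))
        out.extend([NOP] * pad)
        for instr in group:
            if instr[1] is not None:
                last_write[instr[1]] = len(out)
            out.append(instr)
        i += len(group)
    return out


# Rescheduled single-iteration loop (hypothetical register labels).
ORIGINAL = [
    ins("lw", "r2", "ptr"),
    ins("addi", "ptr", "ptr"),
    ins("sub", "r4", "r2", "r3"),
    ins("bne", None, "ptr", "end"),
    ins("sw", None, "r4", "ptr"),
]

# Loop unrolled once, in the order given in the answer.
UNROLLED = [
    ins("lw", "r2", "ptr"),
    ins("lw", "r5", "ptr"),
    ins("addi", "ptr", "ptr"),
    ins("sub", "r4", "r2", "r3"),
    ins("sub", "r6", "r5", "r3"),
    ins("sw", None, "r4", "ptr"),
    ins("bne", None, "ptr", "end"),
    ins("sw", None, "r6", "ptr"),
]

one_iteration = len(pad_with_nops(ORIGINAL))   # 9 fetches
two_unrolled = len(pad_with_nops(UNROLLED))    # 11 fetches
print(one_iteration, two_unrolled, round(two_unrolled / (2 * one_iteration), 2))
```

Under this assumed rule the unrolled loop comes out at 11 fetches against 18 for two iterations of the original, a ratio of about 0.61, which agrees with the figure of roughly 0.6 in the answer.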

[Attached diagram: pipelined MIPS datapath with the IF, ID, EX, MEM and WB stages and the IF/ID, ID/EX, EX/MEM and MEM/WB pipeline registers, showing the added JAL and JR hardware.]
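The accuracy figures in question 8 can be reproduced with a short simulation. This is a sketch under assumptions, not the method used in the original solution: each of the five branch patterns is assumed to start from a fresh predictor, the 1-bit predictor is assumed to start by predicting taken, and the 2-bit predictor is modelled as a saturating counter from 0 to 3 that starts at 2 (weakly taken). The function names are made up for illustration.

```python
# Outcome patterns from question 8: T = taken, N = not taken.
PATTERNS = ["TTT", "NNNN", "TNTNTN", "TTTNT", "TTNTTNT"]


def always(prediction):
    """Static predictor that always predicts the same outcome."""
    def run(pattern):
        return sum(outcome == prediction for outcome in pattern)
    return run


def one_bit(pattern, state="T"):
    """1-bit predictor: predict whatever the branch did last time."""
    correct = 0
    for outcome in pattern:
        correct += (state == outcome)
        state = outcome
    return correct


def two_bit(pattern, counter=2):
    """2-bit saturating counter: 0-1 predict not taken, 2-3 predict taken."""
    correct = 0
    for outcome in pattern:
        predicted = "T" if counter >= 2 else "N"
        correct += (predicted == outcome)
        if outcome == "T":
            counter = min(3, counter + 1)
        else:
            counter = max(0, counter - 1)
    return correct


def total_correct(predictor):
    """Correct predictions summed over all patterns, each from a fresh state."""
    return sum(predictor(pattern) for pattern in PATTERNS)


TOTAL = sum(len(pattern) for pattern in PATTERNS)  # 25 branch instances

for name, predictor in [
    ("always taken", always("T")),
    ("always not taken", always("N")),
    ("1-bit", one_bit),
    ("2-bit (weak taken)", two_bit),
]:
    hits = total_correct(predictor)
    print(f"{name}: {hits}/{TOTAL} = {hits / TOTAL:.2f}")
```

With these starting states the four predictors score 15, 10, 13 and 18 correct out of 25, matching the accuracies 0.6, 0.4, 0.52 and 0.72 given for question 8.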