CS2100 Computer Organisation Tutorial #10: Pipelining Answers to Selected Questions

Tutorial Questions

2. [AY2014/5 Semester 2 Exam] Refer to the following MIPS program:

    # register $s0 contains a 32-bit value
    # register $s1 contains a non-zero 8-bit value
    # at the right most (least significant) byte
          add  $t0, $s0, $zero     #inst A
          add  $s2, $zero, $zero   #inst B
    lp:   bne  $s2, $zero, done    #inst C
          beq  $t0, $zero, done    #inst D
          andi $t1, $t0, 0xFF      #inst E
          bne  $s1, $t1, nt        #inst F
          addi $s2, $s2, 1         #inst G
    nt:   srl  $t0, $t0, 8         #inst H
          j    lp                  #inst J
    done:

We assume that the register $s0 contains 0xAFAFFAFA and $s1 contains 0xFF. Given a 5-stage MIPS pipeline processor, for each of the parts below, give the total number of cycles needed for the first iteration of the execution from instructions A to H (i.e. excluding the j lp). Remember to include the cycles needed for instruction H to finish the WB stage. Note that the questions are independent from each other.

a. With only data forwarding mechanisms and no control hazard mechanism.

b. With data forwarding and assume not taken branch prediction. Note that there is no early branching.

c. By swapping two instructions (from Instructions A to H), we can improve the performance of early branching (with all additional forwarding paths). Give the two instructions that can be swapped. You only need to indicate the instruction letters in your answer.

AY2017/8 Semester 2 - 1 of 5 - CS2100 Tutorial #10 Selected Answers
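Before counting cycles it helps to know which instructions actually execute in the first iteration. The following Python sketch steps through the program by hand for the given register values; the function name `first_iteration_trace` is hypothetical and the code only models the control flow from A up to H, not the pipeline.

```python
def first_iteration_trace(s0, s1):
    """Return the letters of the instructions executed from A up to H
    (or up to the branch that jumps to `done`) in the first iteration."""
    mask = 0xFFFFFFFF
    trace = []

    t0 = s0 & mask          # inst A: add $t0, $s0, $zero
    trace.append("A")
    s2 = 0                  # inst B: add $s2, $zero, $zero
    trace.append("B")

    trace.append("C")       # inst C: bne $s2, $zero, done
    if s2 != 0:
        return trace
    trace.append("D")       # inst D: beq $t0, $zero, done
    if t0 == 0:
        return trace

    t1 = t0 & 0xFF          # inst E: andi $t1, $t0, 0xFF
    trace.append("E")
    trace.append("F")       # inst F: bne $s1, $t1, nt
    if s1 == t1:            # branch not taken, so G executes
        s2 = (s2 + 1) & mask
        trace.append("G")

    t0 >>= 8                # inst H: srl $t0, $t0, 8
    trace.append("H")
    return trace


print("".join(first_iteration_trace(0xAFAFFAFA, 0xFF)))  # ABCDEFH
```

With $s0 = 0xAFAFFAFA the low byte is 0xFA, which differs from $s1 = 0xFF, so the bne at F is taken and G is skipped.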

Answers:

a) 20 cycles

    [Timing chart, cycles 1 to 20. Rows shown: beq, andi and srl each pass through F D E M W; srl completes WB in cycle 20.]

    The addi instruction is not executed.

b) 14 cycles

    [Timing chart, cycles 1 to 14. Rows shown: beq F D E M W; andi F D E M W; addi F D E * * (fetched under the not-taken prediction, then flushed); srl F D E M W.]

c) Swap instructions A and B to reduce the delay between instructions B and C.

          add  $s2, $zero, $zero   #inst B
          add  $t0, $s0, $zero     #inst A
    lp:   bne  $s2, $zero, done    #inst C
          beq  $t0, $zero, done    #inst D
          andi $t1, $t0, 0xFF      #inst E
          bne  $s1, $t1, nt        #inst F
          addi $s2, $s2, 1         #inst G
    nt:   srl  $t0, $t0, 8         #inst H
          j    lp                  #inst J
    done:
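The two totals in parts a) and b) can be checked with a small counting sketch. It assumes, as the question states, that branches are resolved in the MEM stage (so a penalised branch holds up the next fetch by 3 cycles) and that forwarding removes every data stall in this sequence, which holds here because there is no load. The helper `total_cycles` is a hypothetical name, not part of the tutorial.

```python
def total_cycles(executed, penalised, penalty=3, stages=5):
    """Cycle in which the last instruction in `executed` leaves WB.

    `penalised` holds the branches after which the next useful fetch is
    delayed by `penalty` cycles (branch outcome known at the end of MEM).
    """
    fetch = 0
    next_fetch = 1
    for letter in executed:
        fetch = next_fetch
        next_fetch = fetch + 1 + (penalty if letter in penalised else 0)
    return fetch + stages - 1


executed = "ABCDEFH"  # G is skipped because the bne at F is taken

# a) no control hazard mechanism: every branch (C, D, F) stalls the fetch
part_a = total_cycles(executed, penalised={"C", "D", "F"})

# b) predict not taken: only the taken branch F pays the 3-cycle penalty
part_b = total_cycles(executed, penalised={"F"})

print(part_a, part_b)  # 20 14
```

In part b) the branches at C and D are predicted correctly, so only the mispredicted bne at F costs cycles; H is fetched in cycle 10 and finishes WB in cycle 14.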

3. Consider the following code fragment:

    loop: lw   $t1, 0($t2)         #i1
          addi $t1, $t1, 1         #i2
          sw   $t1, 0($t2)         #i3
          addi $t2, $t2, 4         #i4
          sub  $t4, $t3, $t2       #i5
          bne  $t4, $zero, loop    #i6

For simplicity, the setup code is not given. You can assume that $t2 refers to a valid array element and $t3 is initialized to $t2 + 400 at the start of the code. Let us study a pipeline processor with the different mechanisms discussed in lecture. For each of the following parts, give:

- Timing chart for {i1 to i6} + {i1 from next iteration}
- Total cycles needed for the code

Data Hazards

a. Suppose the processor has no mechanisms to handle RAW data hazards as well as control hazards. The processor will stall on these hazards until the hazards "disappear". Note that the register files still support "write-then-read" in a single cycle.

b. Suppose the processor has full forwarding paths for RAW data hazards. However, control dependency is still handled by stalling the processor.

Control Hazards

The following parts assume full forwarding paths for RAW data hazards.

c. Branch is handled by early resolution in ID stage.

d. Branch is handled by branch prediction (predict taken). Branch is still resolved in MEM stage. When we use the predict-taken scheme, every branch instruction is assumed to be taken. The target instruction will be fetched in the next cycle with no stall.

e. Branch is handled by delayed branching. Branch is still resolved in MEM stage. Show the modified code sequence in addition to the two tasks mentioned.
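The question does not state the iteration count directly; it follows from $t3 = $t2 + 400 and the 4-byte step in i4. A minimal Python sketch of the loop control (i4 to i6 only, with a hypothetical start address) confirms the figure of 100 iterations used in the answers.

```python
def count_iterations(t2_start, span_bytes=400, step=4):
    """Count how many times the loop body runs before bne falls through."""
    t2 = t2_start
    t3 = t2_start + span_bytes   # setup: $t3 = $t2 + 400
    iterations = 0
    while True:
        iterations += 1
        t2 += step               # i4: addi $t2, $t2, 4
        t4 = t3 - t2             # i5: sub  $t4, $t3, $t2
        if t4 == 0:              # i6: bne  $t4, $zero, loop (not taken)
            break
    return iterations


print(count_iterations(0x10000000))  # 100
```

The start address is arbitrary; only the 400-byte span and the 4-byte stride matter.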

Answers:

(a)

    Cycle   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
    i1      F  D  E  M  W
    i2         F  D  D  D  E  M  W
    i3            F  F  F  D  D  D  E  M  W
    i4                     F  F  F  D  E  M  W
    i5                              F  D  D  D  E  M  W
    i6                                 F  F  F  D  D  D  E  M  W
    i1                                                         F

One iteration takes 18 cycles. However, as the next iteration starts at cycle 18, i.e. an overlap of 1 cycle, we can calculate the total cycles as:

    (18 - 1) x 100 iterations + 1 cycle for the WB of the last bne inst. = 1701 cycles

(b) The RAW hazard between i1 and i2 causes a 1-cycle delay because the leading instruction (i1) is a lw, whose result is only available after the MEM stage.

One iteration takes 11 cycles. However, as the next iteration starts at cycle 11, i.e. an overlap of 1 cycle, we can calculate the total cycles as:

    (11 - 1) x 100 iterations + 1 cycle for the WB of the last bne inst. = 1001 cycles

(c)

    i6   F D* D E M W

Since i5 produces the value used by the branch instruction, a 1-cycle stall is needed so that the EX/MEM to ID stage forwarding (only for branch) can be used. The additional stall cycle is denoted as D* in the timing chart above.

    (12 - 3) x 100 iterations + 3 cycles for the EX-MEM-WB of the last bne inst. = 903 cycles
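The per-iteration figures for parts (a) and (b) can be reproduced with a simplified in-order pipeline model. The sketch below is an illustration under stated assumptions, not the course's reference method: an instruction waits in ID until its sources are ready, the register file supports write-then-read in one cycle, and with forwarding an ALU result is usable by the next EX stage while a load result is usable one cycle later. The names `LOOP` and `schedule` are hypothetical, and the model treats the store data of sw as needed in EX, which is conservative but gives the same timing here.

```python
# Each entry: (label, destination register or None, source registers, is_load)
LOOP = [
    ("i1", "$t1", ["$t2"], True),          # lw   $t1, 0($t2)
    ("i2", "$t1", ["$t1"], False),         # addi $t1, $t1, 1
    ("i3", None, ["$t1", "$t2"], False),   # sw   $t1, 0($t2)
    ("i4", "$t2", ["$t2"], False),         # addi $t2, $t2, 4
    ("i5", "$t4", ["$t3", "$t2"], False),  # sub  $t4, $t3, $t2
    ("i6", None, ["$t4"], False),          # bne  $t4, $zero, loop
]


def schedule(program, forwarding):
    """Return rows of (label, first F, first ID, EX, MEM, WB) cycle numbers."""
    ready = {}  # register -> earliest cycle a consumer may finish ID
    rows = []
    prev_id_start, prev_id_end = 1, 1
    for label, dest, sources, is_load in program:
        fetch_start = prev_id_start   # fetch once the previous instr. enters ID
        id_start = prev_id_end + 1    # decode once the previous instr. leaves ID
        id_end = max([id_start] + [ready.get(reg, 0) for reg in sources])
        ex, mem, wb = id_end + 1, id_end + 2, id_end + 3
        if dest is not None:
            if not forwarding:
                ready[dest] = wb      # read in the same cycle as the write-back
            elif is_load:
                ready[dest] = mem     # load value forwarded after MEM
            else:
                ready[dest] = ex      # ALU value forwarded after EX
        rows.append((label, fetch_start, id_start, ex, mem, wb))
        prev_id_start, prev_id_end = id_start, id_end
    return rows


for forwarding in (False, True):
    rows = schedule(LOOP, forwarding)
    last = rows[-1]
    # The branch is resolved in MEM, so the next i1 is fetched one cycle later.
    print(forwarding, "iteration cycles:", last[5], "next i1 fetched at:", last[4] + 1)
```

Under these assumptions the model yields 18 cycles without forwarding and 11 cycles with forwarding, with the next i1 fetched in cycle 18 and cycle 11 respectively, matching the overlap of 1 cycle quoted in the answers.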

(d)

    [Timing chart spanning cycles 1 to 14.]

    (11 - 4) x 100 iterations + 4 cycles for the ID-EX-MEM-WB of the last bne inst. = 704 cycles

(e) First, there are 3 delay slots in this case, because the branch is resolved in the MEM stage: the distance back to the Fetch stage is 3 cycles, hence 3 delay slots. If we can only reorder the original code without modification, then there are no suitable instructions for the delay slots. This results in 3 nop instructions after the bne instruction:

    loop: lw   $t1, 0($t2)         #i1
          addi $t1, $t1, 1         #i2
          sw   $t1, 0($t2)         #i3
          addi $t2, $t2, 4         #i4
          sub  $t4, $t3, $t2       #i5
          bne  $t4, $zero, loop    #i6
          nop
          nop
          nop

This gives us exactly the same performance as if there were no mechanism for control dependency.
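The totals quoted in parts (a) to (d) all follow one pattern: each iteration contributes its length minus the cycles it overlaps with the next iteration, and the final iteration adds those overlapped cycles back so that the last bne can finish WB. A short Python sketch of that arithmetic, with a hypothetical helper name and the figures taken from the answers above:

```python
def total_cycles(iteration_cycles, overlap, iterations=100):
    """(cycles per iteration - overlap) x iterations, plus the tail of the last bne."""
    return (iteration_cycles - overlap) * iterations + overlap


# part: (cycles for one iteration, overlap with the next iteration)
parts = {
    "a": (18, 1),  # no forwarding, stall on branch
    "b": (11, 1),  # forwarding, stall on branch
    "c": (12, 3),  # forwarding, early branch resolution in ID
    "d": (11, 4),  # forwarding, predict taken
}

for name, (cycles, overlap) in parts.items():
    print(name, total_cycles(cycles, overlap))
# a 1701
# b 1001
# c 903
# d 704
```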