Computer Architecture EE 4720 Midterm Examination

Similar documents
Computer Architecture EE 4720 Final Examination

Very Simple MIPS Implementation

Computer Architecture EE 4720 Midterm Examination

Computer Architecture EE 4720 Midterm Examination

Very Simple MIPS Implementation

Notes Interrupts and Exceptions. The book uses exception as a general term for all interrupts...

Computer Architecture EE 4720 Midterm Examination

Computer Architecture EE 4720 Final Examination

Computer Architecture EE 4720 Midterm Examination

Very short answer questions. "True" and "False" are considered short answers.

Computer Architecture EE 4720 Midterm Examination

CS433 Midterm. Prof Josep Torrellas. October 16, Time: 1 hour + 15 minutes

Computer Architecture EE 4720 Final Examination

Instruction Level Parallelism. Appendix C and Chapter 3, HP5e

EE557--FALL 1999 MAKE-UP MIDTERM 1. Closed books, closed notes

Computer Architecture EE 4720 Midterm Examination

This Set. Scheduling and Dynamic Execution Definitions From various parts of Chapter 4. Description of Three Dynamic Scheduling Methods

CS 152, Spring 2011 Section 2

Computer Architecture EE 4720 Final Examination

Spring 2014 Midterm Exam Review

Appendix C: Pipelining: Basic and Intermediate Concepts

Computer Architecture EE 4720 Practice Final Examination

EITF20: Computer Architecture Part2.2.1: Pipeline-1

LSU EE 4720 Homework 5 Solution Due: 20 March 2017

EITF20: Computer Architecture Part2.2.1: Pipeline-1

Lecture 10: Simple Data Path

EITF20: Computer Architecture Part2.2.1: Pipeline-1

IF1 --> IF2 ID1 ID2 EX1 EX2 ME1 ME2 WB. add $10, $2, $3 IF1 IF2 ID1 ID2 EX1 EX2 ME1 ME2 WB sub $4, $10, $6 IF1 IF2 ID1 ID2 --> EX1 EX2 ME1 ME2 WB

This Set. Scheduling and Dynamic Execution Definitions From various parts of Chapter 4. Description of Two Dynamic Scheduling Methods

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

Static, multiple-issue (superscaler) pipelines

Computer System Architecture Midterm Examination Spring 2002

Chapter 4. The Processor

Computer Architecture EE 4720 Final Examination

5008: Computer Architecture HW#2

(1) Using a different mapping scheme will reduce which type of cache miss? (1) Which type of cache miss can be reduced by using longer lines?

CS3350B Computer Architecture Winter 2015

3. (2 pts) Clock rates have grown by a factor of 1000 while power consumed has only grown by a factor of 30. How was this accomplished?

Computer Architecture EE 4720 Final Examination

These slides do not give detailed coverage of the material. See class notes and solved problems (last page) for more information.

Pipelining. Principles of pipelining. Simple pipelining. Structural Hazards. Data Hazards. Control Hazards. Interrupts. Multicycle operations

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

CS152 Computer Architecture and Engineering March 13, 2008 Out of Order Execution and Branch Prediction Assigned March 13 Problem Set #4 Due March 25

CMSC411 Fall 2013 Midterm 1

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

Lecture 4: Review of MIPS. Instruction formats, impl. of control and datapath, pipelined impl.

Instruction Level Parallelism. ILP, Loop level Parallelism Dependences, Hazards Speculation, Branch prediction

Final Exam Fall 2007

There are different characteristics for exceptions. They are as follows:

09-1 Multicycle Pipeline Operations

Computer Systems Architecture I. CSE 560M Lecture 5 Prof. Patrick Crowley

EE557--FALL 1999 MIDTERM 1. Closed books, closed notes

Computer Science 61C Spring Friedland and Weaver. The MIPS Datapath

Instruction Pipelining

Basic Pipelining Concepts

Instruction Pipelining

Chapter 4 The Processor 1. Chapter 4A. The Processor

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 4: Datapath and Control

Four Steps of Speculative Tomasulo cycle 0

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?

CS 2410 Mid term (fall 2015) Indicate which of the following statements is true and which is false.

Chapter 4. The Processor

Chapter 4. The Processor. Computer Architecture and IC Design Lab

Minimizing Data hazard Stalls by Forwarding Data Hazard Classification Data Hazards Present in Current MIPS Pipeline

Major CPU Design Steps

Chapter 4. The Processor

LECTURE 3: THE PROCESSOR

COSC 6385 Computer Architecture - Pipelining

CS433 Midterm. Prof Josep Torrellas. October 19, Time: 1 hour + 15 minutes

Lecture 7: Pipelining Contd. More pipelining complications: Interrupts and Exceptions

ECEC 355: Pipelining

ECE 313 Computer Organization FINAL EXAM December 13, 2000

CO Computer Architecture and Programming Languages CAPL. Lecture 18 & 19

Chapter 4. The Processor

CS3350B Computer Architecture Quiz 3 March 15, 2018

EXERCISE 3: DLX II - Control

ECE 486/586. Computer Architecture. Lecture # 7

CS 252 Graduate Computer Architecture. Lecture 4: Instruction-Level Parallelism

CS 61C: Great Ideas in Computer Architecture. Lecture 11: Datapath. Bernhard Boser & Randy Katz

Super Scalar. Kalyan Basu March 21,

Pipelining. Maurizio Palesi


Hardware-based Speculation

Computer Architecture

Pipelining. CSC Friday, November 6, 2015

Very short answer questions. You must use 10 or fewer words. "True" and "False" are considered very short answers.

Final Exam Spring 2017

Very short answer questions. You must use 10 or fewer words. "True" and "False" are considered very short answers.

COMPUTER ORGANIZATION AND DESIGN

Instruction Pipelining Review

Processor (I) - datapath & control. Hwansoo Han

Review. N-bit adder-subtractor done using N 1- bit adders with XOR gates on input. Lecture #19 Designing a Single-Cycle CPU

Structure of Computer Systems

1. (10) True or False: (1) It is possible to have a WAW hazard in a 5-stage MIPS pipeline.

MIPS An ISA for Pipelining

CS61C : Machine Structures

CS/CoE 1541 Exam 1 (Spring 2019).

Instruction Set Architecture (ISA)

Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard

LSU EE 4720 Dynamic Scheduling Study Guide Fall David M. Koppelman. 1.1 Introduction. 1.2 Summary of Dynamic Scheduling Method 3

Transcription:

Name Solution Computer Architecture EE 4720 Midterm Examination 22 March 2000, 13:40 14:30 CST Alias Problem 1 Problem 2 Problem 3 Exam Total (35 pts) (20 pts) (45 pts) (100 pts) Good Luck!

Problem 1: The DLX implementation below has six stages. (The work done by ID is now done by ID and RR.) IF +4 PC Mem Port ID RR EX 4 MEM WB sign ext. Control IMM 6..10 11..15 16..20 or 11..15 D In A B IMM 6 5,6 =0 ALU B In Mem Out ALU MD (a) The execution of some code on this pipeline is shown below. Add exactly the bypass paths needed so that the code executes as illustrated. Next to each bypass path indicate the cycle(s) in which it will be used. (Do not add bypass paths that won t be used in the execution of the code below.) (10 pts)! Cycle 0 1 2 3 4 5 6 7 8 add r1, r2, r3 IF ID RR EX ME WB lw r5, 10(r1) IF ID RR EX ME WB xor r4, r1, r2 IF ID RR EX ME WB and r6, r5, r4 IF ID RR EX ME WB The changes are shown in the diagram above in blue bold. (b) Show the execution of the code below on this pipeline until bneq reaches IF a second time. The branch is taken. Be sure to base the CTI behavior on the hardware shown above. Show where instructions are squashed. (10 pts) The RR stage adds a cycle of branch penalty, but one cycle is saved, compared to the original pipelined DLX implementation, because the ALU output, rather than the EX/MEM.ALU latch, is fed back to the PC multiplexor. LOOP:! Solution.! Cycle 0 1 2 3 4 5 6 7 8 bneq r1, SKIP IF ID RR EX ME WB IF add r2, r3, r4 IF ID RRx sub r5, r6, r7 IF IDx and r8, r9, r10 IFx or r11, r12, r13 SKIP: j LOOP IF ID RR EX ME add r2, r3, r4 IF ID RRx sub r5, r6, r7 IF IDx and r8, r9, r10 IFx or r11, r12, r13 2

Problem 1, continued: The figure and code below are identical to the ones on the previous page. (c) Add branch and jump hardware so that the code executes as quickly as possible. Additional adders can be used, the fewer the better. Branches and jumps can be handled separately. The register file cannot be moved or duplicated and cannot be read before the RR stage. (Jumps and branches both use displacement addressing. In branches the displacement is in bits 16 to 31 and in jumps the displacement is in bits 6 to 31.) Do not show control hardware. (10 pts) Solution to be added by May 2000. IF ID RR EX MEM WB +4 6..10 11..15 A B =0 ALU Mem ALU PC sign ext. IMM D In IMM B In Out MD Mem Port Control 16..20 or 11..15 (d) Show how the code below executes on the modified pipeline. As before, show execution until bneq enters IF a second time. (5 pts) LOOP:! Solution.! Cycle 0 1 2 3 4 5 6 7 8 bneq r1, SKIP IF ID RR EX ME WB IF ID RR EX add r2, r3, r4 IF IDx sub r5, r6, r7 IFx and r8, r9, r10 or r11, r12, r13 SKIP: j LOOP IF ID RR EX ME WB add r2, r3, r4 IFx sub r5, r6, r7 and r8, r9, r10 or r11, r12, r13 3

Problem 2: The code below executes on a dynamically scheduled processor that uses reorder buffer entry numbers to name registers. There are 128 reorder buffer entries, the next free entry is #1. (a) Show when each instruction commits and show the contents of the register map, register file and reorder buffers using the space provided. Show only the instruction line numbers (A, B, C, D) in the reorder buffer. Indicate cycle numbers above the reorder buffer. (10 pts) Pipeline Execution Diagram Cycle 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 A:addd f0, f2, f0 IF ID A0 A1 A2 A3 WC B:addd f6, f0, f6 IF ID RS RS RS A0 A1 A2 A3 WC C:addd f0, f0, f6 IF ID RS RS RS RS RS RS A0 A1 A2 A3 WC D:addd f6, f2, f8 IF ID A0 A1 A2 A3 WB C Register Map Cycle: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 Arch. Reg. Val. or ROB# f0 10 #1 #3 120 f2 20 f6 60 #2 #4 100 f8 80 Register File Arch. Reg. Cycle: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 Val. f0 10 30 120 f2 20 f6 60 90 100 f8 80 Cycle: Reorder buffer: 4

Problem 2, continued: (b) The code below is identical to the code above, however the second add instruction raises an exception in cycle 9 (indicated by a star). Complete the pipeline execution diagram and other tables for this situation for the instructions shown. (Do not show the trap handler.) Show the contents of the reorder buffers after each change. (10 pts) Pipeline Execution Diagram Cycle 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 A:addd f0, f2, f0 IF ID A0 A1 A2 A3 WC B:addd f6, f0, f6 IF ID RS RS RS A0 A1 A2 *A3*WF C:addd f0, f0, f6 IF ID RS RS RS RS RS RS A0x D:addd f6, f2, f8 IF ID A0 A1 A2 A3 WB x Register Map Cycle: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 Arch. Reg. Val. or ROB# f0 10 #1 #3 30 f2 20 f6 60 #2 #4 100 60 f8 80 Register File Arch. Reg. Cycle: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 Val. f0 10 30 f2 20 f6 60 f8 80 Cycle: Reorder buffer: 5

Problem 2: Answer each question below. (a) Why do DLX Type-R instructions have a func field? (7 pts) Type R: Opcode 0 0 5 rs1 6 10 rs2 11 15 rd 16 20 func 21 31 Because there is not enough room in the opcode field to code all DLX instructions. If the func field were eliminated and the size of the opcode field were increased Type I and Type J instructions would have to have smaller immediates. (b) Would there be any advantage in pipelining the integer ALU used in the statically scheduled DLX implementation? Explain. (7 pts) No, since it produces an answer in one cycle. Well, maybe. If functional units in the other stages could be pipelined then the clock frequency could be doubled. (c) Why do ISAs include one-byte integers? (What are the advantages of using them when running on typical implementations.) (7 pts) They take less space and so are useful for storing large amounts of small numbers. The are not faster than full-size integers, arithmetic operations on both can be completed in one cycle. 6

(d) The DLX implementations covered in class start reading registers before the instruction is decoded. Why is this possible? Modify the instruction codings so that it is no longer possible. That is, with the modified codings decoding would have to be performed before register read (as in problem 1). (8 pts) The DLX codings are given below for reference and can be used to explain your answer. Opcode rs1 rs2 rd func 0 Type R: 0 5 6 10 11 15 16 20 21 31 Opcode rs1 rd Immediate Type I: 0 5 6 10 11 15 16 31 Opcode Offset Type J: 0 5 6 31 Swap the rd and rs1 fields in the Type R instructions. 7

(e) Ignoring floating-point instructions, how can precise exceptions be implemented on the statically scheduled (Chapter 3) DLX implementation? Illustrate your answer with an example in which exceptions occur out of order but are handled in order. Show the code and a pipeline execution diagram, indicate where the exceptions are occurring. (9 pts) Pass an exception bit and other information down the pipeline, test for an exception at the end of the MEM stage. In the example below the illegal instruction exception occurs before the page fault, but the page fault is seen at MEM before the illegal instruction exception. lw r4, 0(r5) IF ID EX *ME* WB ant r1, r2, r3 IF *ID* EX ME WB (f) What is an advantage of a stack ISA over a RISC ISA? What is a disadvantage of a stack ISA over a RISC ISA? (7 pts) Advantage of stack: smaller code. Disadvantage: slower implementations. 8