Computer Organization and Structure

Similar documents
COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

CS 351 Exam 2 Mon. 11/2/2015

Outline. A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception

EC 413 Computer Organization - Fall 2017 Problem Set 3 Problem Set 3 Solution

Instruction word R0 R1 R2 R3 R4 R5 R6 R8 R12 R31

CS3350B Computer Architecture Quiz 3 March 15, 2018

The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture

COMPUTER ORGANIZATION AND DESIGN

EE557--FALL 1999 MAKE-UP MIDTERM 1. Closed books, closed notes

Chapter 4. The Processor

4.1.3 [10] < 4.3>Which resources (blocks) produce no output for this instruction? Which resources produce output that is not used?

Chapter 4. The Processor

Processor (I) - datapath & control. Hwansoo Han

Chapter 4. The Processor

ECE154A Introduction to Computer Architecture. Homework 4 solution

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor

Adding Support for jal to Single Cycle Datapath (For More Practice Exercise 5.20)

CSE 378 Midterm 2/12/10 Sample Solution

Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard

COMPUTER ORGANIZATION AND DESIGN

COMPUTER ORGANIZATION AND DESIGN. The Hardware/Software Interface. Chapter 4. The Processor: A Based on P&H

CSEN 601: Computer System Architecture Summer 2014

CENG 3531 Computer Architecture Spring a. T / F A processor can have different CPIs for different programs.

LECTURE 3: THE PROCESSOR

COMP2611: Computer Organization. The Pipelined Processor

Pipelining. CSC Friday, November 6, 2015

Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining

ECE/CS 552: Introduction to Computer Architecture ASSIGNMENT #3 Due Date: At the beginning of lecture, October 20 th, 2010

Chapter 4. The Processor

ECE232: Hardware Organization and Design

Chapter 4. The Processor. Instruction count Determined by ISA and compiler. We will examine two MIPS implementations

Lecture 4: Review of MIPS. Instruction formats, impl. of control and datapath, pipelined impl.

4. What is the average CPI of a 1.4 GHz machine that executes 12.5 million instructions in 12 seconds?

The Processor: Datapath and Control. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

CS 251, Winter 2019, Assignment % of course mark

Computer Organization and Structure. Bing-Yu Chen National Taiwan University

EECS150 - Digital Design Lecture 10- CPU Microarchitecture. Processor Microarchitecture Introduction

CS 251, Winter 2018, Assignment % of course mark

Systems Architecture

CS2100 Computer Organisation Tutorial #10: Pipelining Answers to Selected Questions

ECE232: Hardware Organization and Design

Pipelining Analogy. Pipelined laundry: overlapping execution. Parallelism improves performance. Four loads: Non-stop: Speedup = 8/3.5 = 2.3.

DEE 1053 Computer Organization Lecture 6: Pipelining

CS 351 Exam 2, Fall 2012


Pipelining. Ideal speedup is number of stages in the pipeline. Do we achieve this? 2. Improve performance by increasing instruction throughput ...

Improving Performance: Pipelining

Chapter 4 The Processor 1. Chapter 4B. The Processor

ECE260: Fundamentals of Computer Engineering

Pipeline design. Mehran Rezaei

ECS 154B Computer Architecture II Spring 2009

EECS150 - Digital Design Lecture 9- CPU Microarchitecture. Watson: Jeopardy-playing Computer

Pipelined Processor Design

Full Datapath. Chapter 4 The Processor 2

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?

Chapter 4. The Processor. Computer Architecture and IC Design Lab

CS/CoE 1541 Exam 1 (Spring 2019).

Chapter 4 The Processor 1. Chapter 4A. The Processor

Chapter 4. The Processor

1 Hazards COMP2611 Fall 2015 Pipelined Processor

Determined by ISA and compiler. We will examine two MIPS implementations. A simplified version A more realistic pipelined version

Chapter 4. The Processor Designing the datapath

Chapter 4. The Processor

Processor (II) - pipelining. Hwansoo Han

Department of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri

Chapter 4. The Processor

Processor Design Pipelined Processor (II) Hung-Wei Tseng

CS 61C Summer 2016 Guerrilla Section 4: MIPS CPU (Datapath & Control)

ECE 313 Computer Organization FINAL EXAM December 14, This exam is open book and open notes. You have 2 hours.

EECS 151/251A Fall 2017 Digital Design and Integrated Circuits. Instructor: John Wawrzynek and Nicholas Weaver. Lecture 13 EE141

Computer Architecture Computer Science & Engineering. Chapter 4. The Processor BK TP.HCM

Thomas Polzer Institut für Technische Informatik

Computer Architecture Computer Science & Engineering. Chapter 4. The Processor BK TP.HCM

Full Datapath. Chapter 4 The Processor 2

ECE 313 Computer Organization FINAL EXAM December 14, This exam is open book and open notes. You have 2 hours.

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1

Inf2C - Computer Systems Lecture Processor Design Single Cycle

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 4: Datapath and Control

Machine Organization & Assembly Language

CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards

Introduction. Datapath Basics

Grading: 3 pts each part. If answer is correct but uses more instructions, 1 pt off. Wrong answer 3pts off.

Final Exam Spring 2017

Due Nov. 6 th, 23:59pm No Late Submissions Accepted

EIE/ENE 334 Microprocessors

COSC 6385 Computer Architecture - Pipelining

Chapter 5 Solutions: For More Practice

Final Exam Fall 2007

OPEN BOOK, OPEN NOTES. NO COMPUTERS, OR SOLVING PROBLEMS DIRECTLY USING CALCULATORS.

Chapter 4. The Processor. Jiang Jiang

Pipelining. lecture 15. MIPS data path and control 3. Five stages of a MIPS (CPU) instruction. - factory assembly line (Henry Ford years ago)

CS 61C Fall 2016 Guerrilla Section 4: MIPS CPU (Datapath & Control)

Laboratory Exercise 6 Pipelined Processors 0.0

ECE331: Hardware Organization and Design

Midnight Laundry. IC220 Set #19: Laundry, Co-dependency, and other Hazards of Modern (Architecture) Life. Return to Chapter 4

The Processor (1) Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

ENCM 369 Winter 2013: Reference Material for Midterm #2 page 1 of 5

CSEE 3827: Fundamentals of Computer Systems

Transcription:

Computer Organization and Structure 1. Assuming the following repeating pattern (e.g., in a loop) of branch outcomes: Branch outcomes a. T, T, NT, T b. T, T, T, NT, NT Homework #4 Due: 2014/12/9 a. What is the accuracy of always-taken and always-out-taken predictors for this sequence of branch outcomes? b. What is the accuracy of the two-bit predictor for the first four branches in this pattern, assuming that the predictor starts off in the bottom left state from Figure 1 (predict not taken). Figure 1: The states in a 2-bit prediction scheme. c. What is the accuracy of the two-bit predictor if this pattern is repeated forever? d. Design a predictor that would achieve a perfect accuracy if this pattern is repeated forever. Your predictor should be a sequential circuit with one output that provides a prediction (1 for taken, 0 for not taken) and no inputs other than the clock and the control signal that indicates that the instruction is a conditional branch. e. What is the accuracy of your predictor from the above subquestion if it is given a repeating pattern that is the exact opposite of this one? 2. Assuming that the breakdown of dynamic instructions into various instruction categories is as follows: R-Type beq jmp lw sw a. 50% 15% 10% 15% 10% b. 30% 10% 5% 35% 20% Also, assume the following branch predictor accurancies: Always-taken Always not-taken 2-bit

a. 40% 60% 80% b. 60% 40% 95% a. Stall cycles due to mispredicted branches increase the CPI. What is the extra CPI due to mispredicted branches with the always-taken predictor? Assume that branch outcomes are determined in the EX stage, that there are no data hazards, and that no delay slots are used. b. Repeat the above problem for the always not-taken predictor. c. Repeat the above problem for the 2-bit predictor. d. With the 2-bit predictor, what speed-up would be achieved if we could convert half of the branch instructions in a way that replaces a branch instruction with an ALU instruction? Assume that correctly and incorrectly predicted instructions have the same chance of being replaced. e. With the 2-bit predictor, what speed-up would be achieved if we could convert half of the branch instructions in a way that replaced each branch instruction with two ALU instructions? Assume that correctly and incorrectly predicted instructions have the same chance of being replaced. f. Some branch instructions are much more predictable than others. If we know that 80% of all executed branch instructions are easy-to-predict loop-back branches that are always predicted correctly, what is the accuracy of the 2-bit predictor on the remaining 20% of the branch instructions? 3. What is the average CPI for each of the following 2 schemes taking to execute the code sequence below? (Note: For the pipeline scheme, there are five stages: IF, ID, EX, MEM, and WB. We assume the reads and writes of register file can occur in the same clock cycle, and the stall circuits are available.) add $t3, $s1, $s2 sub $t1, $s1, $s2 lw $t2, 100($t3) sub $s1, $t1, $t2 a. single cycle scheme b. pipelined scheme with data forwarding hardware (one from EX/MEM to ALU input, and the other from MEM/WB to ALU input) available 4. Different instructions utilize different hardware blocks in the basic single-cycle implementation. The next three problems refer to the following instruction: Instruction Interpretation a. add Rd, Rs, Rt Reg[Rd]=Reg[Rs]+Reg[Rt] b. lw Rt, Offs(Rs) Reg[Rt]=Mem[Reg[Rs]+Offs]

Figure 2: The basic implementation of the MIPS subset, including the necessary multiplexors and control lines. a. What are the values of control signals generated by the control in Figure 2 for this instruction? b. Which resources (blocks) perform a useful function for this instruction? c. Which resources (blocks) produce outputs, but their outputs are not used for this instruction? Which resources produce no outputs for this instruction? Different execution units and blocks of digital logic have different latencies (time needed to do their work). In Figure 2 there are seven kinds of major blocks. Latencies of blocks along the critical (longest-latency) path for an instruction determine the minimum latency of that instruction. For the following three problems, assume the following resource latencies: I-Mem Add Mux ALU Regs D-Mem Control a. 400ps 100ps 30ps 120ps 200ps 350ps 100ps b. 500ps 150ps 100ps 180ps 220ps 1000ps 65ps d. What is the critical path for a MIPS AND instruction? e. What is the critical path for a MIPS load (LD) instruction? f. What is the critical path for a MIPS BEQ instruction?

The basic single-cycle MIPS implementation in Figure 2 can only implement some instructions. New instructions can be added to an existing ISA, but the decision whether or not to do that depends, among other things, on the cost and complexity such an addition introduces into the processor datapath and control. The next three problems refer to this new instruction: Instruction Interpretation a. add3 Rd, Rs, Rt,Rx Reg[Rd]=Reg[Rs]+Reg[Rt]+Reg[Rx] b. sll Rt, Rd, Shift Reg[Rd]= Reg[Rt] << Shift (shift left by Shift bits) g. Which existing blocks (if any) can be used for this instruction? h. Which new functional blocks (if any) do we need for this instruction? i. What new signals do we need (if any) from the control unit to support this instruction? 5. The following piece of code has pipeline hazard(s) in it. Please try to reorder the instructions and insert the minimum number of NOP to make it hazard-free. (Note: Assume all the necessary forwarding logics exist) haz: move $5, $0 lw $10, 1000($20) addiu $20, $20, -4 addu $5, $5, $10 bne $20, $0, haz 6. We wish to add the instructions jr (jump register), sll (shift left logical), lui (load upper immediate), and a variant of the lw (load word) instruction to the single-cycle datapath. The variant of the lw instruction increments the index register after loading word from memory. This instruction (l_inc) corresponds to the following two instructions: lw $rs, L($rt) addi $rt, $rt, 1 Add any necessary datapaths and control signals to Figure 3 and show the necessary additions to Table 1. You can photocopy Figure 3 and Table 1 to make it faster to show the additions. Instruction RegDst ALUSrc Memto Reg Reg Write Mem Read Mem Write Branch ALUOp1 ALUOp0 R-format 1 0 0 1 0 0 0 1 0 lw 0 1 1 1 1 0 0 0 0 sw X 1 X 0 0 1 0 0 0 beq X 0 X 0 0 0 1 0 1 Table 1: The setting of the control lines is completely determined by the opcode fields of the instruction.

Figure 3: The simple datapath with the control unit.