Computer Architecture Sample Test 1

Similar documents
Control Hazards - branching causes problems since the pipeline can be filled with the wrong instructions.

c. What are the machine cycle times (in nanoseconds) of the non-pipelined and the pipelined implementations?

ENGN 2910A Homework 03 (140 points) Due Date: Oct 3rd 2013

Please state clearly any assumptions you make in solving the following problems.

CS152 Computer Architecture and Engineering. Complex Pipelines

Chapter 4. The Processor

5008: Computer Architecture HW#2

LECTURE 3: THE PROCESSOR

Lecture 5: Instruction Pipelining. Pipeline hazards. Sequential execution of an N-stage task: N Task 2

Chapter 9 Pipelining. Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan

Week 11: Assignment Solutions

What is Pipelining? RISC remainder (our assumptions)

Chapter 4. The Processor

The Processor: Improving the performance - Control Hazards

ECE331: Hardware Organization and Design

COMPUTER ORGANIZATION AND DESIGN

Computer Architecture CS372 Exam 3

What is Pipelining? Time per instruction on unpipelined machine Number of pipe stages

Announcements. ECE4750/CS4420 Computer Architecture L11: Speculative Execution I. Edward Suh Computer Systems Laboratory

Final Exam Fall 2007

Good luck and have fun!

CS433 Midterm. Prof Josep Torrellas. October 16, Time: 1 hour + 15 minutes

Pipelining. CSC Friday, November 6, 2015

Parallelism. Execution Cycle. Dual Bus Simple CPU. Pipelining COMP375 1

MIPS Assembly Language Guide

ELE 375 / COS 471 Final Exam Fall, 2001 Prof. Martonosi

SOLUTION. Midterm #1 February 26th, 2018 Professor Krste Asanovic Name:

Hardware-based Speculation

HW1 Solutions. Type Old Mix New Mix Cost CPI

William Stallings Computer Organization and Architecture

EXAM #1. CS 2410 Graduate Computer Architecture. Spring 2016, MW 11:00 AM 12:15 PM

Department of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri

CS/CoE 1541 Mid Term Exam (Fall 2018).

CS252 Prerequisite Quiz. Solutions Fall 2007

Full Datapath. Chapter 4 The Processor 2

Where Does The Cpu Store The Address Of The

ELE 375 Final Exam Fall, 2000 Prof. Martonosi

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?

CS152 Computer Architecture and Engineering March 13, 2008 Out of Order Execution and Branch Prediction Assigned March 13 Problem Set #4 Due March 25

Pipelining and Vector Processing

CS 252 Graduate Computer Architecture. Lecture 4: Instruction-Level Parallelism

Multi-cycle Instructions in the Pipeline (Floating Point)

CPU Structure and Function. Chapter 12, William Stallings Computer Organization and Architecture 7 th Edition

Assuming ideal conditions (perfect pipelining and no hazards), how much time would it take to execute the same program in: b) A 5-stage pipeline?

CS433 Homework 2 (Chapter 3)

Multiple Instruction Issue. Superscalars

The Processor (3) Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

Assembly Language Programming

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

Module 5: "MIPS R10000: A Case Study" Lecture 9: "MIPS R10000: A Case Study" MIPS R A case study in modern microarchitecture.

Improving Performance: Pipelining

CS232 Midterm Exam 3 April 23, 2004

s complement 1-bit Booth s 2-bit Booth s

CENG 5133 Computer Architecture Design Spring Sample Exam 2

ECE 3055: Final Exam

Instruction Pipelining Review

Micro-programmed Control Ch 15

The Nios II Family of Configurable Soft-core Processors

Machine Instructions vs. Micro-instructions. Micro-programmed Control Ch 15. Machine Instructions vs. Micro-instructions (2) Hardwired Control (4)

Micro-programmed Control Ch 15

Instruction Frequency CPI. Load-store 55% 5. Arithmetic 30% 4. Branch 15% 4

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor

Pipeline Architecture RISC

ECEN/CSCI 4593 Computer Organization and Design Exam-2

COSC 6385 Computer Architecture - Memory Hierarchy Design (III)

CAD for VLSI 2 Pro ject - Superscalar Processor Implementation

COMPUTER ORGANIZATION AND DESIGN

ELEC 5200/6200 Computer Architecture and Design Fall 2016 Lecture 9: Instruction Level Parallelism

UNIT- 5. Chapter 12 Processor Structure and Function

Chapter 8. Pipelining

Lecture 13: Cache Hierarchies. Today: cache access basics and innovations (Sections )

Computer Architecture Spring 2016

CSE Lecture 13/14 In Class Handout For all of these problems: HAS NOT CANNOT Add Add Add must wait until $5 written by previous add;

Pipeline Hazards. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

CS232 Final Exam May 5, 2001

Department of Electrical and Computer Engineering The University of Texas at Austin

Advanced Parallel Architecture Lessons 5 and 6. Annalisa Massini /2017

CS 152 Computer Architecture and Engineering. Lecture 7 - Memory Hierarchy-II

Lecture: Out-of-order Processors. Topics: out-of-order implementations with issue queue, register renaming, and reorder buffer, timing, LSQ

CPE300: Digital System Architecture and Design

Do not start the test until instructed to do so!

University of Toronto Faculty of Applied Science and Engineering

Micro-programmed Control Ch 17

Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard

CS 2410 Mid term (fall 2015) Indicate which of the following statements is true and which is false.

ECE 486/586. Computer Architecture. Lecture # 12

Hardwired Control (4) Micro-programmed Control Ch 17. Micro-programmed Control (3) Machine Instructions vs. Micro-instructions

CPU Structure and Function

Memory Hierarchies. Instructor: Dmitri A. Gusev. Fall Lecture 10, October 8, CS 502: Computers and Communications Technology

CS433 Homework 2 (Chapter 3)

In-order vs. Out-of-order Execution. In-order vs. Out-of-order Execution

These actions may use different parts of the CPU. Pipelining is when the parts run simultaneously on different instructions.

Pipelined processors and Hazards

Final Examination Semester 3 / Year 2008

Appendix C: Pipelining: Basic and Intermediate Concepts

Delhi Noida Bhopal Hyderabad Jaipur Lucknow Indore Pune Bhubaneswar Kolkata Patna Web: Ph:

CSE 490/590 Computer Architecture Homework 2

PART A (22 Marks) 2. a) Briefly write about r's complement and (r-1)'s complement. [8] b) Explain any two ways of adding decimal numbers.

Transcription:

Computer Architecture Sample Test 1 Question 1. Suppose we have 32-bit memory addresses, a byte-addressable memory, and a 512 KB (2 19 bytes) cache with 32 (2 5 ) bytes per block. a) How many total lines are in the cache? b) If the cache is direct-mapped, how many cache lines could a specific memory block be mapped to? c) If the cache is direct-mapped, what would be the format (tag bits, cache line bits, block offset bits) of the address? (Clearly indicate the number of bits in each) d) If the cache is 2-way set associative, how many cache lines could a specific memory block be mapped to? e) If the cache is 2-way set associative, how many sets would there be? f) If the cache is 2-way set associative, what would be the format of the address? g) If the cache is fully-associative, how many cache lines could a specific memory block be mapped to? h) If the cache is fully-associative, what would be the format of the address? Question 2. What is the difference between the cache write policies: writeback and writethrough with respect to each of the following: a) the relative frequency of memory accesses b) the ability of a multiple CPU system to keep caches coherency Question 3. Assume that a program wants to copy a 500 word block of data from an I/O device to memory. Complete the following table for each I/O technique. Total number of interrupts for whole block transfer Total number of READ I/O commands issued by the CPU to the I/O module Programmed I/O Interrupt-driven I/O Direct-memory Access

Question 4. Assume special I/O instructions (i.e., isolated I/O) are used to fill I/O-controller registers. Why can t a user program use these instructions to communicate with the I/O device directly and by-pass the operating system s protection checking? Question 5. When a DMA controllers needs to use the bus to access memory, it performs cycle stealing by making the CPU wait if it is also trying to access memory. Even though the CPU is continually executing, why might the DMA device often find the bus to memory free when it needs to access memory? Question 6. Consider the demand paging system with 1024-byte pages. Running Process A CPU page# offset Page Table for A Valid Frame# Bit 0 1 2 3 4 5 6 Physical Frame Number 0 page 2 of B 1 page 3 of A 2 page 4 of A 3 page 5 of B 4 page 0 of A 5 page 2 of A 6 page 4 of B frame# offset Process B page 0 page 1 page 2 page 3 page 4 page 5 page 6 Process A page 0 page 1 page 2 page 3 page 4 page 5 page 6 Logical Addr. Physical Addr. a) Complete the above page table for Process A. b) If process A is currently running and the CPU generates a logical/virtual address of 3100 10, then what would be the corresponding physical address?

Question 7. You are to assume the same 5-stage pipeline discussed in class when answering these questions. Assume that the first register in an arithmetic operation is the destination register, e.g., in ADD R3, R2, R1 register R3 receives the result of adding registers R2 and R1. a. What would the timing be without bypass-signal paths/forwarding (use stalls to solve the data hazard)? (This code might require more or less that 15 cycles) Time d Instructions 1 2 3 4 5 6 7 8 9 10 ADD R3, R2, R1 F D E M W STORE R3, 8(R4) LOAD R4, 16(R3) SUB R3, R4, R1 MUL R6, R3, R4 STORE R6, 4(R5) (Assume that a register cannot be written and the new value read in the same stage.) 11 12 13 14 15 b. What would the timing be with bypass-signal paths? (This code might require more that 15 cycles) Time d Instructions 1 2 3 4 5 6 7 8 9 10 ADD R3, R2, R1 F D E M W STORE R3, 8(R4) LOAD R4, 16(R3) SUB R3, R4, R1 MUL R6, R3, R4 STORE R6, 4(R5) (Assume that a register cannot be written and the new value read in the same stage.) 11 12 13 14 15 c. Draw ALL the bypass-signal paths needed for the above example. Fetch Decode Execute Write F/D D/E E/M M/W Decoder Instr. Register File ALU Result Value Data Result Value Register File

Question 8. Another simple sort is called insertion sort. Recall that in a simple sort: the outer loop keeps track of the dividing line between the sorted and unsorted part with the sorted part growing by one in size each iteration of the outer loop. (below the firstunsortedindex keeps track of the dividing line) the inner loop's job is to do the work to extend the sorted part's size by one. After several iterations of insertion sort s outer loop, an array might look like: Sorted Part Unsorted Part... 0 1 2 3 4 5 6 7 8 length-1 10 20 35 40 45 60 25 50 90 In insertion sort the inner-loop takes the "first unsorted item" (25 at index 6 in the above example) and "inserts" it into the sorted part of the array "at the correct spot." After 25 is inserted into the sorted part, the array would look like: Sorted Part 0 1 2 3 4 5 6 7 8 10 20 25 35 40 45 60 50 90 Unsorted Part... Consider the following insertion sort algorithm that sorts an array numbers: InsertionSort(numbers - address to integer array, length - integer) integer firstunsortedindex, testindex, elementtoinsert; for firstunsortedindex = 1 to (length-1) do testindex = firstunsortedindex-1; elementtoinsert = numbers[firstunsortedindex]; while (testindex >=0) AND (numbers[testindex] > elementtoinsert ) do numbers[ testindex + 1 ] = numbers[ testindex ]; testindex = testindex - 1; end while numbers[ testindex + 1 ] = elementtoinsert; end for end InsertionSort length-1 a) Where in the code would unconditional branches be used and where would conditional branches be used? b) If the compiler could predict by opcode for the conditional branches (i.e., select whether to use machine language statements like: BRANCH_LE_PREDICT_NOT_TAKEN or BRANCH_LE_PREDICT_TAKEN"), then which conditional branches would be "PREDICT_NOT_TAKEN" and which would be "PREDICT_TAKEN"?

c) Assumptions: length = 100 and the numbers are initially in descending order before the insertion sort algorithm is called the five-stage pipeline discussed in class the outcome of conditional branches is known at the end of the E stage target addresses of all branches is known at the end of the D stage ignore any data hazards Under the above assumptions, answer the following questions: i) If fixed predict-never-taken is used by the hardware, then what will be the total branch penalty (# cycles wasted) for the algorithm? (Here assume NO branch-prediction buffer) ii) If a branch target buffer with one history bit per entry is used, then what will be the total branch penalty (# cycles wasted) for the algorithm? (Assume predict-not taken is used if there is no match in the branch-prediction buffer) Explain your answer. iii) If a branch target buffer with two history bit per entry is used, then what will be the total branch penalty (# cycles wasted) for the algorithm? (Assume predict-not taken is used if there is no match in the branch-prediction buffer) Explain your answer.