Midterm #2 Solutions April 23, 1997


CS152 Computer Architecture and Engineering
Computer Science Division, Department of Electrical Engineering and Computer Sciences
University of California, Berkeley
Sp97, D.K. Jeong

Midterm #2 Solutions, April 23, 1997

Name: Key    SID Number: ____

You may bring one double-sided note sheet. You have 170 minutes. Write your
name on this cover and also at the top left of each page. The point value of
each question is indicated in brackets. Make sure to show your interim
results to receive at least partial credit.

  Problem   Possible   Score
     1         15
     2         25
     3         20
     4         30
  Total        90

Question 1. [15 pts] (General)

(a) You are going to add a few new instructions to an existing ISA. How can
old machines run code that uses the new instructions?

Modify the exception handler on the old machine so that, on an
unimplemented-instruction exception caused by one of the new instructions, it
emulates the behavior of that instruction in software.

(b) Explain why a direct-mapped TLB is a bad idea.

Since one TLB entry maps a single page, unlike a cache where an entry covers
a block, far less locality of reference is available to exploit. Furthermore,
for small programs the TLB index bits tend to be all 0s, with addresses
differing only in their tags -- the MSBs distinguish segments such as data,
heap, and stack -- so the ping-pong effect (entries repeatedly evicting each
other) will be significant.

(c) What are the particular characteristics of multimedia I/O compared to
conventional I/O such as a disk or mouse?

High bandwidth is required for video. Isochronous data transfer, i.e.
guaranteed bandwidth, is required for real-time applications. Synchronization
among different I/O streams, such as video and audio, may be required.
Conventional I/O is less bandwidth-intensive, and its delay tends to be
dominated by the overhead (setup time) of the devices.
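The trap-and-emulate idea in (a) can be illustrated with a small sketch (this
is not part of the exam answer; the opcode value, dictionary-based dispatch,
and all names here are hypothetical, chosen only to show the control flow of a
patched illegal-instruction handler):

```python
# Hypothetical sketch of trap-and-emulate: the old machine's
# illegal-instruction handler looks up the faulting opcode and, if it is one
# of the new instructions, performs its effect in software before resuming.

EMULATORS = {}

def emulate(opcode):
    """Register a software emulation routine for a new opcode."""
    def register(fn):
        EMULATORS[opcode] = fn
        return fn
    return register

@emulate(0b111100)              # hypothetical new "mul" opcode
def emu_mul(state, rd, rs, rt):
    state["regs"][rd] = state["regs"][rs] * state["regs"][rt]

def illegal_instruction_trap(state, opcode, rd, rs, rt):
    """What the patched exception handler does on an unimplemented opcode."""
    if opcode in EMULATORS:
        EMULATORS[opcode](state, rd, rs, rt)
        state["pc"] += 4        # skip the emulated instruction and resume
    else:
        raise RuntimeError("truly illegal instruction")
```

On a real machine the handler would decode the faulting instruction word from
memory; the explicit field arguments here just keep the sketch short.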

Question 2. [25 pts] (Cache)

The following table shows an address trace. Assume that addresses and data
are 8 bits wide. The cache size is 16 bytes and the block size is 2 bytes.
The write policy is copy-back, the replacement algorithm is LRU, and write
misses allocate a block (write-allocate). Assume that all cache entries are
initially invalid and their contents are Xs except the Valid bits.

(a) Check whether each access is a hit or a miss.

(b) Draw the contents of the cache after the whole address trace has been
consumed. Assume that each main-memory location contains its own address --
e.g. address 0xC7 contains 0xC7. LRU state is not shown, but you have to
track it mentally or on paper.

The address breaks down as follows:

  Tag      Index   Block offset
  6 bits   1 bit   1 bit

  Access   Address    Data   Hit/miss?
  Read     00000000          Miss
  Write    00000001   0xFF   Hit
  Read     00001000          Miss
  Write    01000100   0xEE   Miss
  Read     00101100          Miss
  Read     10100011          Miss
  Read     00000000          Hit

  Way 0                               Way 1
  V  D  Add. tag  Data1  Data0        V  D  Add. tag  Data1  Data0
  1  1  000000    0xFF   0x00        1  0  000010    0x09   0x08
  1  0  101000    0xA3   0xA2        0  x  x         x      x

  Way 2                               Way 3
  V  D  Add. tag  Data1  Data0        V  D  Add. tag  Data1  Data0
  1  1  010001    0x45   0xEE        1  0  001011    0x2D   0x2C
  0  x  x         x      x           0  x  x         x      x
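The hit/miss column can be checked mechanically. Here is a minimal sketch (not
part of the exam solution) that models only tag lookup and LRU replacement for
this 2-set, 4-way configuration; it ignores data values and dirty bits, since
those do not affect hit/miss classification here:

```python
# Minimal simulator for the Question 2 cache: 16 bytes total, 2-byte blocks,
# so 8 blocks organized as 2 sets x 4 ways; 1 offset bit, 1 index bit,
# 6 tag bits. Both read and write misses allocate (write-allocate).

class Cache:
    def __init__(self, nsets=2, ways=4):
        # each set is an LRU-ordered list of tags; index 0 = least recent
        self.sets = [[] for _ in range(nsets)]
        self.nsets = nsets
        self.ways = ways

    def access(self, addr):
        """Return 'Hit' or 'Miss' for an 8-bit address."""
        index = (addr >> 1) % self.nsets   # skip the 1 offset bit
        tag = addr >> 2                    # remaining 6 bits
        s = self.sets[index]
        if tag in s:
            s.remove(tag)
            s.append(tag)                  # move to most-recently-used slot
            return "Hit"
        if len(s) == self.ways:
            s.pop(0)                       # evict the LRU block
        s.append(tag)
        return "Miss"

trace = [0b00000000, 0b00000001, 0b00001000, 0b01000100,
         0b00101100, 0b10100011, 0b00000000]
cache = Cache()
results = [cache.access(a) for a in trace]
print(results)  # matches the table: Miss Hit Miss Miss Miss Miss Hit
```

Note that all of the first five accesses land in set 0, so the final read of
address 00000000 hits only because the set holds 4 ways.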

Question 3. [20 pts] (System Performance)

A pipelined processor with a 100 MHz clock is connected to a 50 MHz bus
through a 4 Kbyte instruction cache and an 8 Kbyte data cache. Their block
sizes are 16 bytes and 32 bytes, respectively. Do not assume any extra cycles
for a dirty-block copy-back during a cache fill, since a write buffer will
handle it later. Main memory access time is 200 ns for the first 32-bit word,
and each subsequent 32-bit word in a page-mode access takes 50 ns. Assume the
instruction cache and data cache respond in the same cycle on a hit. Assume
the miss ratio is 6% for the instruction cache and 4% for the data cache.

(a) What is the system's MIPS rating?

CPI = CPI_base + miss_cycles/instruction. Assume CPI_base = 1.0 and that 25%
of instructions are data references.

Miss_cycles/instr = Instr_MissRate * Instr_MissPen
                  + %DataRef * Data_MissRate * Data_MissPen

The miss penalty is 200 ns to fetch the first 32 bits and 50 ns for each
additional 32 bits. However, one bus cycle = 1/50 MHz = 20 ns, so a page-mode
access cannot take 2.5 bus cycles; it must take 3 bus cycles, or 60 ns.

Instr_MissPen = (200 ns + 3*60 ns) * 1 cycle / 10 ns = 38 cycles
Data_MissPen  = (200 ns + 7*60 ns) * 1 cycle / 10 ns = 62 cycles

Miss_cycles/instr = (.06)*38 + (.25)*(.04)*62 = 2.90
CPI  = 1.0 + 2.90 = 3.90
MIPS = clock rate / CPI = 100 MHz / 3.90 = 25.6410 MIPS

(b) Assume that the main memory spends 5 bus cycles every 16 ms on
refreshing. During that time, memory access is delayed until the refresh
cycle finishes. What is the new MIPS rating with refreshing?

Refresh only hurts performance if the CPU needs to access memory at the same
time a refresh operation is occurring. Since a refresh takes 5 bus cycles =
100 ns, the probability of a conflict is 100 ns / 16 ms = 6.25 x 10^-6.
However, we won't always have to wait the full 5 bus cycles on a conflict:
the average wait is (1+2+3+4+5)/5 = 3.0 bus cycles = 6 processor cycles. So
the average penalty per memory access increases by 6 * 6.25 x 10^-6 =
3.75 x 10^-5 cycles.

CPI_refresh  = 1.0 + 2.90 * (1 + 3.75 x 10^-5) = 3.90010875
MIPS_refresh = 100 MHz / CPI_refresh = 25.6403 MIPS

(c) An I/O controller for a mouse is attached to the bus. Assuming that 200
instructions are executed to handle an I/O interrupt, with the same cache
miss rates, what is the new MIPS rating (not counting instructions in the
interrupt handler) when 30 interrupts are generated per second?

# of instructions/sec for I/O = 30 interrupts/sec * 200 instr/interrupt
                              = 6000 instr/sec

The processor computes at the same speed as before, but 6000 instr/sec are
not part of the program's execution and so must be deducted from the MIPS:

MIPS_mouse = 25.6410 MIPS - 0.006 MIPS = 25.635 MIPS

(d) The block sizes of the caches are doubled to reduce the miss rate. The
resulting miss rate is 5% for the I-cache and 3% for the D-cache. Will the
performance be better than in (a)? Why?

Although the miss rate is reduced, the miss penalty almost doubles, since the
block size has doubled:

Instr_MissPen = (200 ns + 7*60 ns)  * 1 cycle / 10 ns = 62 cycles
Data_MissPen  = (200 ns + 15*60 ns) * 1 cycle / 10 ns = 110 cycles

Miss_cycles/instr = (.05)*62 + (.25)*(.03)*110 = 3.925
CPI = 1.0 + 3.925 = 4.925
MIPS_double = 100 MHz / 4.925 = 20.30 MIPS, which is only about 79% of the
performance in (a). So no, performance decreases.
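The arithmetic in parts (a) and (d) follows one formula, so it can be checked
with a short script. This is a sketch of the solution's own model (CPI_base =
1.0, 25% data references, 60 ns rounded page-mode beats); the function names
are mine, not the exam's:

```python
# Reproduce the CPI/MIPS arithmetic of Question 3 (a) and (d).
# 100 MHz CPU (10 ns cycle), 50 MHz bus; the 50 ns page-mode beat rounds up
# to 3 bus cycles = 60 ns, as argued in part (a).

CYCLE_NS = 10.0

def miss_penalty_cycles(block_bytes):
    """Cycles to fill a block: 200 ns first 32-bit word, 60 ns each later word."""
    words = block_bytes // 4
    return (200 + (words - 1) * 60) / CYCLE_NS

def mips(i_rate, d_rate, i_block, d_block, data_ref=0.25, cpi_base=1.0):
    miss_cycles = (i_rate * miss_penalty_cycles(i_block)
                   + data_ref * d_rate * miss_penalty_cycles(d_block))
    return 100.0 / (cpi_base + miss_cycles)   # 100 MHz clock

mips_a = mips(0.06, 0.04, 16, 32)   # part (a): 16 B / 32 B blocks
mips_d = mips(0.05, 0.03, 32, 64)   # part (d): blocks doubled
print(mips_a, mips_d)               # about 25.64 and 20.30
```

The ratio mips_d / mips_a is about 0.79, confirming the conclusion in (d)
that doubling the block size hurts here.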

Question 4. [30 pts] (Pipelined CPU Design)

You are to design a pipelined 16-bit microprocessor with only 5 instructions.
Use a 5-stage pipeline, and assume a delayed branch with one delay slot. The
register file contains 4 data words.

  Instruction   Format                    Description
  add           0000 Rs Rt Rd XXXXXX      R[Rd] <- R[Rs] + R[Rt]
  load          0010 Rs Rt Displacement   R[Rt] <- M[R[Rs] + Displacement]
  store         0011 Rs Rt Displacement   M[R[Rs] + Displacement] <- R[Rt]
  beq           0100 Rs Rt Displacement   branch to PC+2+Displacement*2 if R[Rs] = R[Rt]
  jmp           1 target                  unconditional jump to target*2

(a) Draw a pipelined datapath, define the control signals, and design a
controller at the gate level. Make sure to indicate the width of the buses.
Minimize logic as much as possible. Assume that branch or jump instructions
cannot be in the delay slot. Use only the following gates/blocks: NOT, AND,
OR, XOR, MUX, Full Adder, Register File, and Memory.
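The field boundaries implied by the format table (4-bit opcode, 2-bit
register fields, 6-bit displacement, and a 15-bit jump target under a single
leading 1) can be made concrete with a small decoder sketch. This is an
illustration of the encoding only, not part of the required gate-level
answer:

```python
# Hypothetical software decoder for the 5-instruction, 16-bit ISA of
# Question 4. Bit layout assumed from the format table:
#   add/load/store/beq: [15:12] opcode, [11:10] Rs, [9:8] Rt,
#                       then [7:6] Rd (add) or [5:0] displacement
#   jmp:                [15] = 1, [14:0] target

def decode(instr):
    """Decode a 16-bit instruction word into (mnemonic, fields)."""
    if instr >> 15:                          # leading 1 -> jmp
        return ("jmp", {"target": instr & 0x7FFF})
    opcode = instr >> 12
    rs = (instr >> 10) & 0x3
    rt = (instr >> 8) & 0x3
    if opcode == 0b0000:                     # add
        return ("add", {"rs": rs, "rt": rt, "rd": (instr >> 6) & 0x3})
    name = {0b0010: "load", 0b0011: "store", 0b0100: "beq"}[opcode]
    return (name, {"rs": rs, "rt": rt, "disp": instr & 0x3F})

# add with Rs=1, Rt=2, Rd=3: 0000 01 10 11 000000
print(decode(0b0000_0110_1100_0000))  # -> ('add', {'rs': 1, 'rt': 2, 'rd': 3})
```

In the actual datapath these same bit slices feed the register-file read
ports and the sign/zero-extended displacement, which is why the decode logic
stays small.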

(b) Add hazard-detection logic and an internal forwarding mechanism to handle
RAW hazards. (It must handle alu-alu, load-alu, and load-store instruction
sequences correctly.)

(end of Midterm #2 solutions)
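As an addendum (not part of the original solutions), the forwarding and
stall conditions that the logic in (b) must implement can be sketched for an
ALU source operand. The dictionary shape and string results are illustrative;
in hardware these become mux-select signals and a stall line. Note that a
load-store sequence needs no stall, because the store's data is not needed
until its MEM stage:

```python
# Sketch of RAW-hazard resolution for one ALU source register in a classic
# 5-stage pipeline. one_ahead / two_ahead describe the destination registers
# of the two instructions ahead of the consumer (or None if they write none).

def forward_select(src, one_ahead, two_ahead):
    """Return the source for an ALU operand: 'EX/MEM', 'MEM/WB', 'REG',
    or 'STALL' when a load-use bubble is required."""
    if one_ahead and one_ahead["dest"] == src:
        if one_ahead["is_load"]:
            return "STALL"      # load-use: data not back from memory yet
        return "EX/MEM"         # alu-alu: forward the just-computed result
    if two_ahead and two_ahead["dest"] == src:
        return "MEM/WB"         # load-alu with one instruction between,
                                # or an older alu result
    return "REG"                # no hazard: read the register file
```

The priority matters: when both older instructions write the same register,
the more recent (EX/MEM) value must win, which the sketch encodes by checking
one_ahead first.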