CPE 631 Advanced Computer Systems Architecture: Homework #2


UAH-ECE CPE 631: Homework #2
Issued: 02/01/2006
Due: 02/15/2006

Q#1. (30 points) Evaluate the effectiveness of the blocking optimization for matrix multiplication on SRx machines.

A. Write a C subroutine for matrix multiplication MC = MA x MB:

void mm(double **ma, double **mb, double **mc, int d);

Matrices ma, mb, and mc are square, with d x d elements of type double. Let d be an input parameter of your main program, which initializes the matrices and calls the matrix multiplication subroutine.

B. Modify your program from part A to support the blocking optimization technique. Let b (the blocking factor) be an additional input parameter of your program.

C. Compare the performance of the base program from part A and the blocked program from part B for different blocking factors b, assuming a matrix dimension of d = 128 (or 256). What is the optimal b for the solution with blocking?

Q#2. (30 points) Evaluate the effectiveness of the blocking optimization for matrix multiplication executing on a simulated machine. Using the SimpleScalar toolset for the PISA instruction set (sim-cache and sim-outorder simulators), repeat the measurements from Q#1 for a simulated computer system with the following characteristics:

L1I: 8KB, direct-mapped, 64B cache line
L1D: 8KB, direct-mapped, 64B cache line
L2U: 256KB, 4-way set-associative, 64B cache line, LRU replacement policy
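As a starting point for Q#1, the base and blocked multiplications might be sketched as follows. This is one possible arrangement, not the expected solution: mm_blocked, alloc_matrix, and free_matrix are hypothetical names, the row-pointer layout over one contiguous allocation is an assumption, and tiling only the j and k loops is just one of several valid blocking schemes.

```c
#include <assert.h>
#include <stdlib.h>

/* Base version: mc = ma x mb for d x d matrices. */
void mm(double **ma, double **mb, double **mc, int d)
{
    for (int i = 0; i < d; i++) {
        for (int j = 0; j < d; j++) {
            double sum = 0.0;
            for (int k = 0; k < d; k++)
                sum += ma[i][k] * mb[k][j];
            mc[i][j] = sum;
        }
    }
}

/* Blocked version (hypothetical name): reuses a b x b tile of mb
 * across all rows of ma before moving on to the next tile. */
void mm_blocked(double **ma, double **mb, double **mc, int d, int b)
{
    assert(b > 0);

    /* Partial sums are accumulated tile by tile, so start from zero. */
    for (int i = 0; i < d; i++)
        for (int j = 0; j < d; j++)
            mc[i][j] = 0.0;

    for (int jj = 0; jj < d; jj += b) {
        int jmax = (jj + b < d) ? jj + b : d;      /* clip the last tile */
        for (int kk = 0; kk < d; kk += b) {
            int kmax = (kk + b < d) ? kk + b : d;
            for (int i = 0; i < d; i++) {
                for (int j = jj; j < jmax; j++) {
                    double sum = 0.0;
                    for (int k = kk; k < kmax; k++)
                        sum += ma[i][k] * mb[k][j];
                    mc[i][j] += sum;
                }
            }
        }
    }
}

/* Hypothetical helper: d x d matrix as row pointers into one block. */
double **alloc_matrix(int d)
{
    double **rows = malloc((size_t)d * sizeof *rows);
    double *data = malloc((size_t)d * (size_t)d * sizeof *data);
    if (rows == NULL || data == NULL) {
        free(rows);
        free(data);
        return NULL;
    }
    for (int i = 0; i < d; i++)
        rows[i] = data + (size_t)i * (size_t)d;
    return rows;
}

/* Releases a matrix obtained from alloc_matrix. */
void free_matrix(double **m)
{
    if (m != NULL) {
        free(m[0]);
        free(m);
    }
}
```

A main program would read d and b from the command line, fill ma and mb, and time each call (for example with clock() from <time.h>) over a range of b values. Because the blocked version adds the products in a different order, its results can differ from the base version in the last bits for general floating-point inputs.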

Q#3. (10 points) Cache basics

A. (3 points) Assume a computer with the following characteristics: the word is 32 bits, the addressable unit is a byte, and the data cache is 2-way set-associative with 4-word blocks and a total size of 512B. The replacement policy is LRU, the write policy is write-back, and on a write miss the block is loaded into the cache (write allocate). Determine the sizes of the Tag, Index, and Offset fields. Draw the structure of the cache memory (tags + data, status bits, LRU bits).

B. (7 points) Fill in the following table for the cache memory described above. Assume that the cache memory is empty at the beginning, and that each memory location contains its own address (e.g., Mem[0x0000 000C]=0x0000 000C). All write actions write 0 to the specified locations.

CPU action      Hit/Miss    Replacement [-/Yes (which block)]    Memory operation [Read/Write] + block address
Read 0x02
Read 0x40
Write 0x41
Write 0x1042
Write 0x1243

Draw the structure of the cache memory after the last CPU action is done.

Q#4. (10 points) Textbook A#2.

Q#5. (5 points) Textbook A#3.
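For Q#3, the field-size arithmetic can be checked with a short program. The sketch below is only an aid for verifying a hand calculation: the 32-bit address width is an assumption (the problem states the word size, not the address size), and the names field_sizes, split_address, and print_trace are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Cache parameters from the problem statement.
 * ADDR_BITS is an assumption: the address width is not given. */
enum {
    ADDR_BITS   = 32,
    CACHE_BYTES = 512,
    WAYS        = 2,
    BLOCK_BYTES = 4 * 4    /* 4 words of 4 bytes each */
};

struct cache_fields { unsigned offset_bits, index_bits, tag_bits; };
struct addr_parts   { uint32_t tag, index, offset; };

/* Base-2 logarithm of a power of two. */
static unsigned log2u(unsigned x)
{
    assert(x != 0 && (x & (x - 1)) == 0);
    unsigned n = 0;
    while (x > 1) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Offset selects a byte in a block, index selects a set,
 * and the tag is whatever remains of the address. */
struct cache_fields field_sizes(void)
{
    struct cache_fields f;
    unsigned sets = CACHE_BYTES / (BLOCK_BYTES * WAYS);
    f.offset_bits = log2u(BLOCK_BYTES);
    f.index_bits  = log2u(sets);
    f.tag_bits    = ADDR_BITS - f.index_bits - f.offset_bits;
    return f;
}

/* Splits a byte address into its tag, index, and offset fields. */
struct addr_parts split_address(uint32_t addr)
{
    struct cache_fields f = field_sizes();
    struct addr_parts p;
    p.offset = addr & ((1u << f.offset_bits) - 1u);
    p.index  = (addr >> f.offset_bits) & ((1u << f.index_bits) - 1u);
    p.tag    = addr >> (f.offset_bits + f.index_bits);
    return p;
}

/* Prints the field breakdown for the addresses in the part B table. */
void print_trace(void)
{
    const uint32_t trace[] = { 0x02, 0x40, 0x41, 0x1042, 0x1243 };
    for (size_t i = 0; i < sizeof trace / sizeof trace[0]; i++) {
        struct addr_parts p = split_address(trace[i]);
        printf("0x%04X: tag=0x%X index=%u offset=%u\n",
               (unsigned)trace[i], (unsigned)p.tag,
               (unsigned)p.index, (unsigned)p.offset);
    }
}
```

Calling print_trace() from main lists which set each access in part B maps to. Deciding hits, misses, and replacements still requires tracking the two ways and the LRU state of each set by hand.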

Q#6. (15 points) MIPS Pipeline

Consider the following code fragment, assuming the MIPS integer pipeline where branches are resolved during the Instruction Decode stage. All memory accesses are cache hits. The initial value of R3 is R2 + 400. Branches are handled by freezing the pipeline.

loop: lw R1, 0(R2)

A. (5 points) Show the timing of this instruction sequence for the MIPS pipeline without any forwarding hardware. How many clock cycles does this loop take to execute?

B. (5 points) Show the timing of this instruction sequence for the MIPS pipeline with forwarding hardware. Mark all data forwarding paths using arrows directed from source to destination. How many clock cycles does this loop take to execute?

C. (5 points) Assuming the MIPS pipeline with delayed branch and forwarding hardware, schedule the instructions in the loop, including the branch delay slot. You may reorder instructions and modify individual instruction operands, but do not undertake other loop transformations. Show a pipeline diagram and compute the number of cycles needed to execute the entire loop. Mark all data forwarding paths using arrows directed from source to destination.

Solution:

A. [Pipeline timing diagram: clock cycles 1 to 40; rows for the 1st and 2nd loop iterations, each starting with lw R1, 0(R2)]
Execution time:

B. [Pipeline timing diagram: clock cycles 1 to 40; rows for the 1st and 2nd loop iterations, each starting with lw R1, 0(R2)]
Execution time:

C. [Pipeline timing diagram: clock cycles 1 to 40; rows for the 1st and 2nd loop iterations]
Execution time: