
EE382: Processor Design
Final Examination
March 20, 1998

Please do not open the exam book or begin work on the exam until instructed to do so. You have a total of 3 hours to complete this exam. You will be informed when 3 hours have elapsed. You must stop all work on the exam at that time.

You may use your textbook and notes during the exam, as well as a calculator. Show work and report your answers on each sheet. Use the blank sheet at the end of the exam, the back of the page, or attach additional sheets if necessary.

Good Luck!

Your matriculation at Stanford University indicates that you have read and understood the Honor Code, and you agree to abide by the Code. Your signature here confirms that.

Signed:
Name (Printed):
Stanford ID:

Problem   Points
1         /20
2         /20
3         /20
4         /20
5         /20
Total     /100

SITN Students: Please attach a routing slip.

EE382 Final Exam March 20, 1998 Page 1 of 6

Problem 1: Vector Processors [20 points]

A vector processor has three pipelines: one for Load/Store, one for addition, and one for multiplication. The processor's pipelines are clocked at 100 MHz. The function units for addition and multiplication have the same number of stages, and they can be chained together. The memory system consists of 32 modules, each with a cycle time of 100 ns. The vector processor is being evaluated for its performance on a vector inner product calculation:

    X = Σ_i (A[i] * B[i])

1.1 Assume the vectors and vector register length are sufficiently long that pipeline startup and draining can be ignored.
(a) The value of γ_opt = ______ [5 points]
(b) If the achieved γ is 0.5 * γ_opt, then the achieved MFLOPS = ______ [10 points]

1.2 A benchmark test is run to compute an inner product on vectors that have been preloaded into the register file. The measured performance for a vector of length 36 is 150 MFLOPS. The number of stages in the addition pipeline = ______ [5 points]

Problem 2: Cache Coherency [20 points]

The cache coherence mechanism for a multiprocessor system uses a MESI protocol. Consider a system with two processors, P1 and P2, with the initial cache state shown in the following table. For this problem, assume each cache holds only 4 lines and uses direct-mapped organization.

      P1                  P2
Line    State    Set    Line    State
L1      M        0      L5      I
L2      E        1      L6      M
L3      S        2      L3      S
L4      I        3      L8      E

2.1 What is the state of each cache after the following sequence of memory references is completed? Fill in the table below. [16 points]

1. P2 reads line L1
2. P1 writes line L2
3. P2 writes line L3
4. P1 reads line L8

      P1                  P2
Line    State    Set    Line    State
                 0
                 1
                 2
                 3

2.2 Assume that the caches are returned again to the original state above. Describe a simple action by P2 that would leave an exclusive copy of line L3 in P1's cache even though the cache state would be S. [4 points]
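The bookkeeping in Problem 2 can be checked mechanically. The following is a minimal sketch of textbook MESI transitions for two direct-mapped caches, not the course's official solution. It assumes that line Ln maps to set (n - 1) mod 4, which is consistent with the initial table, and that an Invalid entry can be replaced freely. The names `set_index`, `initial_state`, and `access` are hypothetical helpers introduced for illustration.

```python
NUM_SETS = 4  # each cache holds 4 lines, direct-mapped


def set_index(line):
    # Assumption: line "Ln" maps to set (n - 1) mod 4, as the table suggests.
    return (int(line[1:]) - 1) % NUM_SETS


def initial_state():
    # One (line, state) pair per set, copied from the problem's table.
    return {
        "P1": [("L1", "M"), ("L2", "E"), ("L3", "S"), ("L4", "I")],
        "P2": [("L5", "I"), ("L6", "M"), ("L3", "S"), ("L8", "E")],
    }


def access(caches, cpu, op, line):
    """Apply one read ('r') or write ('w') by `cpu`, updating both caches."""
    other = next(c for c in caches if c != cpu)
    s = set_index(line)
    tag, state = caches[cpu][s]
    hit = tag == line and state != "I"
    o_tag, o_state = caches[other][s]
    other_has = o_tag == line and o_state != "I"

    if op == "r":
        if not hit:
            # Read miss: the resident entry is replaced by the new line.
            if other_has:
                # The other copy drops to S (an M copy is written back first).
                caches[other][s] = (line, "S")
                caches[cpu][s] = (line, "S")
            else:
                # No other cached copy: load in the Exclusive state.
                caches[cpu][s] = (line, "E")
        # Read hit: no state change.
    else:
        # Write: any other valid copy is invalidated, local copy becomes M.
        if other_has:
            caches[other][s] = (line, "I")
        caches[cpu][s] = (line, "M")
    return caches
```

A reference sequence is replayed by calling `access(caches, "P2", "r", "L1")` and so on against the dictionary returned by `initial_state()`, then reading the (line, state) pairs back out per set.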

Problem 3: Scalable Multiprocessor Interconnection Networks [20 points]

A scalable multiprocessor system uses the direct, static configuration of a (16,3) hypertorus. The message payload is 120 bits and the channel width is 12. Wormhole routing is used. Links are bidirectional. For an application under evaluation the rate of message generation is 0.01.

3.1 Assume uniform distribution of inter-node messages.
(a) The average number of hops to transmit a message = ______ [2 points]
(b) The average channel utilization = ______ [2 points]
(c) The mean message communication latency in cycles = ______ [6 points]

3.2 Now assume the application can be partitioned and scheduled to achieve a locality factor of 0.125.
(a) The average number of hops to transmit a message = ______ [2 points]
(b) The average channel utilization = ______ [2 points]
(c) The mean message communication latency in cycles = ______ [6 points]
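As a sanity check on the geometry in Problem 3, the sketch below computes the mean shortest-path hop count in a torus with bidirectional links by direct enumeration, along with the number of channel-width units per message. It assumes that (16,3) means radix k = 16 in n = 3 dimensions, that destinations are uniform over all nodes (the source itself included), and that the channel width of 12 is in bits. The course text may use a slightly different convention, and the utilization and latency models are not reproduced here. The names `mean_hops` and `flits_per_message` are hypothetical.

```python
import math


def mean_hops(k, n):
    """Mean shortest-path hops in a k-ary n-dimensional torus with
    bidirectional links, averaged over uniformly chosen destinations."""
    # Shortest distance around a ring of k nodes from node 0 to node j.
    ring_total = sum(min(j, k - j) for j in range(k))
    # Dimensions are independent, so the per-ring averages add up.
    return n * ring_total / k


def flits_per_message(payload_bits, channel_width_bits):
    """Number of channel-width transfers needed to carry the payload."""
    return math.ceil(payload_bits / channel_width_bits)


if __name__ == "__main__":
    print(mean_hops(16, 3))            # (16,3) hypertorus from the problem
    print(flits_per_message(120, 12))  # 120-bit payload, width 12
```

Enumerating ring distances avoids relying on a memorized closed form; for even k the per-dimension average works out to k/4.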

Problem 4: Multiprogramming System Models [20 points]

A system with a single processor and a single disk is multiprogrammed with 3 jobs. The processor executes an average of 12.5 ms between disk accesses, 10 ms for the application and 2.5 ms for the operating system to handle the disk operation. The disk service time has a mean value of 15 ms with c^2 = 1.0.

4.1 Which of the system models described in section 9.4 applies and why? [2 points]
4.1(a) The achieved rate for disk accesses per second = ______ [6 points]
4.1(b) The percent of time that the processor executes application code = ______ [2 points]

4.2 Now we remove one job from the multiprogramming mix and use its memory to implement a disk cache. Assume that the hit rate for the disk cache is 50%, which enables the processor to execute twice as long between disk accesses. That is, the processor executes 20 ms for the application and 5 ms for the operating system between disk accesses. Which of the system models described in section 9.4 applies and why? [2 points]
4.2(a) The achieved rate for disk accesses per second = ______ [6 points]
4.2(b) The percent of time that the processor executes application code = ______ [2 points]
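One generic way to reason about a processor and a disk shared by a fixed number of jobs is exact Mean Value Analysis (MVA) of a closed two-station queueing network. The sketch below is offered only as an illustration under stated assumptions: both stations are treated as exponential FCFS servers (the problem gives c^2 = 1.0 for the disk only; the processor's distribution is an assumption here), and each job alternates one processor burst with one disk access. It is not necessarily the model that section 9.4 of the course text prescribes. The names `mva_throughput` and `app_fraction` are hypothetical.

```python
def mva_throughput(service_times_ms, jobs):
    """Exact MVA for a closed cyclic network of single-server stations.

    Returns completed cycles (here: disk accesses) per millisecond.
    Assumes exponential FCFS service at every station.
    """
    queue = [0.0] * len(service_times_ms)
    throughput = 0.0
    for n in range(1, jobs + 1):
        # An arriving job sees the queue lengths of the (n - 1)-job network.
        residence = [s * (1.0 + q) for s, q in zip(service_times_ms, queue)]
        throughput = n / sum(residence)
        # Little's law gives the new mean queue length at each station.
        queue = [throughput * r for r in residence]
    return throughput


def app_fraction(throughput_per_ms, app_ms_per_access):
    """Fraction of time the processor spends in application code."""
    return throughput_per_ms * app_ms_per_access


if __name__ == "__main__":
    x = mva_throughput([12.5, 15.0], jobs=3)  # processor, disk
    print(x * 1000.0)                         # disk accesses per second
    print(100.0 * app_fraction(x, 10.0))      # percent application time
```

Whatever model is used, the achieved access rate cannot exceed 1 / 15 ms, the rate at which a saturated disk completes requests, which gives a quick upper bound to check an answer against.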

Problem 5: Concurrent Disk Models [20 points]

An array of 8 disks is organized in a (4,2) configuration. The key disk parameters are seek time of 10 ms, rotational speed of 6000 RPM, sector size of 512 B, and 100 sectors per track. You can assume c^2 = 0.5 for disk service time. You can ignore transfer time for this problem. The file system has a block size of 4 KB (4096 bytes). File sizes are distributed with 25% of length 1 block, 25% of length 2 blocks, and 50% of length 16 blocks.

5.1 The time for the (4,2) disk configuration to read a file of 8 blocks = ______ [4 points]
5.2 The expected number of blocks per file E(f) = ______ [2 points]
5.3 The number of independent disk servers for a file access m_q = ______ [2 points]
Note: Assume this is the effective number of independent servers below.
5.4 The expected number of blocks read or written per file access E(f_q) = ______ [2 points]
5.5 The average service time per file access = ______ [2 points]
5.6 If we have an MP system with 10 requestors that execute on each processor for an average of 100 ms between requests, then the achieved rate of file requests for the system = ______ [8 points]
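Several of the basic quantities in Problem 5 follow directly from the stated parameters. The sketch below computes only those: the rotation period and average rotational latency at 6000 RPM, the number of sectors per file-system block, and the mean of the file-size distribution, assuming E(f) denotes the plain expected value. How the (4,2) configuration combines these into a service time is left to the course's model and is not attempted here. All function names are hypothetical.

```python
def rotation_time_ms(rpm):
    """Time for one full revolution, in milliseconds."""
    return 60_000.0 / rpm


def avg_rotational_latency_ms(rpm):
    """On average the target sector is half a revolution away."""
    return rotation_time_ms(rpm) / 2.0


def sectors_per_block(block_bytes, sector_bytes):
    """Whole sectors needed to hold one file-system block."""
    return block_bytes // sector_bytes


def expected_blocks(distribution):
    """Mean file length in blocks for a {length: probability} mapping."""
    return sum(length * prob for length, prob in distribution.items())


if __name__ == "__main__":
    file_sizes = {1: 0.25, 2: 0.25, 16: 0.50}  # from the problem statement
    print(rotation_time_ms(6000))
    print(avg_rotational_latency_ms(6000))
    print(sectors_per_block(4096, 512))
    print(expected_blocks(file_sizes))
```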