EE382: Processor Design
Final Examination
March 20, 1998

Please do not open the exam book or begin work on the exam until instructed to do so. You have a total of 3 hours to complete this exam. You will be informed when 3 hours have elapsed. You must stop all work on the exam at that time.

You may use your textbook and notes during the exam, as well as a calculator. Show work and report your answers on each sheet. Use the blank sheet at the end of the exam, the back of the page, or attach additional sheets if necessary.

Good Luck!

Your matriculation at Stanford University indicates that you have read and understood the Honor Code, and you agree to abide by the Code. Your signature here confirms that.

Signed:
Name (Printed):
Stanford ID:

Problem   Points
1         /20
2         /20
3         /20
4         /20
5         /20
Total     /100

SITN Students: Please attach a routing slip.

EE382 Final Exam March 20, 1998 Page 1 of 6

Problem 1: Vector Processors [20 points]

A vector processor has three pipelines: one for Load/Store, one for addition, and one for multiplication. The processor's pipelines are clocked at 100 MHz. The function units for addition and multiplication have the same number of stages, and they can be chained together. The memory system consists of 32 modules, each with a cycle time of 100 ns.

The vector processor is being evaluated for its performance on a vector inner product calculation:

    X = Σ_i (A[i] * B[i])

1.1 Assume the vectors and vector register length are sufficiently long that pipeline startup and draining can be ignored.

    (a) The value of γ_opt = [5 points]

    (b) If the achieved γ is 0.5*γ_opt, then the achieved MFLOPS = [10 points]

1.2 A benchmark test is run to compute an inner product on vectors that have been preloaded into the register file. The measured performance for a vector of length 36 is 150 MFLOPS.

    The number of stages in the addition pipeline = [5 points]
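
The clock and memory parameters given above can be put on a common footing with a little arithmetic. The sketch below is only an illustration and does not compute γ_opt, whose definition comes from the course text. It assumes perfect interleaving with no bank conflicts, and the function names are hypothetical.

```python
def clock_period_ns(clock_mhz):
    """Processor cycle time in nanoseconds for a clock given in MHz."""
    return 1000.0 / clock_mhz


def peak_accesses_per_cycle(modules, module_cycle_ns, clock_mhz):
    """Upper bound on memory accesses started per processor cycle.

    Assumption: perfect interleaving, so every module can begin a new
    access as soon as its own cycle time has elapsed.
    """
    # Number of processor cycles one module stays busy per access.
    busy_cycles = module_cycle_ns / clock_period_ns(clock_mhz)
    return modules / busy_cycles


if __name__ == "__main__":
    # Values from the problem statement: 32 modules, 100 ns, 100 MHz.
    print(peak_accesses_per_cycle(32, 100.0, 100.0))
```

With the stated parameters, each module is busy for 10 processor cycles per access, which is the quantity the bound divides into the module count.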

Problem 2: Cache Coherency [20 points]

The cache coherence mechanism for a multiprocessor system uses a MESI protocol. Consider a system with two processors, P1 and P2, with the initial cache state shown in the following table. For this problem, assume each cache holds only 4 lines and uses direct-mapped organization.

         P1                      P2
    Line   State      Set    Line   State
    L1     M          0      L5     I
    L2     E          1      L6     M
    L3     S          2      L3     S
    L4     I          3      L8     E

2.1 What is the state of each cache after the following sequence of memory references is completed? Fill in the table below. [16 points]

    P2 reads line L1
    P1 writes line L2
    P2 writes line L3
    P1 reads line L8

         P1                      P2
    Line   State      Set    Line   State
                      0
                      1
                      2
                      3

2.2 Assume that the caches are returned again to the original state above. Describe a simple action by P2 that would leave an exclusive copy of line L3 in P1's cache even though the cache state would be S. [4 points]
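
A reference sequence like the one in 2.1 can be traced mechanically. The following is a minimal sketch of a two-cache MESI simulation, not a reference solution. It assumes that line Ln maps to set (n - 1) mod 4, which is consistent with the initial table. It also assumes that a read miss is filled in state S when the other cache holds a valid copy and in E otherwise, and that an evicted or downgraded M line is written back without further modeling. All function names are hypothetical.

```python
def set_index(line):
    """Assumed mapping: line 'Ln' goes to set (n - 1) mod 4."""
    return (int(line[1:]) - 1) % 4


def initial_caches():
    """Initial state from the problem table, indexed by set number."""
    return {
        "P1": [("L1", "M"), ("L2", "E"), ("L3", "S"), ("L4", "I")],
        "P2": [("L5", "I"), ("L6", "M"), ("L3", "S"), ("L8", "E")],
    }


def access(caches, cpu, op, line):
    """Apply one 'read' or 'write' by cpu to line, updating both caches."""
    other = "P2" if cpu == "P1" else "P1"
    s = set_index(line)
    tag, state = caches[cpu][s]
    hit = tag == line and state != "I"
    o_tag, o_state = caches[other][s]
    other_valid = o_tag == line and o_state != "I"

    if op == "read":
        if hit:
            return  # a read hit changes no state
        if other_valid:
            # The other copy (M, E, or S) is downgraded to S;
            # a write-back of an M copy is implied, not modeled.
            caches[other][s] = (line, "S")
        caches[cpu][s] = (line, "S" if other_valid else "E")
    elif op == "write":
        if other_valid:
            caches[other][s] = (line, "I")  # invalidate the remote copy
        caches[cpu][s] = (line, "M")
    else:
        raise ValueError("op must be 'read' or 'write'")


def run_sequence(refs):
    """Run a list of (cpu, op, line) references from the initial state."""
    caches = initial_caches()
    for cpu, op, line in refs:
        access(caches, cpu, op, line)
    return caches


if __name__ == "__main__":
    refs = [("P2", "read", "L1"), ("P1", "write", "L2"),
            ("P2", "write", "L3"), ("P1", "read", "L8")]
    for cpu, sets in run_sequence(refs).items():
        print(cpu, sets)
```

The sketch keeps one (line, state) pair per set, which is all a direct-mapped cache of four lines needs. A different fill policy or write-miss policy would change the transitions, so the assumptions above should be checked against the protocol definition used in the course.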

Problem 3: Scalable Multiprocessor Interconnection Networks [20 points]

A scalable multiprocessor system uses the direct, static configuration of a (16,3) hypertorus. The message payload is 120 bits and the channel width is 12. Wormhole routing is used. Links are bidirectional. For an application under evaluation the rate of message generation is 0.01.

3.1 Assume uniform distribution of inter-node messages.

    (a) The average number of hops to transmit a message = [2 points]

    (b) The average channel utilization = [2 points]

    (c) The mean message communication latency in cycles = [6 points]

3.2 Now assume the application can be partitioned and scheduled to achieve a locality factor of 0.125.

    (a) The average number of hops to transmit a message = [2 points]

    (b) The average channel utilization = [2 points]

    (c) The mean message communication latency in cycles = [6 points]
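
The hop count under uniform traffic can be checked by brute force. The sketch below is an illustration under stated assumptions, not the course's derivation. It reads (16,3) as radix k = 16 and dimension n = 3, treats each dimension as a bidirectional ring with shortest-path routing, and averages over all destinations including the source itself. It also assumes the channel width is in bits when counting flits. Function names are hypothetical.

```python
def ring_distance(a, b, k):
    """Shortest distance between nodes a and b on a bidirectional ring of k nodes."""
    d = abs(a - b) % k
    return min(d, k - d)


def average_hops(k, n):
    """Mean hop count in a k-ary n-dimensional torus under uniform traffic.

    Assumption: destinations are uniform over all k**n nodes, including the
    source. Dimensions are independent, so the mean is n times the
    per-dimension mean.
    """
    per_dimension = sum(ring_distance(0, j, k) for j in range(k)) / k
    return n * per_dimension


def message_flits(payload_bits, channel_width_bits):
    """Number of channel-width units needed for the payload (ceiling division)."""
    return -(-payload_bits // channel_width_bits)


if __name__ == "__main__":
    print(average_hops(16, 3))
    print(message_flits(120, 12))
```

Excluding the source from the set of destinations changes the mean only slightly for a network of this size, so the choice matters mainly for small k and n.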

Problem 4: Multiprogramming System Models [20 points]

A system with a single processor and a single disk is multiprogrammed with 3 jobs. The processor executes an average of 12.5 ms between disk accesses, 10 ms for the application and 2.5 ms for the operating system to handle the disk operation. The disk service time has a mean value of 15 ms with c^2 = 1.0.

4.1 Which of the system models described in section 9.4 applies and why? [2 points]

    4.1(a) The achieved rate for disk accesses per second = [6 points]

    4.1(b) The percent of time that the processor executes application code = [2 points]

4.2 Now we remove one job from the multiprogramming mix and use its memory to implement a disk cache. Assume that the hit rate for the disk cache is 50%, which enables the processor to execute twice as long between disk accesses. That is, the processor executes 20 ms for the application and 5 ms for the operating system between disk accesses.

    Which of the system models described in section 9.4 applies and why? [2 points]

    4.2(a) The achieved rate for disk accesses per second = [6 points]

    4.2(b) The percent of time that the processor executes application code = [2 points]

Problem 5: Concurrent Disk Models [20 points]

An array of 8 disks is organized in a (4,2) configuration. The key disk parameters are seek time of 10 ms, rotational speed of 6000 RPM, sector size of 512 B, and 100 sectors per track. You can assume c^2 = 0.5 for disk service time. You can ignore transfer time for this problem.

The file system has a block size of 4 KB (4096 bytes). File sizes are distributed with 25% of length 1 block, 25% of length 2 blocks, and 50% of length 16 blocks.

5.1 The time for the (4,2) disk configuration to read a file of 8 blocks = [4 points]

5.2 The expected number of blocks per file E(f) = [2 points]

5.3 The number of independent disk servers for a file access m_q = [2 points]

    Note: Assume this is the effective number of independent servers below.

5.4 The expected number of blocks read or written per file access E(f_q) = [2 points]

5.5 The average service time per file access = [2 points]

5.6 If we have an MP system with 10 requestors that execute on each processor for an average of 100 ms between requests, then the achieved rate of file requests for the system = [8 points]
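
Several of the basic quantities in this problem follow directly from the stated parameters. The sketch below works through that arithmetic only. It does not model the (4,2) array or the queueing behavior, which depend on the course text. It assumes the usual convention that average rotational latency is half a rotation, and the function names are hypothetical.

```python
def rotation_time_ms(rpm):
    """Time for one full rotation in milliseconds."""
    return 60_000.0 / rpm


def average_rotational_latency_ms(rpm):
    """Assumption: on average the head waits half a rotation."""
    return rotation_time_ms(rpm) / 2.0


def sectors_per_block(block_bytes, sector_bytes):
    """Number of sectors that make up one file system block."""
    return block_bytes // sector_bytes


def expected_blocks(distribution):
    """Expected file length in blocks from (blocks, probability) pairs."""
    return sum(blocks * p for blocks, p in distribution)


if __name__ == "__main__":
    # File size distribution from the problem statement.
    file_dist = [(1, 0.25), (2, 0.25), (16, 0.50)]
    print(rotation_time_ms(6000))
    print(average_rotational_latency_ms(6000))
    print(sectors_per_block(4096, 512))
    print(expected_blocks(file_dist))
```

The expected value is a plain weighted sum over the three file lengths, so it can be checked by hand against the printed result.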