Winter 2009 FINAL EXAMINATION Location: Engineering A Block, Room 201 Saturday, April 25 noon to 3:00pm

University of Calgary
Department of Electrical and Computer Engineering
ENCM 369: Computer Organization
Lecture Instructors: S. A. Norman (L01), N. R. Bartley (L02)

NAME (printed):

U of C ID NUMBER:

LECTURE SECTION (L01 was TuTh at 9:30am, L02 was TuTh at 12:30pm):

SIGNATURE:

Please don't write anything within this box.

    Problem   Marks
    1         / 9
    2         / 7
    3         / 12
    4         / 10
    5         / 12
    6         / 6
    7         / 15
    8         / 11
    9         / 13
    10        / 15
    TOTAL     / 110

Instructions

Please note that the official University of Calgary examination regulations are printed on page 1 of the Examination Regulations and Reference Material booklet that accompanies this examination paper. All of those regulations are in effect for this examination, except that you must write your answers on the question paper, not in the examination booklet.

You may not use electronic calculators or computers during the examination.

The examination is closed-book. You may not refer to books or notes during the examination, with one exception: you may refer to the Examination Regulations and Reference Material booklet that accompanies this examination paper.

You are not required to add comments to assembly language code you write, but you are strongly encouraged to do so, because writing good comments will improve the probability that your code is correct and will help you to check your code after it is finished.

Some problems are relatively easy and some are relatively difficult. Go after the easy marks first.

Write all answers on the question paper and hand in the question paper when you are done. Please do not hand in the Examination Regulations and Reference Material booklet.

Please print or write your answers legibly. What cannot be read cannot be marked.

If you write anything you do not want marked, put a large X through it and write rough work beside it.

You may use the backs of pages for rough work.

ENCM 369 Winter 2009 Final Examination page 2 of 10

PROBLEM 1 (total of 9 marks)

Part a. (4 marks.) Suppose that before the instructions below are run, $t0 = 0xffff_fffe, $t1 = 0x0000_0001, $t2 = 17 (base ten), and $t3 = 3 (base ten). What will be the contents of $t4-$t7 after the instructions run?

    mult  $t0, $t1
    mfhi  $t4
    multu $t0, $t1
    mfhi  $t5
    div   $t2, $t3
    mflo  $t6
    mfhi  $t7

    $t4:
    $t5:
    $t6:
    $t7:

Part b. (5 marks.) Write a SPIM translation of the C function quux listed below. Use only instructions from the Final Examination Instruction Subset described in the Examination Regulations and Reference Material booklet. Assume that an int multiplication result is taken as the least-significant 32 bits of a 64-bit product. Follow the calling conventions used in lectures and labs.

    int quux(int j, int k)
    {
      return j * (k % 251);
    }

PROBLEM 2 (7 marks). Complete the SPIM translation of the C function myfunc, using only instructions from the Final Examination Instruction Subset described in the Examination Regulations and Reference Material booklet. Follow the calling conventions used in lectures and labs, and observe the following additional conventions regarding floating-point registers:

    myfunc returns a value in the double-precision register $f0;
    arguments x, a, and b arrive in double-precision registers $f12, $f14, and $f16, in that order;
    double-precision registers $f2, $f4, ..., $f10 may be used like $t0-$t9;
    double-precision registers $f20, $f22, ..., $f30 may be used like $s0-$s7.

    double myfunc(double x, double a, double b)
    {
      double r;
      r = a * x + b;
      if (r > 10.0)
        r = 10.0;
      else if (r < -10.0)
        r = -10.0;
      return r;
    }

            .data
    c10pt0: .double 10.0
            .text
            .globl  myfunc
    myfunc:

PROBLEM 3 (total of 12 marks)

Part a. (2 marks.) Suppose $a0 contains 0x9000_0000 and $a1 contains 0x2000_0000. What will be the value in $t0 after subu $t0, $a0, $a1 is run? Express your answer in base sixteen.

Part b. (2 marks.) Did overflow occur in the subtraction of part a? Did wraparound occur? Give reasons to support both of your answers.

Part c. (3 marks.) In the IEEE 754 single-precision format, what number does the following bit pattern represent?

    0xc1dc_0000

Show the intermediate steps used to obtain your answer.

Part d. (3 marks.) What is the IEEE 754 double-precision representation of 8.75? Show intermediate steps, and express your answer in base sixteen.

Part e. (2 marks.) An ENCM 339 student writes the code below as part of a small C program to test his understanding of if/else statements, and is surprised to find that the output from his if/else statement is NOT EQUAL.

    double d = 0.3;
    if (3.0 * d == 0.9)
      printf("equal\n");
    else
      printf("not EQUAL\n");

Explain carefully (but without actually working out bit patterns for 0.3, 3.0, or 0.9) how the expression involving == could be false.

PROBLEM 4 (10 marks). Write a SPIM translation of the C function foo, using only instructions from the Final Examination Instruction Subset described in the Examination Regulations and Reference Material booklet. Follow the calling conventions used in lectures and labs, and observe the following additional conventions regarding floating-point registers:

    bar returns a value in the single-precision register $f0;
    double-precision registers $f2, $f4, ..., $f10 may be used like $t0-$t9;
    double-precision registers $f20, $f22, ..., $f30 may be used like $s0-$s7.

Remember also that double-precision registers share storage with single-precision registers. For example, the 64-bit $f2 shares storage with the 32-bit $f2 and the 32-bit $f3.

    float bar(float a);

    void foo(double *z, float *x, float *y, int n)
    {
      double *p;
      p = z + n;
      while (z != p) {
        *z = bar(*x) * bar(*y);
        z++;
        x++;
        y++;
      }
    }

PROBLEM 5 (12 marks)

Consider the single-cycle processor of Figure 4.17 from your textbook, in which each instruction is fetched and completed within a single processor clock cycle. A copy of that figure can be found on page 6 of the Reference Material booklet. Note that the ALUOp signal works as follows: 00 requests addition, 01 requests subtraction, and 10 tells the ALU Control Unit to decide based on bits 5-0 of the instruction.

Suppose the following sequence of instructions is run:

    lw  $t0, 0($sp)
    add $t1, $t0, $s0
    sw  $t1, 12($sp)
    beq $t0, $s1, L42

Note that in both the add instruction and the beq instruction, $t0 is encoded as bits 25-21 of the instruction.

Suppose also that before the lw instruction starts, $sp = 0x7fff_fe40, $s0 = 0x1000_0000, $s1 = 0x0000_2468, and the memory word at address 0x7fff_fe40 has a value of 0x0000_2482.

Fill in the following table to show what values are attained by control signals, the Read data 1 output of the Register File, and the ALU Result signal, in cycles 2, 3 and 4. Use an X to indicate that a control signal is a don't care in a particular clock cycle.

    cycle  instruction         RegDst  Branch  MemRead  MemtoReg  ALUOp  MemWrite  ALUSrc  RegWrite  Read data 1  ALU Result
    1      lw  $t0, 0($sp)     0       0       1        1         00     0         1       1         0x7fff_fe40  0x7fff_fe40
    2      add $t1, $t0, $s0
    3      sw  $t1, 12($sp)
    4      beq $t0, $s1, L42

PROBLEM 6 (total of 6 marks)

Part a. (2 marks.) The fragment of C code below counts how many characters in a string match a character code in c1. Below the C code are two different MIPS assembly language translations of the loop, for a real MIPS processor that has delayed branch instructions.

    do {
      c2 = *p;
      p++;
      if (c1 == c2)
        count++;
    } while (c2 != '\0');

    L1: lbu   $t9, ($a0)
        bne   $a1, $t9, L2
        addiu $a0, $a0, 1
        addiu $t8, $t8, 1
    L2: bne   $t9, $zero, L1
        nop

    L3: lbu   $t9, ($a0)
        addiu $a0, $a0, 1
        subu  $t0, $t9, $a1
        sltiu $t1, $t0, 1
        bne   $t9, $zero, L3
        addu  $t8, $t8, $t1

Write register names in the boxes below to show the correspondences between C variables or arguments and MIPS registers. (The correspondences are the same in the two different fragments of assembly language.)

    c1:
    c2:
    count:
    p:

Part b. (2 marks.) The use of subu and sltiu in part a to make $t1 equal to 1 if c1 == c2 and equal to 0 otherwise is a somewhat non-obvious trick. Explain why the combination of subu and sltiu does the right thing in the code above.

Part c. (2 marks.) Each of the assembly language loops in part a has six instructions. Explain why the loop that starts with L3 is likely to run faster than the loop that starts with L1, in a pipelined processor that correctly implements the MIPS instruction set.

PROBLEM 7 (total of 15 marks). This problem concerns the pipelined computer of Figure 4.51 from your textbook. There is a copy of this figure on page 7 of the Reference Material booklet.

Suppose that the clock period for this processor is 1.0 ns, and that clock edges that cause updates to the PC and the pipeline registers occur at t = 0.0ns, t = 1.0ns, t = 2.0ns, and so on. Suppose also that clock edges that cause updates to the Register File occur at t = 0.5ns, t = 1.5ns, and so on.

Parts a-f are concerned with the following sequence of instructions:

    address      instruction  disassembly
    0x0040_0024  0x0000_0000  nop
    0x0040_0028  0x0000_0000  nop
    0x0040_002c  0x0000_0000  nop
    0x0040_0030  0x8fa8_0008  lw $8, 8($29)
    0x0040_0034  0x0109_5025  or $10, $8, $9   # Instruction format note: bits 25-21 encode $8
    0x0040_0038  0x0000_0000  nop
    0x0040_003c  0x0000_0000  nop
    0x0040_0040  0x0000_0000  nop

Suppose that at t = 10.0ns the IF phase of the lw instruction begins, that at that time, GPR contents are as follows ...

    $8:  0x1234_0000
    $9:  0x0001_2345
    $10: 0x0101_0101
    $29: 0x7fff_0100

... and that the value of the Data Memory word at address 0x7fff_0008 is 0xfff0_0000.

For each of your answers to parts b-f, give a brief explanation of your answer. Use the answer to part a as a model.

Part a. (0 marks.) What bit pattern is written into IF/ID.Instruction at t = 11.0ns?

    0x8fa8_0008: this is the bit pattern for the lw instruction, copied out of the Instruction Memory.

Part b. (2 marks.) What bit patterns are written into ID/EX.A and ID/EX.B at t = 12.0ns?

Part c. (2 marks.) What bit patterns are written into ID/EX.A and ID/EX.B at t = 13.0ns?

Part d. (2 marks.) What bit pattern is written into ID/EX.RegisterRt at t = 13.0ns?

Part e. (2 marks.) What bit pattern is written into EX/MEM.Result at t = 13.0ns?

Part f. (2 marks.) At what time does the update to $10 in the Register File occur, and what value is written to $10?

Part g. (3 marks.) The instruction sequence used for parts a-f has a kind of pipeline hazard. What kind of hazard is it? Does the computer of Figure 4.51 manage this hazard correctly? Briefly give a reason for your answer.

Part h. (2 marks.) This part is about the computer of Figure 4.51, but it is not related to the instruction sequence used in parts a-g. The outputs of the Main Control Unit are all written to the ID/EX pipeline register. It would be incorrect to send these outputs directly to functional units such as the Register File and the Data Memory. Using the MemWrite signal as an example, explain why it would be wrong to send the Main Control Unit outputs directly to functional units.

PROBLEM 8 (total of 11 marks). Parts a-e concern a D-cache (data cache) for a computer with 32-bit words. The cache is 2-way set-associative with 2-word blocks. There is no virtual memory, so the processor core sends physical data addresses directly to the D-cache. Cache indexes are 9 bits wide and cache tags are 20 bits wide. Each line of the cache has an lru bit, which is 0 if the entry to the left of the lru bit has been accessed more recently than the entry to the right of the lru bit and is 1 otherwise.

Part a. (2 marks.) Suppose a load-word instruction is attempted using 0x1122_396c as a data address. The index from this address will be 100101011 (base two). Suppose that just before the load is attempted, line 100101011 (base two) of the D-cache looks like this:

                  data for        data for                        data for        data for
    V  tag        block offset 0  block offset 1  lru  V  tag      block offset 0  block offset 1
    1  0x11220    0x0000_0001     0xffff_ffff     1    1  0x11227  0x1122_3400     0x1122_341c

The attempt to read data from the cache will be a miss. Explain exactly how the miss is detected.

Part b. (3 marks.) Suppose that when the load of part a is attempted, some of the words in main memory are as follows:

    address      data word
    0x1122_3964  0x3333_3333
    0x1122_3968  0x4444_4444
    0x1122_396c  0x5555_5555
    0x1122_3970  0x6666_6666
    0x1122_3974  0x7777_7777

Assuming that the cache uses an LRU replacement strategy, fill in the table below to show the contents of line 100101011 (base two) just after the D-cache has finished taking care of the miss.

                  data for        data for                        data for        data for
    V  tag        block offset 0  block offset 1  lru  V  tag      block offset 0  block offset 1

Part c. (2 marks.) Now suppose a store-word instruction is attempted, using 0x1133_7800 as the address and 0x8888_8888 as the data. The index from the address will be 100000000 (base two). Suppose just before the store is attempted, line 100000000 (base two) of the cache looks like this:

                  data for        data for                        data for        data for
    V  tag        block offset 0  block offset 1  lru  V  tag      block offset 0  block offset 1
    1  0x11337    0x0000_0001     0x0000_0002     1    1  0x1133a  0x0000_002a     0x0000_0033

There will be a cache hit. Fill in the following table to show how the write hit will change the contents of line 100000000 (base two).

                  data for        data for                        data for        data for
    V  tag        block offset 0  block offset 1  lru  V  tag      block offset 0  block offset 1

Part d. (2 marks.) Suppose the D-cache is a write-through cache. In addition to updating line 100000000 (base two) of the cache, what other action will occur within the memory system as a result of the store of part c?

Part e. (2 marks.) What is the data capacity of the cache, in bytes?

PROBLEM 9 (total of 13 marks)

Parts a-c are about a process running on a MIPS-like computer with virtual memory. The page size in this system is 4096 bytes. The TLBs for instruction and data address translations have room for only six translations each. Just before the instruction at virtual address 0x0040_28a8 is fetched, the page table for the process and the TLBs contain the following information:

    Page Table
    VPN      V-bit  PPN or disk info
    0x00400  1      0x20600
    0x00401  1      0x20202
    0x00402  1      0x20777
    0x00403  1      0x20600
    0x10010  1      0x20543
    0x10011  0      [disk info]
    0x10012  1      0x20344
    0x7fffe  1      0x208a2
    0x7ffff  1      0x203fc

    Instruction TLB
    VPN      V-bit  PPN
    0x00401  1      0x20202
    0x00402  0      0x20313
    0x00000  0      0x00000
    0x00402  1      0x20777
    0x00403  1      0x20654
    0x00400  1      0x20600

    Data TLB
    VPN      V-bit  PPN
    0x7fffe  1      0x208a2
    0x7ffff  0      0x20101
    0x10010  1      0x20543
    0x10011  0      0x20777
    0x10012  1      0x20344
    0x7ffff  1      0x203fc

Part a. (3 marks.) Giving a reason to support your answer, state whether the instruction fetch results in a hit or a miss in the instruction TLB. What physical address will be used to fetch the instruction?

Part b. (3 marks.) The instruction at virtual address 0x0040_28a8 is 0x8fa8_0008 lw $t0, 8($sp). Suppose that $sp = 0x7fff_effc. Giving a reason to support your answer, state whether the data memory access results in a hit or a miss in the data TLB. What physical address will be used to access the data?

Part c. (3 marks.) Suppose the instruction following the one in parts a and b is 0x8c89_0000 lw $t1, ($a0). Suppose that $a0 = 0x1001_1804. Give as much information as you can about the physical address that will be used to read data into $t1. If you were able to completely specify the physical address, explain how you found it. If you were not able to completely specify the physical address, explain why it is not possible to do so.

Parts d and e are also about virtual memory, but not specifically the system of parts a-c.

Part d. (2 marks.) Process 100 and process 101 are running on a system with virtual memory. The range of word addresses process 100 uses to access its stack is 0xbfff_f000 to 0xbfff_fffc. The range of word addresses used by process 101 to access its stack is also 0xbfff_f000 to 0xbfff_fffc. Explain why when process 101 writes to its stack, it is impossible that such writes will overwrite data in the stack of process 100.

Part e. (2 marks.) Suppose there is a page fault in a computer with virtual memory. Explain why in some cases this will result in a disk write followed by a disk read, and in other cases the only disk activity required is a disk read.

PROBLEM 10 (total of 15 marks)

These are questions on miscellaneous course topics.

Question a. (3 marks.) Consider a direct-mapped data cache for a computer with 64-bit data words and 44-bit physical addresses. The cache has a block size of 8 words, and the capacity of the cache is 65,536 bytes. Show how addresses would be broken into index, tag, byte offset, and block offset.

Question b. (2 marks.) Suppose the computers of textbook Figures 4.17 and 4.51 (copied on pages 6-7 of the Reference Material) are implemented with similar latencies for functional units such as the memories, register file, and ALU. Explain briefly why the processor of Figure 4.51 could run reliably at a much higher clock speed than that of Figure 4.17.

Question c. (2 marks.) Briefly give a definition of the term exception as it was used in ENCM 369.

Question d. (2 marks.) Briefly explain why enhancing the processor of Figure 4.51 to support exception handling is a more complex design task than adding exception support for the processor of Figure 4.17.

Question e. (2 marks.) Briefly state two reasons why an 80GB disk drive is a much better choice than 80GB of DRAM for file storage on a mid-priced laptop computer.

Question f. (3 marks.) Suppose a is of type int* in the following code fragment, and that a, i and sum are in registers.

    for (i = 0; i < 65536; i++)
      sum += a[i];

The loop runs on a computer with 32-bit ints, 32-bit words, and one level of data cache. The D-cache has a capacity of 128KB and has 8-word blocks. If the loop runs from beginning to end without any exceptions occurring, give estimates for the minimum and maximum number of D-cache misses that occur while the loop runs.

Question g. (1 mark.) Why does the assumption of no exceptions matter in question f?