Winter 2006 FINAL EXAMINATION Auxiliary Gymnasium Tuesday, April 18 7:00pm to 10:00pm

University of Calgary
Department of Electrical and Computer Engineering
ENCM 369: Computer Organization
Lecture Instructor for L01 and L02: Dr. S. A. Norman

NAME (printed):
U of C ID NUMBER:
LECTURE SECTION (L01 was MWF at 8am, L02 was MWF at noon):
SIGNATURE:

Please don't write anything within this box.

    Problem   Mark
    1         / 10
    2         / 12
    3         / 10
    4         / 17
    5         / 3
    6         / 8
    7         / 8
    8         / 10
    9         / 9
    TOTAL     / 87

Instructions

Please note that the official University of Calgary examination regulations are printed on page 1 of the Examination Regulations and Reference Material booklet that accompanies this examination paper. All of those regulations are in effect for this examination, except that you must write your answers on the question paper, not in the examination booklet.

You may not use electronic calculators or computers during the examination.

The examination is closed-book. You may not refer to books or notes during the examination, with one exception: you may refer to the Examination Regulations and Reference Material booklet that accompanies this examination paper.

You are not required to add comments to assembly language code you write, but you are strongly encouraged to do so, because writing good comments will improve the probability that your code is correct and will help you to check your code after it is finished.

Some problems are relatively easy and some are relatively difficult. Go after the easy marks first.

Write all answers on the question paper and hand in the question paper when you are done. Please do not hand in the Examination Regulations and Reference Material booklet.

Please print or write your answers legibly. What cannot be read cannot be marked.

If you write anything you do not want marked, put a large X through it and write rough work beside it.

You may use the backs of pages for rough work.

ENCM 369 Winter 2006 Final Examination page 2 of 9

PROBLEM 1 (10 marks). Below is the beginning of a SPIM translation of the procedure func1 in the C code listed after it. Complete the SPIM translation, using only instructions from the Final Examination Instruction Subset described in the Examination Regulations and Reference Material booklet. Follow the calling conventions used in lectures and labs, and observe the following additional conventions regarding floating-point registers:

    $f2, $f4, ..., $f10 may be used like $t0-$t9;
    $f20, $f22, ..., $f30 may be used like $s0-$s7.

SPIM code:

            .data
    c0pt01: .double 0.01
            .text
            .globl  func1
    func1:
            la      $t0, c0pt01
            l.d     $f2, ($t0)

C code:

    void func1(double *y, double *x, int n)
    {
        int i;
        for (i = 0; i < n; i++)
            y[i] = x[i] + 0.01 * i;
    }

PROBLEM 2 (12 marks). Consider the following C program, which has been correctly translated into SPIM code in the listing that follows it. Note that the reverse function makes a copy of a string, with the order of the non-'\0' characters reversed.

C code:

    char foo[] = "AEIOU";

    void reverse(char *dest, const char *src)
    {
        const char *p;
        p = src;
        while (*p != '\0')
            p++;
        while (p != src) {
            p--;
            *dest = *p;
            dest++;
        }
        *dest = '\0';
    }

    int main(void)
    {
        char buf[8];
        buf[4] = 'X';
        buf[5] = 'X';
        buf[6] = 'X';
        buf[7] = 'X';
        reverse(buf, foo);
        return 0;
    }

SPIM code:

            .data
            .globl  foo
    foo:    .asciiz "AEIOU"

            .text
            .globl  reverse
    reverse:
            addu    $t0, $a1, $zero
    L1:
            lb      $t1, ($t0)
            beq     $t1, $zero, L2
            addiu   $t0, $t0, 1
            j       L1
    L2:
            beq     $t0, $a1, L3
            addiu   $t0, $t0, -1
            lb      $t2, ($t0)
            sb      $t2, ($a0)
            addiu   $a0, $a0, 1
            j       L2
    L3:
            sb      $zero, ($a0)
            # POINT ONE
            jr      $ra

            .text
            .globl  main
    main:
            addiu   $sp, $sp, -12
            sw      $ra, 8($sp)
            ori     $t9, $zero, 'X'
            sb      $t9, 4($sp)
            sb      $t9, 5($sp)
            sb      $t9, 6($sp)
            sb      $t9, 7($sp)
            addiu   $a0, $sp, 0
            la      $a1, foo
            jal     reverse
            # Next 2 lines were in
            # wrong order in original.
            lw      $ra, 8($sp)
            addiu   $sp, $sp, 12
            jr      $ra

For the point in time when the assembly language program reaches point one, list the values of the registers in the table below as hexadecimal numbers. Also show the contents of the stack frame of main as hexadecimal numbers in the diagram. Note that two of the words in the stack frame are used to hold bytes, and the address offsets of the bytes are indicated in the upper left corner of each byte.

To solve the problem, you will need some (but maybe not all) of the following information:

In ASCII, 'A' is 0x41, 'E' is 0x45, 'I' is 0x49, 'O' is 0x4f, 'U' is 0x55, and 'X' is 0x58.

The address of foo[0] is 0x1001_0000.

When main starts, $ra contains 0x0040_0018 and $sp contains 0x7fff_ff40.

The address of the first instruction of reverse is 0x0040_0024 and the address of the first instruction of main is 0x0040_0058.

SPIM is able to translate the la pseudoinstruction in main into a single machine instruction.

    register   value
    $ra
    $a0
    $a1
    $t0
    $t1
    $t2
    $t9

[Diagram: blank stack frame of main. Labels: "high addresses"; "data saved before main was called"; byte offsets 0, 1, 2, 3 in each of the two words that hold bytes.]

PROBLEM 3 (10 marks). Write a SPIM translation of the procedure func2 in the C code shown below. Use only instructions from the Final Examination Instruction Subset described in the Examination Regulations and Reference Material booklet. Follow the calling conventions used in lectures and labs, and observe the following additional conventions regarding floating-point registers:

    floating-point return values of type double go in $f0;
    $f2, $f4, ..., $f10 may be used like $t0-$t9;
    $f20, $f22, ..., $f30 may be used like $s0-$s7;
    the arguments to func2 arrive in $f12, $f14, and $f16;
    the argument to func3 goes in $f12.

C code:

    double func3(double func3_arg);

    double func2(double left, double right, double x)
    {
        double v;
        v = func3(x);
        if (v < left)
            v = left;
        else if (right < v)
            v = right;
        return 3.14159265358979323846 * v - x;
    }

Hint: Making a diagram for the stack frame of func2 is highly recommended.

PROBLEM 4 (total of 17 marks). The Exam16 ISA (instruction set architecture) describes a system in which addresses, instructions, and data words are all 16 bits wide. It has sixteen 16-bit general purpose registers, and a 16-bit PC. The instructions of Exam16 are as follows. [Correction notice: The table on the original exam had bits 15-12 wrong in sub and slt.]

    Mnemonic  Format               Description

    add       0000_ssss_tttt_dddd  Add source registers selected by bits ssss
                                   and tttt, put result in register selected
                                   by bits dddd.

    sub       0001_ssss_tttt_dddd  Same as add, except ALU operation is
                                   subtraction.

    slt       0010_ssss_tttt_dddd  Same as add, except ALU operation is
                                   set-on-less-than.

    brz       0011_ssss_oooo_oooo  Branch if register is zero: If register
                                   selected by bits ssss contains zero, branch
                                   forward or backward by number of
                                   instructions in 8-bit 2's-complement offset
                                   oooo_oooo.

    lw        0100_0000_aaaa_dddd  Using register selected by bits aaaa as an
                                   address, load word from data memory into
                                   register selected by bits dddd.

    sw        0101_ssss_aaaa_0000  Using register selected by bits aaaa as an
                                   address, store word from register selected
                                   by bits ssss into data memory.

Note that unlike MIPS, there are no offsets built into Exam16 load and store instructions.

Below is a nearly-complete datapath for a computer that implements the Exam16 ISA. It is very much in the style of the single-cycle MIPS subset implementation studied in ENCM 369. Note that there are two 16-bit adders and a 16-bit ALU. The ALUOp signal works as follows: 00 asks for addition, 01 for subtraction, and 10 for set-on-less-than. The circuit labeled "All Bits 0?" is a big NOR gate: the 1-bit output is 1 if all 16 input bits are 0, and is 0 otherwise.

[Figure: nearly-complete Exam16 datapath. Labels shown: PC; clock; Instruction Memory (Address, Instruction[15-0]); Instruction[15-12] to Control Unit; Instruction[11-8]; Instruction[7-4]; Instruction[3-0]; Instruction[7-0]; Registers (Reg #1, Reg #2, Write Reg #, Write Data, RegWrite, Data #1, Data #2); ALU; ALUOp; All Bits 0?; Data Memory (Address, Write Data, Data, Mem, MemWrite); Adder #1; Adder #2; constant 2; Sign Extend; Shift Left 1; 16-bit bus widths.]

Part a (2 marks). The Address and Write Data inputs to the Data Memory are not connected to anything. What signals should be sent to these inputs? Explain why.

Part b (3 marks). The Write Data input to the Register File is not connected to anything. How should this signal be driven? Here is a hint: Introduce a new control signal, give it a name, and use it to control a multiplexer. Briefly give reasons to support your design.
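
The instruction formats in the Exam16 table of Problem 4 all divide a 16-bit word into four 4-bit fields, with brz treating the low 8 bits as a signed offset. The following C fragment is a minimal sketch of that field extraction only. The type name exam16_fields and the function names exam16_split and exam16_brz_offset are hypothetical and are not part of the exam; the sketch does not model the datapath or control signals.

```c
#include <stdint.h>

/* Hypothetical helper type: the four 4-bit fields of an Exam16 word. */
typedef struct {
    unsigned opcode;  /* bits 15-12 */
    unsigned f1;      /* bits 11-8: ssss (0000 for lw) */
    unsigned f2;      /* bits 7-4:  tttt or aaaa */
    unsigned f3;      /* bits 3-0:  dddd (0000 for sw) */
} exam16_fields;

/* Split a 16-bit instruction word into its four 4-bit fields. */
exam16_fields exam16_split(uint16_t instr)
{
    exam16_fields f;
    f.opcode = (instr >> 12) & 0xFu;
    f.f1     = (instr >> 8)  & 0xFu;
    f.f2     = (instr >> 4)  & 0xFu;
    f.f3     = instr & 0xFu;
    return f;
}

/* For brz: sign-extend the 8-bit 2's-complement offset in bits 7-0.
   The result counts instructions, not bytes. */
int exam16_brz_offset(uint16_t instr)
{
    int off = instr & 0xFF;
    return (off & 0x80) ? off - 256 : off;
}
```

For example, the word 0x31FE has opcode 0011 (brz), selects register 1, and carries an offset of -2 instructions.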

PROBLEM 4 (continued from previous page).

Part c (3 marks). The input to the PC register is not connected to anything. How should this signal be driven? A new control signal, a new multiplexer, and perhaps some other new, simple logic element will be needed. Briefly give reasons to support your design.

Part d (6 marks). Fill in the table of control signal values below. The last two columns are reserved for the new control signals you introduced in parts b and c; please write in the names of these signals. Use X in table cells to indicate that a particular control signal is a "don't care" for a particular instruction.

    Instruction   Mem   MemWrite   RegWrite   ALUOp   ________   ________
    add
    sub
    slt
    brz
    lw
    sw

Part e (3 marks). Suppose you want to extend the Exam16 ISA to include an addc ("add constant") instruction with the following format:

    1000_ssss_cccc_dddd

ssss encodes the source register, dddd encodes the destination register, and cccc encodes a constant in the range from 0 to 15. Describe all the datapath changes (not control changes) that would be needed to add support for addc while continuing to support the original six Exam16 instructions.

PROBLEM 5 (3 marks). In the five-stage-pipelined implementation of the MIPS instruction subset, the 3rd stage is execution in the ALU, the 4th stage is memory access, and the 5th stage is writeback to the register file. Consider the following sequence of MIPS instructions:

    add  $t0, $t1, $t2
    lw   $t3, 12($t0)
    sub  $t4, $t4, $t3

Explain how forwarding can be used to start the lw one clock cycle later than the add, and why forwarding cannot be used to start the sub one clock cycle later than the lw.

PROBLEM 6 (8 marks). The multicycle implementation for the MIPS subset is shown in Figure 5.28 on page 6 of your Reference Material booklet. Support for the addi instruction can be added without changing the datapath; all that is needed is two new states in the finite state machine used for the main control unit. The new states would occur after state 1. Fill in the table below to show the values of control signals needed for the four steps of addi. Use X in table cells to indicate that a particular control signal is a "don't care" for a particular state.

    signal        state 0   state 1   1st new state   2nd new state
    IorD
    Mem
    MemWrite
    IRWrite
    ALUOp
    ALUSrcA
    ALUSrcB
    RegDst
    MemtoReg
    RegWrite
    PCSource
    PCWrite
    PCWriteCond

Here are some reminders and hints:

The format for addi is 001000_sssss_ddddd_cccc_cccc_cccc_cccc, where sssss selects the source, ddddd selects the destination, and bits 15-0 supply the constant to be used in the addition.

In state 0, the instruction is fetched and the PC+4 computation is done in the ALU.

In state 1, which like state 0 is common to all instructions, a branch target is computed.

In the 2nd new state, the register file will be updated.

If ALUOp is 00, the ALU will do an addition.

PROBLEM 7 (total of 8 marks).

Part a (4 marks). How would the number 0.078125 be represented as an IEEE 754 double-precision number? (Note that the IEEE 754 formats are summarized on page 2 of your Reference Material booklet.) Here are some hints: first, 0.078125 = (1/16 + 1/64), and second, 1023 is 0x3ff. [Correction notice: The original exam incorrectly said that 1023 is 0x7ff.] Show your work, and use base sixteen to represent your final answer.

Part b (4 marks). Suppose that $f2 contains 0x7f00_0000 and $f3 contains 0xc080_0000 before the instruction

    mul.s $f0, $f2, $f3

is run. What bit pattern will the instruction write to $f0? Show your work, and use base sixteen to represent your final answer. Here is a hint: The bit patterns in $f2 and $f3 have been chosen so that any arithmetic you might have to do to solve this problem will be relatively easy.
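
The double-precision layout referred to in Problem 7 (1 sign bit, 11 exponent bits with a bias of 1023, 52 fraction bits) can be inspected on a host machine with a few lines of C. This is a minimal sketch that assumes the host's double type is IEEE 754 binary64, which is true of common platforms but is not guaranteed by the C standard. The function names double_bits, sign_field, exponent_field and fraction_field are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Copy the raw bits of a double into a 64-bit integer.
   Assumes the host double is IEEE 754 binary64. */
uint64_t double_bits(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Bit 63: sign. */
unsigned sign_field(uint64_t bits)
{
    return (unsigned)(bits >> 63);
}

/* Bits 62-52: biased exponent (bias is 1023). */
unsigned exponent_field(uint64_t bits)
{
    return (unsigned)((bits >> 52) & 0x7FFu);
}

/* Bits 51-0: fraction (the leading 1 of a normalized number is implied). */
uint64_t fraction_field(uint64_t bits)
{
    return bits & 0x000FFFFFFFFFFFFFULL;
}
```

For instance, 1.0 is 1.0 times two to the power 0, so its biased exponent field is 1023 (0x3ff) and its fraction field is 0, giving the bit pattern 0x3ff0_0000_0000_0000.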

PROBLEM 8 (total of 10 marks). In parts a, b, and c, show your work so you can get partial credit if you make a mistake. Also, note that the Reference Material booklet has a table of powers of two.

Part a (3 marks). A cache for a computer system with 32-bit words and 32-bit addresses is direct-mapped, has one-word blocks, and has a capacity of 8192 words. Which bits of an address would be used for byte offset, which bits for index, and which bits for tag?

Part b (3 marks). A cache for a computer system with 64-bit words and 48-bit addresses is 4-way set-associative, has eight-word blocks, and has a capacity of 65,536 bytes. Which bits of an address would be used for byte offset, which bits for block offset, which bits for index, and which bits for tag?

Part c (2 marks). What would be the total number of bits of tag stored in the cache of part b? You may express your answer as a product of integers, something like 29 × 384, or 7 × 32 × 256. (Neither of those examples is a correct answer for this problem!)

Part d (2 marks). A current processor design used in laptop computers has a 32-kilobyte Level 1 (L1) I-cache, a 32-kilobyte L1 D-cache, and a 512-kilobyte unified Level 2 (L2) cache. The L1 caches contain information that is also in the L2 cache. Why would it be a bad idea to simplify the design by getting rid of the L2 cache and using the space for bigger L1 caches instead: 256 kilobytes for the L1 I-cache, and the same for the L1 D-cache?
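
The address breakdown asked for in Problem 8 parts a and b follows from a few base-2 logarithms. The C fragment below is a minimal sketch of that bookkeeping, assuming every size is a power of two and that the cache is indexed by the address bits just above the block offset. The names cache_fields, log2_exact and cache_split are hypothetical.

```c
#include <stdint.h>

/* Hypothetical result type: width in bits of each address field. */
typedef struct {
    unsigned byte_offset_bits;   /* selects a byte within a word */
    unsigned block_offset_bits;  /* selects a word within a block */
    unsigned index_bits;         /* selects a set */
    unsigned tag_bits;           /* everything that is left over */
} cache_fields;

/* Base-2 logarithm of x; x is assumed to be a power of two. */
unsigned log2_exact(uint64_t x)
{
    unsigned n = 0;
    while (x > 1) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Compute field widths from the cache geometry. */
cache_fields cache_split(unsigned addr_bits, uint64_t word_bytes,
                         uint64_t words_per_block, uint64_t ways,
                         uint64_t capacity_bytes)
{
    cache_fields f;
    uint64_t block_bytes = word_bytes * words_per_block;
    uint64_t sets = capacity_bytes / (block_bytes * ways);

    f.byte_offset_bits  = log2_exact(word_bytes);
    f.block_offset_bits = log2_exact(words_per_block);
    f.index_bits        = log2_exact(sets);
    f.tag_bits = addr_bits - f.index_bits
               - f.block_offset_bits - f.byte_offset_bits;
    return f;
}
```

As an illustration with made-up parameters (not those of parts a or b), a 2-way set-associative cache of 16,384 bytes with 4-byte words, four-word blocks and 32-bit addresses has 512 sets, so cache_split(32, 4, 4, 2, 16384) yields 2 byte-offset bits, 2 block-offset bits, 9 index bits and 19 tag bits.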

PROBLEM 9 (total of 9 marks).

Part a (2 marks). Consider a computer that runs the MIPS instruction set. The computer supports virtual memory with a page size of 65,536 bytes. Suppose that a process has the instruction

    lw $s2, 16($a0)

located at virtual address 0x0041_fff8, and suppose that just before the instruction is fetched, $a0 = 0x1002_fffc. When the instruction is fetched, the instruction TLB contains the following translations:

    virtual page number   valid bit   physical page number
    0x0041                0           0x9900
    0x0040                1           0x9711
    0x0041                1           0x9822
    0x0042                1           0x9633

Will there be a hit or a miss in the instruction TLB? If there is a miss, explain why. If there is a hit, what will be the physical address used to fetch the instruction?

Part b (2 marks). Continuing from the instruction fetch in part a, just before the data memory access step, the data TLB contains the following translations:

    virtual page number   valid bit   physical page number
    0x7fff                1           0x9255
    0x1001                1           0x9166
    0x1002                0           0x9388
    0x1003                1           0x9477

Will there be a hit or a miss in the data TLB? If there is a miss, explain why. If there is a hit, what will be the physical address used for data memory access?

Part c (2 marks). Briefly describe what information would be in a page table, and what that information would be used for.

Part d (3 marks). Data TLBs are often designed so that each entry in a TLB will have a virtual page number, a physical page number, a valid bit, and another bit called a dirty bit (and maybe a few other special-purpose bits of memory). The dirty bit is set to 0 whenever the entry is updated by the operating system kernel, and is changed to 1 the first time the process using the associated page writes to that page. Suppose the kernel needs to copy a page of data from disk to physical memory. Describe a situation in which the kernel could save a significant amount of time by discovering that a dirty bit in the TLB is 0.
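
The TLB lookups in Problem 9 parts a and b amount to splitting a virtual address into a page number and a page offset, then searching for a valid entry with a matching page number. The C fragment below is a minimal sketch of that procedure for 65,536-byte pages. It assumes 32-bit virtual and physical addresses and a simple linear search; the names tlb_entry and tlb_translate are hypothetical, and a real TLB performs the comparison in hardware.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_OFFSET_BITS 16  /* 65,536-byte pages */

/* Hypothetical model of one TLB entry. */
typedef struct {
    uint32_t vpn;    /* virtual page number */
    int      valid;  /* 1 if the translation may be used */
    uint32_t ppn;    /* physical page number */
} tlb_entry;

/* Return 1 on a hit and store the physical address in *paddr.
   Return 0 on a miss and leave *paddr unchanged. */
int tlb_translate(const tlb_entry *tlb, size_t n,
                  uint32_t vaddr, uint32_t *paddr)
{
    uint32_t vpn    = vaddr >> PAGE_OFFSET_BITS;
    uint32_t offset = vaddr & ((1u << PAGE_OFFSET_BITS) - 1u);

    for (size_t i = 0; i < n; i++) {
        /* Entries with valid == 0 never match, whatever their vpn. */
        if (tlb[i].valid && tlb[i].vpn == vpn) {
            *paddr = (tlb[i].ppn << PAGE_OFFSET_BITS) | offset;
            return 1;
        }
    }
    return 0;
}
```

With a made-up table holding (vpn 0x0010, valid 0, ppn 0x1234) and (vpn 0x0010, valid 1, ppn 0x5678), the virtual address 0x0010_0abc hits on the second entry and translates to 0x5678_0abc; the page offset passes through unchanged.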