
Page 1 of 17

UNIVERSITY OF TORONTO
FACULTY OF APPLIED SCIENCE AND ENGINEERING
FINAL EXAMINATION, APRIL 2011

ECE243H1 S COMPUTER ORGANIZATION
Exam Type: D
Duration: 2.5 Hours
Profs. Anderson, Enright Jerger, and Steffan

This is a type D exam. You are allowed to use any book/notes and a non-programmable calculator as allowed by the University regulations.

Last Name (Print):
First Name:
Student Number:

    Question   Marks   Max. Marks
    1                  15
    2                  20
    3                  15
    4                  20
    5                  20
    6                  20
    7                  30
    Total              140

Please: State your assumptions. Show your work. Comment your code. Use your time wisely. The mark value of each question is roughly equivalent to how many minutes it should take to answer. If you think that assumptions must be made to answer a question, state them clearly. If there are multiple possibilities, comment that there are, explain why, and then provide at least one possible answer and state the corresponding assumptions.

Page 2 of 17

Part 1. [15] Short Answer

a) Give the immediate 16-bit value that is needed to encode the beq r8, r9, branch2 instruction in the following NIOS code:

    branch1: branch2: beq r8, r9, branch2 blt r8, r9, branch1 add r8, r8, r8 add r8, r8, r8 add r9, r9, r9 add r8, r8, r0

Answer: 16 (0x0010)

b) Consider the following simplified datapath (as discussed in class) with no forwarding lines:

Write STALL between instructions that contain a data hazard so that the code produces the correct result in the above datapath:

    add r8, r8, r9
    STALL
    sub r9, r8, r10
    add r10, r10, r10
    STALL
    ldw r11, 0(r10)

c) True or False: there are instructions found in CISC machines that are also found in RISC machines [true]
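The encoding asked for in part (a) can be checked with a short sketch. A Nios II conditional branch stores a signed 16-bit byte offset measured from the instruction after the branch (PC + 4). The addresses used here are assumptions for illustration only: the beq is placed at address 0x00 and branch2 at 0x14 (five instructions later), which is the placement that the stated answer of 16 implies. The helper name branch_imm16 is hypothetical.

```python
def branch_imm16(branch_addr: int, target_addr: int) -> int:
    """Return the 16-bit immediate field for a Nios II conditional branch.

    The processor computes target = PC + 4 + sign_extend(IMM16), so the
    immediate is the byte distance from the instruction after the branch.
    """
    offset = target_addr - (branch_addr + 4)
    if not -0x8000 <= offset <= 0x7FFF:
        raise ValueError("branch target out of range for a 16-bit offset")
    return offset & 0xFFFF  # two's-complement encoding of the signed offset


# Assumed layout: beq at 0x00, branch2 five instructions later at 0x14.
print(hex(branch_imm16(0x00, 0x14)))
# A backward branch (assumed addresses) encodes as a negative offset.
print(hex(branch_imm16(0x14, 0x00)))
```

The forward case gives 0x10, matching the answer above; a backward branch wraps to a value with the top bit set because the field is two's complement.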

Page 3 of 17

d) The following code sums the top two words on the stack, pops one of the words, and overwrites the other word with the sum; however, the code does not do these operations in that order. In general the code works, but in what case would this code lead to an incorrect result? (NOTE: This is a hard question!)

    addi sp, sp, 4     # move the stack pointer by 4 bytes
    ldw  r8, -4(sp)    # get first operand
    ldw  r9, 0(sp)     # get second operand
    add  r8, r9, r8    # calculate the sum
    stw  r8, 0(sp)     # store the sum

[Moving the stack pointer before loading from it means an interrupt between the addi and the first ldw could change the value on the stack at -4(sp).]

e) Fill in the table below, showing the result of each instruction. Assume that R8 = 0xBADACABA.

    Instruction        Result
    slli r9, r8, 4     R9 = 0xADACABA0
    srli r9, r8, 8     R9 = 0x00BADACA
    srai r9, r8, 8     R9 = 0xFFBADACA

f) Caches are organized into blocks to exploit what form of locality? Why does it make sense for an instruction-cache to have blocks (instead of single-instruction cache entries)?

Answer: Blocks exploit spatial locality; instructions are often executed sequentially.

g) In virtual memory implementations, why are pages normally so much larger (e.g., 4KB) than cache blocks (e.g., 64B)?
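The three results in part (e) can be reproduced with a minimal model of 32-bit shifts. This is a sketch, not Nios II code: the function names simply mirror the instruction mnemonics, and MASK32 is an illustrative constant used to keep Python's unbounded integers within 32 bits.

```python
MASK32 = 0xFFFFFFFF


def slli(value: int, amount: int) -> int:
    """Shift left logical: bits shifted past bit 31 are discarded."""
    return (value << amount) & MASK32


def srli(value: int, amount: int) -> int:
    """Shift right logical: zeros are shifted in from the left."""
    return (value & MASK32) >> amount


def srai(value: int, amount: int) -> int:
    """Shift right arithmetic: copies of the sign bit are shifted in."""
    value &= MASK32
    if value & 0x80000000:
        value -= 1 << 32  # reinterpret the 32-bit pattern as a negative number
    return (value >> amount) & MASK32


r8 = 0xBADACABA
for name, op, amount in (("slli", slli, 4), ("srli", srli, 8), ("srai", srai, 8)):
    print(f"{name} r9, r8, {amount}: R9 = 0x{op(r8, amount):08X}")
```

The only difference between srli and srai is what fills the vacated high bits; because bit 31 of 0xBADACABA is 1, srai fills with ones.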

Page 4 of 17

Answer (part g): Hard drives are much slower than main memory, so larger chunks must be transferred to be worth the overhead. Larger pages also reduce the page table size.

Part 2. [20] C to Assembly

The structure used to represent a binary tree is shown in part (a) of the figure below. Part (b) shows the sum_tree function, a C function that accepts a pointer, tree. The tree pointer points to an element in the tree. The function traverses the tree recursively to sum all the elements. Ints and pointers are 4B each. Implement sum_tree in NIOS II assembly. Please use callee-save registers for any temporary registers you may need (and save/restore them).

(a)
    struct node_t {
        int item;
        struct node_t *left;
        struct node_t *right;
    };

(b)
    void sum_tree (struct node_t *tree) {
        if (tree != NULL) {
            sum += tree->item;
            sum_tree(tree->left);
            sum_tree(tree->right);
        }
    }

COPY1: please cross out the copy you do not want graded

    .section .data
    .align 2
    sum: .word 0

    .section .text
    sum_tree:
        # prologue
        addi sp, sp, -20
        stw ra, 16(sp)
        stw r16, 12(sp)
        stw r17, 8(sp)
        stw r18, 4(sp)
        movia r16, sum
        # if (tree != NULL) {
        beq r4, r0, epi
        # sum += tree->item;
        ldw r17, 0(r4)
        ldw r18, 0(r16)
        add r18, r18, r17
        stw r18, 0(r16)
        # sum_tree(tree->left);
        stw r4, 0(sp)          # save tree across the call
        ldw r4, 4(r4)
        call sum_tree
        ldw r4, 0(sp)          # restore tree
        # sum_tree(tree->right);
        ldw r4, 8(r4)
        call sum_tree
    epi:
        # epilogue
        ldw r18, 4(sp)
        ldw r17, 8(sp)
        ldw r16, 12(sp)
        ldw ra, 16(sp)
        addi sp, sp, 20
        ret

Page 5 of 17

(The struct and function from the previous page are repeated here for your convenience.)

COPY2: please cross out the copy you do not want graded

    .section .data
    .align 2
    sum: .word 0

    .section .text
    sum_tree:
        # prologue
        # if (tree != NULL) {
        # sum += tree->item;
        # sum_tree(tree->left);
        # sum_tree(tree->right);
        # epilogue

Page 6 of 17

Part 3. [15] Stack / Procedure calls

a) Fill in the stack operations to save registers as required by convention in the following subroutine:

    RS232Out:
        # put stack operations here
        addi sp, sp, -8
        stw r19, 0(sp)
        stw r21, 4(sp)
        movia r13, 0x10001010   /* r13 now contains the base addr of UART */
        ldhio r19, 4(r13)       /* Load from the UART */
        beq r19, r0, Done       /* this branch is taken if no room for data */
        ldbio r8, 0(r4)         /* get value to output */
        stwio r21, 0(r13)
    Done:
        # put stack operations here
        ldw r21, 4(sp)
        ldw r19, 0(sp)
        addi sp, sp, 8
        ret

b) If this subroutine were to be called from code within an interrupt handler, what registers (in addition to those you saved above) would also have to be saved (if any)? Where in the code would you perform this additional saving of registers?

Answer: ra, r4, r8, and r13, saved in the preamble of the ISR.

Page 7 of 17

Part 4. [20] Interrupts

The program and data below have been loaded on a NIOS II system. Assume that as soon as interrupts are enabled, it is possible that some device might request an interrupt. A 5-element array is stored in memory at label array.

    .section .data
    index: .byte 4
    array: .byte 1, 2, 3, 4, 5

    .section .exceptions, "ax"
    h0: movia r8, index
    h1: stb r0, 0(r8)
    h2: addi ea, ea, -4
    h3: eret

    .section .text
    main:
        # interrupts are disabled here
        movia r16, index
        movia r17, array
        movia r18, sum
        # interrupts are enabled here
    f0: ldbu r19, 0(r16)
    f1: add r20, r17, r19
    f2: ldbu r21, 0(r20)
    f3: addi r21, r21, 1
    f4: addi r19, r19, -1
    f5: stb r19, 0(r16)
    f6: stb r21, 0(r20)
    f7: bge r19, r0, f0
    f8:

Give the final contents of array for each of the following scenarios:

a) No interrupt occurs: array = { 2, 3, 4, 5, 6 }

b) One interrupt occurs before f0:

c) Below, put check marks next to each valid final state for array (for any occurrence of interrupts, including (a) and (b) above):

    [X] array = {2, 3, 4, 5, 6}
    [ ] array = {1, 2, 3, 4, 5}
    [ ] array = {2, 3, 4, 5, 5}
    [X] array = {2, 2, 3, 4, 5}
    [ ] array = {1, 2, 3, 4, 6}
    [X] array = {2, 2, 3, 4, 6}
    [X] array = {2, 2, 3, 5, 6}
    [ ] array = {2, 2, 2, 4, 6}
    [X] array = {2, 2, 4, 5, 6}
    [ ] array = {1, 2, 3, 5, 6}
    [ ] array = {1, 2, 4, 5, 6}
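The set of reachable final arrays can be explored mechanically. The sketch below is a small state-space search in Python rather than a Nios II simulation, and it rests on several assumptions: an interrupt may arrive before any of f0 to f7, the handler's only visible effect is writing 0 to index and then re-running the interrupted instruction (as h2 arranges), registers start at zero, and the loop branch at f7 is modelled as taken while the decremented index is still non-negative, which is the behaviour the listed answers correspond to. The names step and final_arrays are hypothetical.

```python
from collections import deque

START_ARRAY = (1, 2, 3, 4, 5)
DONE = 8  # label f8: the loop has exited


def step(pc, r19, r20, r21, index, array):
    """Execute the instruction at label f<pc> and return the next state."""
    mem = list(array)
    if pc == 0:        # f0: load index into r19
        r19 = index
    elif pc == 1:      # f1: r20 = &array[r19] (kept here as an offset)
        r20 = r19
    elif pc == 2:      # f2: load the array element
        r21 = mem[r20]
    elif pc == 3:      # f3: increment the element value
        r21 += 1
    elif pc == 4:      # f4: decrement the index register
        r19 -= 1
    elif pc == 5:      # f5: store the low byte of r19 to index
        index = r19 & 0xFF
    elif pc == 6:      # f6: store the low byte of r21 to the element
        mem[r20] = r21 & 0xFF
    elif pc == 7:      # f7: assumed to loop while r19 >= 0
        return (0 if r19 >= 0 else DONE, r19, r20, r21, index, array)
    return (pc + 1, r19, r20, r21, index, tuple(mem))


def final_arrays(allow_interrupts):
    """Breadth-first search over every interleaving of loop and handler."""
    start = (0, 0, 0, 0, 4, START_ARRAY)
    seen, queue, finals = {start}, deque([start]), set()
    while queue:
        state = queue.popleft()
        pc, r19, r20, r21, index, array = state
        if pc == DONE:
            finals.add(array)
            continue
        successors = [step(*state)]
        if allow_interrupts:
            # Handler: index = 0, then the interrupted instruction runs.
            successors.append((pc, r19, r20, r21, 0, array))
        for nxt in successors:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return finals


print(sorted(final_arrays(False)))
print(sorted(final_arrays(True)))
```

Under these assumptions the search reaches exactly the five checked states. The key observation is that a handler write to index is lost if it lands between f0 and f5 (f5 overwrites it), and otherwise makes the next iteration jump straight to element 0.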

Page 8 of 17

Answer (Part 4 b): array = { 2, 2, 3, 4, 5 }

Part 5. [20] CPU Design

You have the following computer structure as described in class:

The datapath is controlled by the following control signals:

    PCout         PC to bus
    PCwrite
    MDRBuswrite   MDR updated from bus
    MEMoutBus     Memory data value to bus
    MARwrite
    Ywrite
    Zwrite
    ZoutBus       Z value to bus
    IRwrite
    RFsel         Select which register is output to bus or written, as determined by RFout and RFwrite
    RFout         Allow output from selected register to bus
    RFwrite       Write to selected register from bus
    ra_outbus     ra register value to bus
    ra_write      Write to ra register from bus
    ALUop         Add, subtract, shift, etc.
    Select        Chooses one of the ALU inputs
    MEMRead       The data from the memory address specified is output to Dout
    MEMWrite      The data on Din is written to the memory address specified

Page 9 of 17

a) Show a cycle-by-cycle timing diagram of how the memory works for a single read. Assume that control signals change only on the falling edge of the clock. Also assume that MDR data is available on the rising edge of the clock directly after the cycle in which the MEMRead is initiated. NOTE: this is a very easy question; we essentially describe the answer in the question. Its purpose is to help you get the answer to (b) correct.

[Timing diagram; signal rows: CPU Clock, MARwrite, MEMRead, MDR data available for bus]

b) Fill in the table below to specify the proper control signals to implement a JSR (jump to subroutine) instruction that performs: ra <- pc+2, pc <- mem[pc+1]. Include all cycles, including those to fetch the instruction. State all assumptions. For ALUop and Select, just specify the values. Ex: Select Y, ALUop=Subtract. Assume that memory read information is available the cycle directly after the cycle in which the MEMRead is initiated. The first cycle is started for you (but not necessarily completed). Put an X only where signals are active during the cycle. Some columns and rows may be unused. For full marks your implementation should be as fast as possible.

Page 10 of 17

COPY1: please cross out the copy you do not want graded

    Cycle columns: 1 2 3 4 5 6 7 8 9

    Signal        Entries
    PCout         X xxxxxx xxxxxx xxxxxxx xxxxxx xxxxxx xxxxxx xxxxxx xxxxxx
    PCwrite       X X
    MDRBuswrite
    MEMoutBus     X X
    MARwrite      X X
    Ywrite
    Zwrite        X X
    ZoutBus       X X
    IRwrite       X
    RFsel
    RFout
    RFwrite
    ra_outbus
    ra_write      X
    ALUop         ADD ADD
    Select        1 1
    MEMRead       X X
    MEMWrite

Page 11 of 17

COPY2: please cross out the copy you do not want graded

    Cycle columns: 1 2 3 4 5 6 7 8 9

    Signal        Entries
    PCout         X
    PCwrite
    MDRBuswrite
    MEMoutBus
    MARwrite
    Ywrite
    Zwrite
    ZoutBus
    IRwrite
    RFsel
    RFout
    RFwrite
    ra_outbus
    ra_write
    ALUop
    Select        1
    MEMRead
    MEMWrite

Page 12 of 17

Part 6. [20] Memory Design

Consider the ROM chip above that has the following pins of interest:

    A5..0:   address pins
    D15..0:  data pins, which include built-in tri-state buffers enabled by BE1 and BE0 appropriately
    BE1:     byte enable for the upper byte of output (D15..8)
    BE0:     byte enable for the lower byte of output (D7..0)

You are to build a ROM device out of a number of these ROM chips by completing one copy of the diagrams on the next two pages. Your ROM device and solution should have the following features:

    - 256 bytes of ROM storage
    - The first byte (e.g., byte0) of the ROM should be mapped to address 0xFA000
    - All 256 bytes of the ROM should be mapped to consecutive byte addresses
        o E.g., byte1 is mapped to 0xFA001, byte2 is mapped to 0xFA002, etc.
    - Your ROM device should respond properly to byte, half-word, and word loads
    - You may use only decoders, AND gates, OR gates, NOT gates, and tri-state buffers
    - You do not have to handle illegal combinations of the BE3..0 signals
    - You can ignore the ACK line of the bus (it is not shown)

COPY1: please cross out the copy you do not want graded.

Page 13 of 17 COPY2: please cross out the copy you do not want graded.
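Before drawing the circuit, it can help to model the address decode in software. The sketch below illustrates one possible mapping and is not the exam's solution. It assumes a 32-bit data bus whose byte lanes are selected by BE3..0, a word-aligned bus address (the low two address bits are ignored), and two chips side by side, since each chip holds 2^6 x 16 bits = 128 bytes and 256 bytes are required. The names decode, chip_lo, and chip_hi are hypothetical.

```python
ROM_BASE = 0xFA000  # address of byte0 (from the question)
ROM_SIZE = 256      # bytes of storage (from the question)


def decode(addr: int, be: int):
    """Map a bus address and BE3..0 mask onto two assumed ROM chips.

    Returns None when the address is outside the ROM. Otherwise returns
    (chip_addr, enables), where chip_addr drives A5..0 of both chips and
    enables maps each chip to its (BE1, BE0) pin values.
    """
    if not ROM_BASE <= addr < ROM_BASE + ROM_SIZE:
        return None  # upper address bits do not select this device
    chip_addr = (addr >> 2) & 0x3F  # bus A7..2 -> chip A5..0
    enables = {
        # chip_lo is assumed to drive D15..0 (byte lanes 1 and 0)
        "chip_lo": ((be >> 1) & 1, be & 1),
        # chip_hi is assumed to drive D31..16 (byte lanes 3 and 2)
        "chip_hi": ((be >> 3) & 1, (be >> 2) & 1),
    }
    return chip_addr, enables


print(decode(0xFA000, 0b1111))  # word load of the first word
print(decode(0xFA0FC, 0b0100))  # byte load of byte lane 2 in the last word
print(decode(0xFA100, 0b1111))  # first address past the ROM
```

With this arrangement a word load enables both bytes of both chips in the same cycle, a half-word load enables one chip, and a byte load enables a single BE pin. In gates, the range check corresponds to comparing the address bits above A7 against 0xFA0.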

Page 14 of 17 Part 7. [30] Caches

Page 15 of 17

a) Fill in the blanks in the table below; each row describes a certain cache design by giving the total cache capacity, the cache block size (data only), associativity, and number of tag, set-index, and offset bits. For all rows assume a 32-bit address space. Show your work on this page below for possible part marks.

             Total Capacity   Block Size   Associativity            Tag bits   Set-index bits   Offset bits
    Cache1:  1MB              128B         2-way                    13         12               7
    Cache2:  2KB              32B          1-way (direct-mapped)    21         6                5
    Cache3:  8KB              64B          4-way                    21         5                6

b) Assuming a 512B, 2-way set-associative cache with 16B cache blocks and LRU replacement, and the address trace from a set of accesses below:

Page 16 of 17

What is the total number of hits: 8
What is the hit rate: 50%
Show your work below for possible part marks.

ADDRESS TRACE: 0xFA23 0xFA29 0xFADE 0xEBD0 0xFB2E 0xFAD3 0xFA2A 0xFAD9 0xFBDE 0xEB23 0xFA21 0xFA20 0xEBDF 0xEB29 0xFBDC 0xFAD1

c) For a cache, is there a sequence of addresses that will perform better if the cache is direct-mapped rather than 2-way set-associative? If so, describe the sequence. Assume that the capacity of the cache and the size of the blocks are the same in both cases.

Page 17 of 17

(CEDOMIR'S ANSWER) Cache has four 1-byte blocks (Blocks 0-3, Sets 0-1) and LRU policy. The sequence of addresses is 0, 2, 6, 0:

    Direct-mapped:
        0   MISS, BRING 0 to Block 0
        2   MISS, BRING 2 to Block 2
        6   MISS, BRING 6 to Block 2
        0   HIT

    2-way:
        0   MISS, BRING 0 to Set 0
        2   MISS, BRING 2 to Set 0
        6   MISS, 6 REPLACES 0 in Set 0
        0   MISS
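The numbers in Part 7 can be cross-checked with a small set-associative cache model. This is a sketch under stated assumptions: capacities, block sizes, and associativities are powers of two, replacement is true LRU, and the trace for part (b) is taken in the order it is printed. The function names cache_fields and count_hits are hypothetical.

```python
from collections import OrderedDict


def cache_fields(capacity, block_size, ways, addr_bits=32):
    """Return (tag_bits, index_bits, offset_bits) for a cache geometry."""
    sets = capacity // (block_size * ways)
    offset_bits = block_size.bit_length() - 1  # log2 of a power of two
    index_bits = sets.bit_length() - 1
    return addr_bits - index_bits - offset_bits, index_bits, offset_bits


def count_hits(trace, capacity, block_size, ways):
    """Count hits for an address trace using LRU replacement in each set."""
    _, index_bits, offset_bits = cache_fields(capacity, block_size, ways)
    sets = [OrderedDict() for _ in range(1 << index_bits)]
    hits = 0
    for addr in trace:
        block = addr >> offset_bits
        index = block & ((1 << index_bits) - 1)
        tag = block >> index_bits
        lines = sets[index]  # tags in this set, least recently used first
        if tag in lines:
            hits += 1
            lines.move_to_end(tag)         # mark as most recently used
        else:
            if len(lines) == ways:
                lines.popitem(last=False)  # evict the least recently used
            lines[tag] = True
    return hits


# Part (a): tag / set-index / offset bits for the three designs.
print(cache_fields(1 << 20, 128, 2))
print(cache_fields(2 << 10, 32, 1))
print(cache_fields(8 << 10, 64, 4))

# Part (b): 512B, 2-way, 16B blocks.
TRACE = [0xFA23, 0xFA29, 0xFADE, 0xEBD0, 0xFB2E, 0xFAD3, 0xFA2A, 0xFAD9,
         0xFBDE, 0xEB23, 0xFA21, 0xFA20, 0xEBDF, 0xEB29, 0xFBDC, 0xFAD1]
print(count_hits(TRACE, 512, 16, 2), "hits out of", len(TRACE))

# Part (c): four 1-byte blocks, direct-mapped versus 2-way.
print(count_hits([0, 2, 6, 0], 4, 1, 1), count_hits([0, 2, 6, 0], 4, 1, 2))
```

For part (c) the model shows why the direct-mapped cache wins on this sequence: addresses 0 and 2 fall into different blocks when there are four sets, but all three addresses compete for the two ways of set 0 when there are only two sets.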
