Full Name:

CSCI 540, Fall 2014
Practice Final Exam

Instructions:
Make sure that your exam is not missing any sheets, then write your full name on the front. Put your name or student ID on each page. Write your answers in the space provided below the problem. If you make a mess, clearly indicate your final answer. This exam is OPEN BOOK and you can use a single page of notes. You cannot use a computer. Good luck!

Problem   Page   Possible   Score
1         1      20
2         2      20
3         4      20
4         5      20
Total            80
CSCI-540 - Fall 2014 - 1 - December 7, 2014

1. [ 20 Points ] The following problem concerns basic cache lookups.

   The memory is byte addressable.
   Memory accesses are to 1-byte words (not 4-byte words).
   Physical addresses are 12 bits wide.
   The cache is 4-way set associative, with a 4-byte block size and 64 total bytes.

In the following tables, all numbers are given in hexadecimal. The left-most value is byte #0 and the right-most value is byte #3. The contents of the cache are as follows:

4-way Set Associative Cache
Set   (V?, Tag, Data)       (V?, Tag, Data)       (V?, Tag, Data)       (V?, Tag, Data)
0     (1, 00E, 6ECF3F9D)    (1, 003, 4584EBAF)    (1, 023, 436831E8)    (1, 0F6, 74DC71BD)
1     (1, 0E9, 5FC155A5)    (1, 00C, 16DE30ED)    (1, 00B, 22876351)    (1, 003, 12ECA0AA)
2     (1, 004, 27189C37)    (1, 0FA, 0F01197D)    (1, 006, 189A3395)    (1, 005, 55274324)
3     (1, 00C, 61DC7BB3)    (1, 003, 74758DBD)    (1, 006, 1111C0D6)    (1, 00D, 4FCCBF9D)

[ 5 Points ] Label the parts of the address used as the block offset (BO) within the line, the cache set index (CI) and the cache tag (CT).

Our computer accesses specific memory locations given the cache state above. For each given physical address, indicate the Byte Offset (BO), the Cache Set Index (CI) and the Cache Tag (CT). Then, give the value (V) that would be loaded if the load is a hit; if the address is a cache miss, write "miss". Use hexadecimal values throughout.

(a) [ 5 Points ] Physical address: 233

(b) [ 5 Points ] Physical address: 78D

(c) [ 5 Points ] Physical address: FA8
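The field split asked for in Problem 1 can be sketched in C. This is a minimal sketch, not part of the exam: it assumes the widths implied by the stated parameters (64 bytes / 4-byte blocks / 4 ways gives 4 sets, so 2 offset bits, 2 index bits and an 8-bit tag), and the names split_address, print_fields and struct cache_fields are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed field widths: 12-bit address, 4-byte blocks, 4 sets. */
enum { BO_BITS = 2, CI_BITS = 2, ADDR_BITS = 12 };

struct cache_fields {
    unsigned bo;  /* block offset within the line */
    unsigned ci;  /* cache set index */
    unsigned ct;  /* cache tag */
};

/* Split a 12-bit physical address into its tag, set index and offset. */
struct cache_fields split_address(unsigned addr)
{
    assert(addr < (1u << ADDR_BITS));
    struct cache_fields f;
    f.bo = addr & ((1u << BO_BITS) - 1);
    f.ci = (addr >> BO_BITS) & ((1u << CI_BITS) - 1);
    f.ct = addr >> (BO_BITS + CI_BITS);
    return f;
}

/* Print the fields in hexadecimal, the base the problem asks for. */
void print_fields(unsigned addr)
{
    struct cache_fields f = split_address(addr);
    printf("addr=%03X  CT=%02X  CI=%X  BO=%X\n", addr, f.ct, f.ci, f.bo);
}
```

For example, the made-up address 3A7 (not one of the exam addresses) splits into CT = 3A, CI = 1, BO = 3. A lookup would then compare CT against the four tags stored in set CI.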
2. [ 20 Points ] You've been hired to optimize a Gaussian blur filter on the world's tiniest images, each of which is only 8 × 8. You start with the code:

    char m[3][8][8] = ...;
    for (int j = 1; j < cols-1; j++) {
      for (int i = 1; i < rows-1; i++) {
        for (int p = 0; p < 3; p++) {
          char up = m[p][i-1][j];
          char down = m[p][i+1][j];
          char left = m[p][i][j-1];
          char right = m[p][i][j+1];
          ... = (m[p][i][j] + left + right + up + down)/5;
        }
      }
    }

You should assume:

   A char takes 1 byte.
   Ignore the stores (memory writes); only consider loads.
   The array m starts at address 0.
   Memory addresses are 12 bits long.
   All scalars are held in registers.
   Your cache is 2-way set associative with 4-byte lines, a total size of 32 bytes, and a least recently used replacement policy.

[ 12 Points ] Below, list the address for the first 12 READ or LOAD references (ignore the Store/Write) and indicate if it is a hit or miss in the cache. Use decimal numbers throughout. It's easiest to first write down the array entry (e.g. m[1][2][3]), translate that to an address, and then figure out the hit or miss.

Ref #   Address   Array Entry   Hit?
0
1
2
3
4
5
6
7
8
9
10
11
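The "array entry to address" step can be sketched as follows. This is an illustrative helper under the stated assumptions (row-major char m[3][8][8] starting at address 0, 4-byte lines, and 32 bytes / 4-byte lines / 2 ways = 4 sets); the function names are hypothetical.

```c
#include <assert.h>

enum { PLANES = 3, ROWS = 8, COLS = 8 };
enum { LINE_BYTES = 4, NUM_SETS = 4 };  /* 32 bytes / 4-byte lines / 2 ways */

/* Byte address of m[p][i][j]; each element is one byte and m starts at 0. */
unsigned element_address(unsigned p, unsigned i, unsigned j)
{
    assert(p < PLANES && i < ROWS && j < COLS);
    return (p * ROWS + i) * COLS + j;
}

/* Starting address of the cache block that holds addr. */
unsigned block_start(unsigned addr)
{
    return addr - addr % LINE_BYTES;
}

/* Cache set that the block holding addr maps to. */
unsigned set_index(unsigned addr)
{
    return (addr / LINE_BYTES) % NUM_SETS;
}
```

With these definitions the sample entry m[1][2][3] sits at address 1*64 + 2*8 + 3 = 83, in the block starting at 80, which maps to set 0.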
[ 4 Points ] Below, draw a diagram to show the state of the cache at the end of the references above. You should clearly distinguish each set. For each cache line, indicate whether the entry is valid and, if it is, the starting memory address of that line/block. If the entry is not valid, just leave the data blank and/or set the tag to zero (we're ignoring the valid bit in this example). Rather than showing the tag bits, which are harder to compute, indicate the starting memory address of the cache block, which should be evenly divisible by the block size. All numbers must be decimal.

[ 4 Points ] Assume the loops were reordered as follows:

    char m[3][8][8] = ...;
    for (int p = 0; p < 3; p++) {
      for (int i = 1; i < rows-1; i++) {
        for (int j = 1; j < cols-1; j++) {
          ...

How many cache misses would occur when that code is executed, assuming that it is the only code executed and the cache is initially empty?
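One way to check a hand-worked hit/miss trace for Problem 2 is a small simulator. The following is a rough sketch rather than a reference solution: it models only what the problem states (2 ways, 4 sets, 4-byte lines, LRU replacement, loads only), and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum { LINE_BYTES = 4, NUM_SETS = 4, WAYS = 2 };

struct line {
    bool valid;
    unsigned block;      /* block number: addr / LINE_BYTES */
    unsigned last_used;  /* timestamp of the most recent access */
};

struct cache {
    struct line sets[NUM_SETS][WAYS];
    unsigned clock;   /* incremented on every load */
    unsigned misses;  /* running miss count */
};

/* Reset the cache to the empty state. */
void cache_init(struct cache *c)
{
    for (int s = 0; s < NUM_SETS; s++)
        for (int w = 0; w < WAYS; w++)
            c->sets[s][w] = (struct line){ false, 0, 0 };
    c->clock = 0;
    c->misses = 0;
}

/* Simulate one load; returns true on a hit, false on a miss. */
bool cache_load(struct cache *c, unsigned addr)
{
    assert(addr < (1u << 12));  /* addresses are 12 bits long */
    unsigned block = addr / LINE_BYTES;
    struct line *set = c->sets[block % NUM_SETS];
    c->clock++;

    for (int w = 0; w < WAYS; w++) {
        if (set[w].valid && set[w].block == block) {
            set[w].last_used = c->clock;
            return true;
        }
    }

    /* Miss: fill an invalid way if one exists, else evict the LRU way. */
    int victim = 0;
    for (int w = 0; w < WAYS; w++) {
        if (!set[w].valid) { victim = w; break; }
        if (set[w].last_used < set[victim].last_used) victim = w;
    }
    set[victim] = (struct line){ true, block, c->clock };
    c->misses++;
    return false;
}
```

Calling cache_init once and then cache_load for each load address in program order reproduces the hit/miss column; the block start addresses for the cache-state diagram are block * LINE_BYTES for each valid line.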
3. [ 20 Points ] The following problem concerns optimizing a procedure for maximum performance on an Intel Pentium III with the following characteristics of the functional units:

Operation                     Latency   Issue Time/Rate
Integer Add                   1         1
Integer Multiply              4         1
Integer Divide                36        36
Floating Point Add            3         1
Floating Point Multiply       5         2
Floating Point Divide         38        38
Load or Store (Cache Hit)     1         1

You've just joined a programming team that is trying to develop the world's fastest factorial routine. Starting with recursive factorial, they've converted the code to use iteration:

    int fact(int n)
    {
      int i;
      int result = 1;
      for (i = n; i > 0; i--)
        result = result * i;
      return result;
    }

By doing so, they have reduced the number of cycles per element (CPE) for the function from around 63 to around 4 (really!). Still, they would like to do better.

One of the programmers heard about loop unrolling. He generated the following code:

    int fact_u2(int n)
    {
      int i;
      int result = 1;
      for (i = n; i > 0; i-=2) {
        result = (result * i) * (i-1);
      }
      return result;
    }

Unfortunately, the team has discovered that this code returns 0 for some value(s) of argument n.
(a) [ 5 Points ] For what values of n will fact_u2 and fact return different values?

(b) [ 5 Points ] Show the simple fix for fact_u2 that makes its behavior identical to fact.

(c) [ 5 Points ] Benchmarking fact_u2 shows no improvement in performance. How would you explain that? You might want to sketch out the assembly for that loop.

(d) [ 5 Points ] You modify the line inside the loop to read:

    result = result * (i * (i-1));

To everyone's astonishment, the measured performance now has a CPE of 2.5. How do you explain this performance improvement? You might want to characterize how the assembly language for this version would differ from the former.
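To explore part (a) empirically, the two routines can be compared over a range of small inputs. The sketch below reproduces fact and fact_u2 as given (bug included); first_mismatch is a hypothetical helper, and the limit of 12 is an assumption that int is 32 bits, since 13! would overflow.

```c
#include <assert.h>

/* Iterative factorial, as given in the problem. */
int fact(int n)
{
    int result = 1;
    for (int i = n; i > 0; i--)
        result = result * i;
    return result;
}

/* Unrolled version, reproduced with its bug left in place. */
int fact_u2(int n)
{
    int result = 1;
    for (int i = n; i > 0; i -= 2)
        result = (result * i) * (i - 1);
    return result;
}

/* Hypothetical helper: smallest n in [0, limit] where the two
 * routines disagree, or -1 if they agree on the whole range. */
int first_mismatch(int limit)
{
    assert(limit <= 12);  /* 13! overflows a 32-bit int */
    for (int n = 0; n <= limit; n++)
        if (fact(n) != fact_u2(n))
            return n;
    return -1;
}
```

Printing fact(n) next to fact_u2(n) for n from 0 to 12 shows the pattern of disagreement directly, which is a useful check on the reasoning in parts (a) and (b).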
4. [ 20 Points ] The following problem concerns the way virtual addresses are translated into physical addresses.

   The memory is byte addressable.
   Memory accesses are to 1-byte words (not 4-byte words).
   Virtual addresses are 10 bits wide.
   Physical addresses are 14 bits wide.
   The page size is 64 bytes.
   The TLB is 2-way set associative with 8 total entries.
   The L1 Cache is direct mapped, with a 4-byte block size and 64 total bytes.

In the following tables, all numbers are given in hexadecimal, and the left-most value is byte #0 and the right-most value is byte #3, where applicable. The contents of the TLB, a portion of the page tables, and the 16 entries of the Cache are as follows:

TLB
Index   Tag   PPN   Valid
0       1     0d    1
        -     -     0
1       1     18    1
        3     0c    1
2       2     16    1
        -     -     0
3       2     3e    1
        -     -     0

Page Table
VPN   PPN   Present
000   019   1
001   001   1
002   03f   1
003   020   1
004   00d   1
005   018   1
007   001   1
008   015   1
009   000   1
00a   016   1
00b   03e   1
00c   06f   1
00d   00c   1
00f   039   1

Cache
Index   Valid   Tag   Data
0       1       3b    BF3A02F3
1       1       4a    E8E7BA4F
2       1       12    23033CCA
3       1       16    7AFB27EE
4       1       2f    8F9F64E8
5       1       3e    EA13BEFD
6       1       2b    FEA8AAA6
7       1       4c    BD501308
8       1       0d    4D011D8E
9       0       1b    7EFEB6ED
10      1       2b    860DFCB3
11      1       15    9D769441
12      1       3a    62DA7A7D
13      1       5b    C8D747DD
14      1       6f    CA8DC445
15      1       7a    90FAAF41
(a) [ 3 Points ] The box below shows the format of a virtual address. Indicate (by labeling the diagram) the fields (if they exist) that would be used to determine the following. (If a field doesn't exist, don't draw it on the diagram.)

   VPO    The virtual page offset
   TLBI   The TLB index
   VPN    The virtual page number
   TLBT   The TLB tag

(b) [ 2 Points ] The box below shows the format of a physical address. Indicate (by labeling the diagram) the fields that would be used to determine the following:

   PPO    The physical page offset
   PPN    The physical page number

(c) [ 25 Points ] (5 points each) For the given virtual addresses, indicate the TLB entry accessed and the physical address. Indicate whether the TLB misses and whether the entry is or is not in the page table. If the physical page number and address cannot be determined, write N/A. Then, if a physical address exists, indicate the cache translation parts, whether it is a cache hit, and a value if applicable. If any part can't be determined, just write N/A.

i. Virtual address: 2d5

   A. Virtual address format (one bit per box)

   B. Address translation
      VPN:
      TLB Index:
      TLB Tag:
      TLB Hit? (Y/N):
      In Page Table? (Y/N):

   C. Physical address format (one bit per box)
   D. Cache Translation
      Block Offset:
      Cache Index:
      Cache Tag:
      Cache Hit? (Y/N):

ii. Virtual address: 1b1

   A. Virtual address format (one bit per box)

   B. Address translation
      VPN:
      TLB Index:
      TLB Tag:
      TLB Hit? (Y/N):
      In Page Table? (Y/N):

   C. Physical address format (one bit per box)

   D. Cache Translation
      Block Offset:
      Cache Index:
      Cache Tag:
      Cache Hit? (Y/N):
iii. Virtual address: 33b

   A. Virtual address format (one bit per box)

   B. Address translation
      VPN:
      TLB Index:
      TLB Tag:
      TLB Hit? (Y/N):
      In Page Table? (Y/N):

   C. Physical address format (one bit per box)

   D. Cache Translation
      Block Offset:
      Cache Index:
      Cache Tag:
      Cache Hit? (Y/N):

iv. Virtual address: 112

   A. Virtual address format (one bit per box)

   B. Address translation
      VPN:
      TLB Index:
      TLB Tag:
      TLB Hit? (Y/N):
      In Page Table? (Y/N):

   C. Physical address format (one bit per box)
   D. Cache Translation
      Block Offset:
      Cache Index:
      Cache Tag:
      Cache Hit? (Y/N):

v. Virtual address: 22f

   A. Virtual address format (one bit per box)

   B. Address translation
      VPN:
      TLB Index:
      TLB Tag:
      TLB Hit? (Y/N):
      In Page Table? (Y/N):

   C. Physical address format (one bit per box)

   D. Cache Translation
      Block Offset:
      Cache Index:
      Cache Tag:
      Cache Hit? (Y/N):
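The translation steps repeated in parts i through v of Problem 4 can be sketched in C. This is a sketch under assumptions derived from the stated parameters, not an official solution: 64-byte pages give a 6-bit page offset, 8 TLB entries in 2 ways give 4 sets (2 index bits), and a direct-mapped 64-byte cache with 4-byte blocks gives 2 offset bits and 4 index bits. All names are hypothetical, and the PPN is assumed to have been looked up by hand in the TLB or page table.

```c
#include <assert.h>

enum { VPO_BITS = 6, TLBI_BITS = 2, BO_BITS = 2, CI_BITS = 4 };

struct va_fields { unsigned vpn, vpo, tlbi, tlbt; };
struct pa_fields { unsigned ppn, ppo, bo, ci, ct; };

/* Split a 10-bit virtual address into VPN/VPO and TLB tag/index. */
struct va_fields split_virtual(unsigned va)
{
    assert(va < (1u << 10));
    struct va_fields f;
    f.vpo  = va & ((1u << VPO_BITS) - 1);
    f.vpn  = va >> VPO_BITS;
    f.tlbi = f.vpn & ((1u << TLBI_BITS) - 1);
    f.tlbt = f.vpn >> TLBI_BITS;
    return f;
}

/* Combine a looked-up PPN with the page offset (PPO equals VPO). */
unsigned make_physical(unsigned ppn, unsigned vpo)
{
    assert(ppn < (1u << 8) && vpo < (1u << VPO_BITS));
    return (ppn << VPO_BITS) | vpo;
}

/* Split a 14-bit physical address into page and cache fields. */
struct pa_fields split_physical(unsigned pa)
{
    assert(pa < (1u << 14));
    struct pa_fields f;
    f.ppo = pa & ((1u << VPO_BITS) - 1);
    f.ppn = pa >> VPO_BITS;
    f.bo  = pa & ((1u << BO_BITS) - 1);
    f.ci  = (pa >> BO_BITS) & ((1u << CI_BITS) - 1);
    f.ct  = pa >> (BO_BITS + CI_BITS);
    return f;
}
```

As a usage example with a made-up address that is not on the exam: virtual address 156 gives VPN = 5, VPO = 16, TLBI = 1, TLBT = 1. Supplying PPN 18 yields physical address 616, which splits into BO = 2, CI = 5, CT = 18. The final step, comparing CT with the tag stored at cache index CI, is left to the tables.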