ECE 485/585 Midterm Exam
Time allowed: 100 minutes
Total Points: 65
Points Scored:
Name:

Problem No. 1 (12 points)
For each of the following statements, indicate whether the statement is TRUE or FALSE:
(a) The following is an example of an instruction used in memory-mapped I/O: IN AX, 4
    FALSE
(b) In an architecture that restricts memory operand alignment, a double-word write starting at the following hexadecimal address will result in an unaligned access: 0x4273fb6a
    TRUE
(c) A breakpoint inserted in the code by a debugger will result in a synchronous interrupt
    TRUE
(d) If the memory access pattern exhibits high spatial locality, it is better to use high order address interleaving
    FALSE
(e) In a burst EDO DRAM, two different column addresses must be specified for accesses to two consecutive columns
    FALSE
(f) Increasing the associativity of a cache has no impact on compulsory misses
    TRUE

Problem No. 2 (9 points)
For each of the following questions, encircle ALL the correct answers:
(a) When assigning interrupt priorities among multiple interrupt requests, the following factors need to be considered:
    i. Relative importance of the I/O device that generated the request
    ii. Length of the Interrupt Service Routine
    iii. Ability of I/O device to buffer data
    iv. All of the above
(b) A DDR3-1333 DRAM has the following timing parameters: tRCD = 6 cycles, tRP = 6 cycles, tRAS = 14 cycles, tRRD = 3 cycles. What is the minimum time between activating two rows in two different banks?
    i. 9 ns
    ii. 30 ns
    iii. 4.5 ns
    iv. None of the above
(c) When comparing SRAM with DRAM, which of the following statements are correct?
    i. SRAM has lower density compared to DRAM
    ii. SRAM is easier to integrate with logic circuits as compared to DRAM
    iii. SRAM requires multiplexed address lines whereas DRAM does not
    iv. All of the above
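The alignment claim in Problem No. 1 (b) can be checked with a short sketch. It assumes a double word is 4 bytes (the x86 convention); the helper name is_aligned is hypothetical and not part of the exam.

```python
def is_aligned(address: int, size_bytes: int) -> bool:
    """Return True if address is a multiple of size_bytes (a power of two)."""
    return address & (size_bytes - 1) == 0


ADDRESS = 0x4273FB6A
DOUBLE_WORD_BYTES = 4  # assumption: double word = 4 bytes

# The low byte 0x6A is 0110 1010 in binary, so the two low-order bits are 10.
print(is_aligned(ADDRESS, DOUBLE_WORD_BYTES))  # False: the access is unaligned
```

The address is even, so a 2-byte access at the same location would be aligned; only the 4-byte access violates the restriction.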
Problem No. 3 (10 points)
(a) (6 points) In this problem, your objective is to design a finite state machine that recognizes the particular pattern 10110. The input to the finite state machine is a sequence of binary bits in series. When the FSM sees the pattern 10110 in its most recent input bits, it should output 1, otherwise it should output 0. Draw the state transition diagram for this FSM.
(b) (2 points) State ONE advantage of using DMA as compared to using programmed I/O.
    DMA frees the CPU from having to coordinate every single transfer of bytes between an I/O device and memory. This allows the CPU to carry out other tasks while a data transfer is in progress.
(c) (2 points) What is the usage of Minimum/Maximum mode in an 8086 CPU?
    To support a co-processor
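For Problem No. 3 (a), one possible state machine can be written as a transition table. This is a sketch, not the official answer: it assumes a Mealy-style machine (the output is produced on the transition that consumes the final 0) and allows overlapping matches. State k means that the last k input bits match the first k bits of 10110; the names TRANSITIONS and detect are hypothetical.

```python
# TRANSITIONS[state][input_bit] -> (next_state, output)
TRANSITIONS = {
    0: {0: (0, 0), 1: (1, 0)},  # nothing matched yet
    1: {0: (2, 0), 1: (1, 0)},  # matched "1"
    2: {0: (0, 0), 1: (3, 0)},  # matched "10"
    3: {0: (2, 0), 1: (4, 0)},  # matched "101"; "1010" still ends in "10"
    4: {0: (2, 1), 1: (1, 0)},  # matched "1011"; a 0 completes "10110"
}


def detect(bits):
    """Return the output bit produced after each input bit."""
    state = 0
    outputs = []
    for bit in bits:
        state, out = TRANSITIONS[state][bit]
        outputs.append(out)
    return outputs


print(detect([1, 0, 1, 1, 0, 1, 1, 0]))  # [0, 0, 0, 0, 1, 0, 0, 1]
```

After a match the machine returns to state 2 rather than state 0, because the trailing "10" of one match can begin the next one.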
Problem No. 4 (14 points)
(a) (8 points) The following table shows the cache configuration for three different caches (C1, C2 and C3) in terms of cache size, line size and associativity. For each of the caches, fill in the missing entries in the table: (i) Number of sets, (ii) Number of address bits needed for the Index field, (iii) Number of address bits needed for the Tag field. Assume that the processor is using 32-bit addresses:

CACHE  CACHE SIZE  LINE SIZE  ASSOCIATIVITY          NUMBER OF SETS  INDEX BITS  TAG BITS
C1     64 KB       64 B       Direct mapped (1-way)  1024            10          16
C2     256 KB      64 B       8-way Set Associative  512             9           17
C3     16 KB       32 B       Fully Associative      1               0           27

(b) (3 points) What is the minimum burst length supported in DDR2? Why do DDRx memories not support a burst length of 1?
    DDR2 supports a minimum burst length of 4. DDRx memories carry out two data transfers per clock cycle. They accomplish that by doing a 2n or greater prefetch (where n is the width of the data bus). Since at least 2n bits of data have already been prefetched, using a burst length of 1 would simply waste data bus bandwidth.
(c) (3 points) Describe a scenario in which a cache write request sent by the processor results in a memory write followed by a memory read.
    Consider a cache which uses write-allocate and write-back policies. Consider a write to address A which results in a cache miss. The cache decides to evict block B to make room for A. Assume that B had its dirty bit = 1. Therefore, evicting B will result in a memory write. After B has been evicted, A will be fetched from memory. This will result in a memory read.
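The table entries in Problem No. 4 (a) follow from three relations: number of sets = cache size / (line size * associativity), index bits = log2(number of sets), and tag bits = 32 - index bits - offset bits, where offset bits = log2(line size). A minimal sketch of that arithmetic follows; the function name cache_fields and its parameters are hypothetical, and all sizes are assumed to be powers of two.

```python
def cache_fields(cache_bytes, line_bytes, ways=None, address_bits=32):
    """Return (number of sets, index bits, tag bits) for a cache.

    ways=None means fully associative: every line lives in one set.
    """
    lines = cache_bytes // line_bytes
    if ways is None:
        ways = lines
    sets = lines // ways
    offset_bits = line_bytes.bit_length() - 1  # log2 of a power of two
    index_bits = sets.bit_length() - 1
    tag_bits = address_bits - index_bits - offset_bits
    return sets, index_bits, tag_bits


KB = 1024
print(cache_fields(64 * KB, 64, ways=1))   # C1: (1024, 10, 16)
print(cache_fields(256 * KB, 64, ways=8))  # C2: (512, 9, 17)
print(cache_fields(16 * KB, 32))           # C3: (1, 0, 27)
```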
Problem No. 5 (20 points)
A processor uses a dual-rank DDR4-2400 memory system. The following table shows the relevant memory system parameters. Assume that the memory controller is using an open page policy, such that once a row in a bank has been activated, it is kept open as long as there is no conflicting request to a different row in the same bank.

DRAM Parameter            Value
Number of ranks           2
DRAM channel width        64 bits
DRAM chip output width    4 bits
DRAM chip capacity        16 Gbits
Number of banks           16
Row size                  4 KBytes
Burst length              16
Memory controller policy  Open page
tRCD                      10 cycles
tCL                       10 cycles
tRP                       10 cycles

Answer the following questions:
(a) (3 points) Calculate the total DRAM capacity available in the system.
    # of ranks = 2
    DRAM chip capacity = 16 Gbits
    # of DRAM chips per rank = DRAM channel width / DRAM chip output width = 64 / 4 = 16
    Capacity of each DRAM rank = 16 Gbits * 16 = 256 Gbits = 32 Gbytes
    Total DRAM capacity = Capacity of each rank * # of ranks = 32 Gbytes * 2 = 64 Gbytes
(b) (6 points) Calculate the number of bits needed to specify each of the following fields in the physical address: (i) Rank, (ii) Bank, (iii) Column, and (iv) Row.
    Number of bits needed to specify the desired rank = log2(# of ranks) = log2(2) = 1
    Number of bits needed to specify the desired bank = log2(# of banks) = log2(16) = 4
    # of DRAM rows per bank = Capacity of each bank / Capacity of each row
                            = (Rank capacity / # of banks per rank) / Row capacity
                            = (32 Gbytes / 16) / 4 Kbytes = 2 Gbytes / 4 Kbytes = 2^31 / 2^12 = 2^19
    Therefore, number of bits needed to specify the desired row = log2(2^19) = 19
    Width of each column = Channel width = 64 bits = 8 bytes = 2^3 bytes
    Number of columns per row = Row capacity / Column width = 4 KB / 8 B = 2^12 / 2^3 = 2^9 = 512
    Therefore, number of bits needed to specify the column field = log2(512) = 9
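The capacity and address-field calculations in parts (a) and (b) can be reproduced with a short sketch. The variable names are hypothetical; the values come from the parameter table, and 1 Gbit is taken to be 2^30 bits.

```python
GIB = 1 << 30  # 2^30

ranks = 2
channel_width_bits = 64
chip_width_bits = 4
chip_capacity_bits = 16 * GIB  # 16 Gbits per chip
banks = 16                     # banks per rank
row_bytes = 4 * 1024           # 4 KBytes per row


def log2_exact(n):
    """log2 of a power of two, as an integer."""
    assert n > 0 and n & (n - 1) == 0, "expected a power of two"
    return n.bit_length() - 1


# Part (a): total capacity
chips_per_rank = channel_width_bits // chip_width_bits
rank_bytes = chips_per_rank * chip_capacity_bits // 8
total_bytes = rank_bytes * ranks

# Part (b): address field widths
rows_per_bank = rank_bytes // banks // row_bytes
column_bytes = channel_width_bits // 8
columns_per_row = row_bytes // column_bytes
field_bits = {
    "rank": log2_exact(ranks),
    "bank": log2_exact(banks),
    "row": log2_exact(rows_per_bank),
    "column": log2_exact(columns_per_row),
}

print(total_bytes // GIB)  # 64 (Gbytes)
print(field_bits)          # {'rank': 1, 'bank': 4, 'row': 19, 'column': 9}
```

As a consistency check, the four fields plus the 3 byte-offset bits within an 8-byte column total 36 bits, and 2^36 bytes is exactly 64 Gbytes.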
(c) (6 points) Consider a memory access sequence which requires the processor to read the ENTIRE contents of a single DRAM row R1 in the bank B1. Before this access sequence can proceed, the currently open row in bank B1 (a row different from R1) needs to be closed. In the absence of any other memory requests, how long (in nanoseconds) will it take for the memory controller to complete the access sequence to row R1?
    Clock speed for DDR4-2400 memory = 2400 / 2 = 1200 MHz
    Therefore, 1 DRAM clock cycle = 1 / 1200 MHz = 0.833 ns
    The access sequence specified in the problem statement requires the following steps:
    (i) Previous row is closed (takes tRP)
    (ii) Next row is activated (takes tRCD)
    (iii) A CAS is sent to select the first column in the row (takes tCL)
    (iv) The ENTIRE row is transferred to the processor. This requires 4 KB / 8 B = 512 transfers, or 512 / 2 = 256 clock cycles.
    Therefore, total time taken to read the ENTIRE contents of the row = 0.833 * (10 + 10 + 10 + 256) = 238.2 nanoseconds
(d) (5 points) Assume that each DRAM row must be refreshed once every 64 milliseconds. For that purpose, refresh commands are periodically sent to each DRAM rank. Each refresh command triggers a parallel refresh operation in every bank within a rank, refreshing 16 rows in each bank. Assume that each refresh command takes 400 ns (tRFC). Calculate the fraction of time for which the memory system is unable to service memory requests due to refresh activity.
    Number of rows refreshed in a rank during a single refresh command = 16 rows per bank * 16 banks per rank = 256 rows = 2^8 rows
    Total number of rows in a rank = Rank capacity / Row capacity = 32 Gbytes / 4 Kbytes = 2^35 / 2^12 = 2^23
    Number of refresh commands needed in a 64 ms period = 2^23 / 2^8 = 2^15
    Therefore, tREFI = 64 ms / 2^15 = 1.95 microseconds
    Fraction of time for which the memory system is unavailable due to refresh = tRFC / tREFI = 400 ns / 1.95 microseconds = 20.5%
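The timing arithmetic in parts (c) and (d) can be sketched as follows. The function names are hypothetical, and the 10-cycle values for tRP, tRCD and tCL are the ones used in the worked solution. Because this sketch keeps the unrounded cycle time (1000 / 1200 ns instead of 0.833 ns), the row read time comes out near 238.3 ns rather than 238.2 ns.

```python
DATA_RATE_MT_S = 2400            # DDR4-2400
CLOCK_MHZ = DATA_RATE_MT_S / 2   # two transfers per clock cycle
CYCLE_NS = 1000 / CLOCK_MHZ      # about 0.833 ns

T_RP = T_RCD = T_CL = 10         # cycles, as used in the worked solution
ROW_BYTES = 4 * 1024
COLUMN_BYTES = 8                 # 64-bit channel


def row_read_time_ns():
    """Precharge + activate + CAS latency + data transfer for one full row."""
    transfers = ROW_BYTES // COLUMN_BYTES  # 512
    data_cycles = transfers // 2           # 256
    return (T_RP + T_RCD + T_CL + data_cycles) * CYCLE_NS


def refresh_overhead(rank_bytes=32 * 2**30, rows_per_command=16 * 16,
                     retention_ms=64, t_rfc_ns=400):
    """Fraction of time a rank is busy refreshing (tRFC / tREFI)."""
    rows_in_rank = rank_bytes // ROW_BYTES        # 2^23
    commands = rows_in_rank // rows_per_command   # 2^15
    t_refi_ns = retention_ms * 1e6 / commands     # about 1953 ns
    return t_rfc_ns / t_refi_ns


print(round(row_read_time_ns(), 1))        # 238.3
print(round(refresh_overhead() * 100, 1))  # 20.5
```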