Microprocessor System Design
Lecture 7: Memory Modules, Error Correcting Codes, Memory Controllers
Zeshan Chishti
Electrical and Computer Engineering Dept.
Maseeh College of Engineering and Computer Science
Source: Lecture based on materials provided by Mark F.
Memory Modules
184-pin DDR SDRAM DIMM
- All chips in a rank receive the same address and control signals
- Each chip is responsible for a subset of the data bits in its rank
- The module acts as a high-capacity DRAM with a wide data path
  - Example: 8 chips, each 8 bits wide = 64 bits (see the sketch below)
- Easy to add/replace memory in a system; no need to solder or remove individual chips
- Memory granularity issue: what is the smallest increment in memory size?
From Hsien-Hsin Sean Lee, Georgia Institute of Technology
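A minimal sketch of the example above (my illustration, not from the lecture): in a 64-bit rank built from eight x8 chips, chip i supplies byte lane i, i.e., bits [8i+7:8i] of every data word.

```c
/* Illustrative sketch: how a 64-bit rank built from 8 x8 DRAM chips
 * splits a data word across chips. Chip i drives byte lane i. */
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint64_t word = 0x1122334455667788ULL; /* data word on the 64-bit bus */
    for (int chip = 0; chip < 8; chip++) {
        uint8_t lane = (uint8_t)(word >> (8 * chip)); /* bits owned by this chip */
        printf("chip %d stores byte 0x%02X\n", chip, lane);
    }
    return 0;
}
```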
DRAM Ranks
Organization of DRAM Modules
Memory Modules
- SIMM (Single Inline Memory Module)
  - 30-pin: some 286, most 386, some 486 systems; Page Mode and Fast Page Mode devices
  - 72-pin: some 386, most 486, nearly all Pentium (before DIMM); Fast Page Mode and EDO devices
- DIMM (Dual Inline Memory Module): dominant today
- SODIMM (Small Outline DIMM): used in notebooks and the Apple iMac
- RIMM (Rambus RDRAM module)
Pictured: SIMM; 168-pin SDRAM DIMM; 184-pin DDR SDRAM DIMM; 240-pin DDR2/DDR3 SDRAM DIMM; 200-pin DDR2/DDR3 SODIMM; RIMM
SPD (Serial Presence Detect)
- 8-pin serial EEPROM on the memory module
- Key parameters for the SDRAM controller:
  - Number of row/column addresses
  - Number of ranks
  - Module width
  - Refresh rate/type
  - Error checking (none, parity, ECC)
  - Latency
  - Timing parameters
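To show how a controller might consume these parameters, here is an illustrative C struct holding the decoded values. The field names and types are mine, and the actual SPD byte offsets come from the JEDEC spec; this is a sketch, not the real layout.

```c
/* Illustrative only: the kind of decoded parameters a memory controller
 * might derive from SPD. Field names are hypothetical; real SPD byte
 * offsets and encodings are defined by the JEDEC standard. */
#include <stdint.h>

typedef enum { ECC_NONE, ECC_PARITY, ECC_ECC } ecc_type_t;

typedef struct {
    uint8_t    row_addr_bits;   /* number of row address bits          */
    uint8_t    col_addr_bits;   /* number of column address bits       */
    uint8_t    num_ranks;       /* ranks on the module                 */
    uint8_t    module_width;    /* data width in bits, e.g., 64 or 72  */
    ecc_type_t error_checking;  /* none, parity, or ECC                */
    uint8_t    cas_latency;     /* CL, in clock cycles                 */
    uint8_t    trcd, trp, tras; /* core timing, in clock cycles        */
} spd_params_t;
```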
DRAM and DIMM Nomenclature

Device name   Clock      M transfers/sec   MB/sec per DIMM   DIMM name
DDR200        100 MHz      200               1,600 MB/s      PC-1600
DDR266        133 MHz      266               2,133 MB/s      PC-2100
DDR333        166 MHz      333               2,666 MB/s      PC-2700
DDR400        200 MHz      400               3,200 MB/s      PC-3200
DDR2-400      200 MHz      400               3,200 MB/s      PC2-3200
DDR2-533      266 MHz      533               4,266 MB/s      PC2-4200
DDR2-667      333 MHz      666               5,333 MB/s      PC2-5300
DDR2-800      400 MHz      800               6,400 MB/s      PC2-6400
DDR2-1066     533 MHz     1066               8,533 MB/s      PC2-8500
DDR3-800      400 MHz      800               6,400 MB/s      PC3-6400
DDR3-1066     533 MHz     1066               8,533 MB/s      PC3-8500
DDR3-1333     666 MHz     1333              10,666 MB/s      PC3-10600
DDR3-1600     800 MHz     1600              12,800 MB/s      PC3-12800
DDR3-1866     933 MHz     1866              14,928 MB/s      PC3-14900

- M transfers/second = 2 transfers per clock (DDR) x clock rate; the DRAM name incorporates M transfers per second
- MB/sec = 8 bytes x M transfers per second; the DIMM name incorporates MB/sec (rounded)
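As a quick illustration of the two naming formulas (a sketch, not part of the original slides), the following computes the MT/s and MB/s figures for a DDR-style part from its clock rate:

```c
/* Sketch applying the naming formulas from the table:
 * MT/s = 2 x clock (DDR), MB/s = 8 bytes x MT/s. */
#include <stdio.h>

int main(void) {
    double clock_mhz = 200.0;        /* e.g., DDR400 / PC-3200 */
    double mtps = 2.0 * clock_mhz;   /* mega-transfers per second */
    double mbps = 8.0 * mtps;        /* 8-byte (64-bit) data path */
    printf("DDR%.0f: %.0f MT/s, %.0f MB/s -> PC-%.0f\n",
           mtps, mtps, mbps, mbps);
    return 0;
}
```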
DRAM/SDRAM Latency Specifications
- DRAM: used 4 numbers (e.g., 4-1-1-1), indicating the number of CPU cycles for the 1st and successive accesses
- SDRAM CAS latency (CAS or CL): delay in clock cycles between the request and the time the first data is available
  - A PC133 module might be described as CAS-2, CAS=2, CL2, CL-2, or CL=2
  - SDR DRAM: CAS latency of 1, 2, or 3
  - DDR DRAM: CAS latency of 2 or 2.5
- When three numbers appear (e.g., 3-2-2): CAS latency (tCAC), RAS-to-CAS delay (tRCD), RAS precharge time (tRP)
- DDR3 is seeing use of four numbers: CAS latency (tCAS, tCL, or CL), RAS-to-CAS delay (tRCD), RAS precharge time (tRP), RAS access time (tRAS), e.g., 3-3-3-10 timing
Key SDRAM Timing Parameters
Determine latency:
- tRCD: minimum time between an ACTIVE command and a READ command
- CL (CAS latency): time between a READ command and the first data valid
Determine bandwidth:
- tRC: time between successive accesses to different rows (tRC = tRAS + tRP)
- tRAS: time between an ACTIVE command and the end of restoration of data in the DRAM array
- tRP: time to precharge the DRAM array in preparation for another row access
EX: Comparing Performance of DIMMs

Parameter                    PC3-12800              PC3-14900
DIMM clock period (tCK)      1/800 MHz = 1.25 ns    1/933 MHz = 1.07 ns
CAS latency (CL)             9                      9
RAS-to-CAS delay (tRCD)      9                      9
RAS precharge time (tRP)     9                      9
RAS access time (tRAS)       27                     27
Cost/pair                    $176                   $196

Best bandwidth/$:
- tRC = tRAS + tRP = 27 + 9 = 36 (for both DIMMs)
- 14900/12800 = 1.16 and 196/176 = 1.11, so a 16% bandwidth gain for an 11% increase in cost
- I'd buy the PC3-14900 DIMMs

Time from ACTIVE to end of cycle (PC3-12800):
- Time to first data (latency) = tRCD + CL = 9 + 9 = 18 clocks
- Time to transfer the full burst (burst length = 8 transfers of 8 bytes = 64 bytes, DDR) = 4 clocks
- Total time = (18 + 4) x 1.25 ns = 27.5 ns
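The arithmetic above is easy to script. Here is a small sketch (timings taken from the table; everything else is illustrative) that reproduces the PC3-12800 numbers:

```c
/* Sketch reproducing the DIMM comparison arithmetic: latency to first
 * data and total time for an 8-transfer DDR burst on PC3-12800. */
#include <stdio.h>

int main(void) {
    double tck_ns = 1.0 / 0.800;    /* 800 MHz clock -> 1.25 ns period */
    int cl = 9, trcd = 9;           /* CL and tRCD in clocks           */
    int burst_clks = 8 / 2;         /* 8 transfers at 2 per clock (DDR) */

    double first_data = (trcd + cl) * tck_ns;               /* 22.5 ns */
    double total = (trcd + cl + burst_clks) * tck_ns;       /* 27.5 ns */
    printf("first data: %.1f ns, full burst: %.1f ns\n", first_data, total);
    return 0;
}
```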
DDR4
- JEDEC released the standard in September 2012
- Projected to be ~50% of the market by 2015-2016
- Hynix announced a 128 GB module using 8 Gb DDR4 devices in April 2014
- AMD (Hierofalcon) and Intel (Haswell-E) supporting DDR4 in 2014
- No longer multi-drop: point-to-point, with a single DIMM per channel
- 288-pin DIMM interface
Error Correcting Codes
Error Correction Motivation
- Failures per unit time are proportional to the number of bits
- As DRAM cell sizes and voltages shrink, cells become more vulnerable
- Why was/is this not an issue on your PC?
  - The failure rate was low
  - Few consumers would know what to do anyway
  - PCs don't hold so much memory that an error is likely to be encountered
- Servers (always) correct memory system errors (e.g., they usually use ECC)
- Sources:
  - Alpha particles (from impurities in IC manufacturing)
  - Cosmic rays (vary with altitude): a bigger problem in Denver and in space-bound electronics
  - Noise
- Need to handle failures throughout the memory subsystem: DRAM chips, module, bus
- DRAM chips don't incorporate ECC logic; the ECC bits are stored in DRAM alongside the data bits, and the chipset (or integrated controller) handles ECC
Error Detection: Parity [from Bruce Jacob]
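The parity figure is not reproduced here; as a stand-in, this is a minimal sketch (my own illustration, not Jacob's) of even parity over a 64-bit word. One stored bit makes the total count of 1s even, so any odd number of flipped bits is detected, though not located:

```c
/* Even parity over a 64-bit word: detects (but cannot correct or
 * locate) any odd number of bit errors. */
#include <stdint.h>
#include <stdio.h>

static int even_parity(uint64_t w) {
    int p = 0;
    while (w) { p ^= 1; w &= w - 1; }   /* count 1-bits mod 2 */
    return p;
}

int main(void) {
    uint64_t data   = 0xDEADBEEFCAFEF00DULL;
    int stored      = even_parity(data);    /* written alongside the data */
    uint64_t readbk = data ^ (1ULL << 17);  /* inject a single-bit error  */
    if (even_parity(readbk) != stored)
        puts("parity error detected (cannot be located or corrected)");
    return 0;
}
```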
Error Correction Codes (ECC)
- Single-bit error correction requires n+1 check bits for 2^n data bits
Error Correction Codes (ECC)
[Figure: worked example of computing check bits by XORing data bits (e.g., 1^0^0^0 = 1), then decoding and verifying a received codeword (Sent -> Received) for the data word 1011]
Error Correction Codes (ECC)
- Add another check bit: SECDED (Single Error Correction, Double Error Detection) requires n+2 check bits for 2^n data bits
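To make the correction mechanism concrete, here is a hedged sketch of a tiny Hamming(7,4) single-error corrector for the slide's 1011 data word; DIMM ECC applies the same idea at (72,64), with one extra check bit for SECDED. The bit positions, variable names, and injected error are all illustrative:

```c
/* Hamming(7,4) SEC sketch. Bit positions 1..7 (position 1 = MSB);
 * check bits sit at positions 1, 2, 4; data bits at 3, 5, 6, 7.
 * The syndrome equals the position of a single flipped bit. */
#include <stdint.h>
#include <stdio.h>

static int bit_at(uint8_t cw, int pos) {     /* read position 1..7 */
    return (cw >> (7 - pos)) & 1;
}

static uint8_t encode(uint8_t d) {           /* d = 4 data bits d3..d0 */
    uint8_t b3 = (d >> 3) & 1, b5 = (d >> 2) & 1,
            b6 = (d >> 1) & 1, b7 = d & 1;
    uint8_t p1 = b3 ^ b5 ^ b7;               /* covers positions 1,3,5,7 */
    uint8_t p2 = b3 ^ b6 ^ b7;               /* covers positions 2,3,6,7 */
    uint8_t p4 = b5 ^ b6 ^ b7;               /* covers positions 4,5,6,7 */
    return (p1 << 6) | (p2 << 5) | (b3 << 4) |
           (p4 << 3) | (b5 << 2) | (b6 << 1) | b7;
}

static uint8_t correct(uint8_t cw) {
    int s1 = bit_at(cw,1) ^ bit_at(cw,3) ^ bit_at(cw,5) ^ bit_at(cw,7);
    int s2 = bit_at(cw,2) ^ bit_at(cw,3) ^ bit_at(cw,6) ^ bit_at(cw,7);
    int s4 = bit_at(cw,4) ^ bit_at(cw,5) ^ bit_at(cw,6) ^ bit_at(cw,7);
    int syndrome = (s4 << 2) | (s2 << 1) | s1; /* error position, 0 = none */
    if (syndrome) cw ^= 1 << (7 - syndrome);   /* flip the bad bit back */
    return cw;
}

int main(void) {
    uint8_t cw = encode(0xB);       /* data 1011, as in the slide example */
    uint8_t rx = cw ^ (1 << 2);     /* inject a single-bit error at pos 5 */
    printf("sent %02X, received %02X, corrected %02X\n", cw, rx, correct(rx));
    return 0;
}
```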
Error Correction Codes (ECC)
- 64-bit data path + 8 ECC bits stored to the DRAM module [from Bruce Jacob]
Memory Controllers
Memory Controllers
- Handle the actual interface to memory:
  - Determine memory configuration/capability
  - Memory timing/signal interface
  - Address mapping: physical address to memory topology
  - Error correction
  - Scheduling
  - Refresh
- WAS in the North Bridge of the chipset
  - Intel prior to Nehalem: MCH (Memory Controller Hub)
  - Isolates the microprocessor from memory technology/device changes
- IS integrated with the microprocessor
  - AMD; Intel from Nehalem on
  - Low latency for high performance
  - Opens the possibility of processor-directed hints
Address Mapping
[Figure: a physical address split across dual channels and memory modules into Channel ID, Rank, Bank, Row, and Column fields]
Address Mapping (cont'd)
- Channel: physical path between the CPU and memory (e.g., dual channels, each with its own memory modules)
- Rank: group of DRAM chips operating in lockstep; they receive the same address, control, and chip-select (CS) signals, and each is responsible for a subset of the same word
- Bank: set of independent memory arrays inside a DRAM chip
- Row/Column: address of a bit cell within a bank; there may be several planes to achieve an n-bit width
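A sketch of the decomposition follows. The field widths and their order are invented for illustration; a real controller derives them from SPD and its interleaving policy:

```c
/* Illustrative address map. Assumed layout (LSB to MSB):
 * 3 byte-offset | 10 column | 3 bank | 1 channel | 1 rank | 14 row.
 * These widths are made up for the example. */
#include <stdint.h>
#include <stdio.h>

typedef struct { unsigned channel, rank, bank, row, column; } dram_addr_t;

static dram_addr_t map(uint64_t pa) {
    dram_addr_t a;
    pa >>= 3;                         /* 8-byte bus: drop the byte offset */
    a.column  = pa & 0x3FF;  pa >>= 10;
    a.bank    = pa & 0x7;    pa >>= 3;
    a.channel = pa & 0x1;    pa >>= 1; /* low bit interleaves channels */
    a.rank    = pa & 0x1;    pa >>= 1;
    a.row     = pa & 0x3FFF;
    return a;
}

int main(void) {
    dram_addr_t a = map(0x12345678ULL);
    printf("ch=%u rank=%u bank=%u row=%u col=%u\n",
           a.channel, a.rank, a.bank, a.row, a.column);
    return 0;
}
```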
Memory Scheduling
- Memory transactions: read, write
- DRAM commands: refresh, activate, read, write, precharge
- Memory scheduling policy:
  - Handle transaction requests, possibly from different cores
  - Refresh
  - Prioritize high- over low-priority requests (e.g., a CPU cache-line fill over a prefetch)
  - Prioritize reads over writes
  - Re-order requests to take advantage of an open page in a bank (see the sketch below)
- Page policy: open page vs. close page
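As an illustration of "re-order to take advantage of an open page", here is a small sketch in the spirit of row-hit-first (FR-FCFS-style) scheduling. The queue representation and tie-breaking are my assumptions, not the lecture's design:

```c
/* Row-hit-first picker: among queued requests, prefer the oldest one
 * that hits the currently open row in its bank; otherwise take the
 * oldest request overall. */
#include <stddef.h>

typedef struct { unsigned bank, row; unsigned long arrival; } req_t;

/* open_row[b] holds the row currently open in bank b (-1 if closed);
 * the caller sizes it to the bank count. Returns an index into q,
 * or -1 if the queue is empty. */
int pick_next(const req_t *q, size_t n, const long *open_row) {
    int best = -1;
    for (size_t i = 0; i < n; i++)      /* pass 1: oldest row hit */
        if (open_row[q[i].bank] == (long)q[i].row &&
            (best < 0 || q[i].arrival < q[best].arrival))
            best = (int)i;
    if (best >= 0) return best;
    for (size_t i = 0; i < n; i++)      /* pass 2: oldest overall */
        if (best < 0 || q[i].arrival < q[best].arrival)
            best = (int)i;
    return best;
}
```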
Memory Scheduling
[Figure: command timelines for eight references labeled (bank, row, col): (0,0,0), (0,1,0), (0,0,1), (0,1,3), (1,0,0), (1,1,1), (1,0,1), (1,1,2). Without access scheduling, issuing each reference's full precharge-activate-column sequence in order takes 56 DRAM cycles; with access scheduling, references to an already-open row skip the precharge and activate, and the same references complete in 19 cycles. DRAM commands: P = bank precharge (3 cycles), A = row activation (3 cycles), C = column access (1 cycle).]
Memory Access to Idle Bank
Memory Access to Active Page (Open Bank)
Memory Access to New Page (Open Bank)
Open Page vs. Close Page Policy
- Open page policy:
  - Row hit latency: tCL + tBURST
  - Row miss latency: tRP + tRCD + tCL + tBURST
- Close page policy:
  - The row is closed after every access, so there are no row hits
  - Latency: tRCD + tCL + tBURST (slower than an open-page row hit, but faster than an open-page row miss)
- Assume a fraction n of accesses are row hits under the open page policy. The break-even point for leaving the page open (vs. closing it) is where the expected latencies match:
  tRCD + tCL = n * tCL + (1 - n) * (tRP + tRCD + tCL)
  which solves to n = tRP / (tRP + tRCD)
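The break-even formula is worth checking numerically; a one-line sketch (timing values are examples, matching the 9-9-9 parts from the earlier DIMM comparison):

```c
/* Break-even row-hit fraction n = tRP / (tRP + tRCD) from the
 * derivation above. Example timings; substitute your part's values. */
#include <stdio.h>

int main(void) {
    double trp = 9.0, trcd = 9.0;   /* cycles, e.g., a 9-9-9 DDR3 part */
    double n_break = trp / (trp + trcd);
    printf("open page wins if the row-hit fraction exceeds %.0f%%\n",
           100.0 * n_break);
    return 0;
}
```

With tRP = tRCD (as in these parts), the break-even point is a 50% row-hit rate: workloads with more locality than that favor the open page policy.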