Designing High-Speed Memory Subsystem using DDR Cuong Nguyen Field Application Engineer cuong@edadirect.com www.edadirect.com 2014 1
Your Design for Excellence Partner Since 1997 EDA Direct has helped customers in the High Tech industry with design tools, training, and services 2014 2
Best in Class Design Tools we support PADS PCB Layout DxDesigner Schematic Capture Expedition PCB Enterprise System Hyperlynx Signal/Power Integrity Modelsim/Questa FPGA/ASIC/System Simulation FloTHERM CFD Thermal Analysis Valor PCB Manufacturing/Assy/Test 2014 3
Agenda DDRx Overview Design Considerations Integrated Design Flow DDR4 New Drive Standards Variable Vref calculation Eye Generation/Measurements New features Power Integrity Effects 2014 4
DDR Feature Comparison DDR DDR2 DDR3 Data Rate 200-400 Mbps 400 800 Mbps 800 1600 Mbps System support 4 slots 8 loads 2 slots 4 loads 2 slots 4 loads 1600 Mbps = 1 slot Signaling Technology DQS signals SSTL_2 SSTL_18 SSTL_15 Bi-directional single ended Bi-directional single or differential Bi-directional differential ODT No Yes Yes Signal leveling No No Yes 2014 5
Overview DDR3 vs. DDR4 DDR3 DDR4 Comments VDD/VDDQ/VPP 1.5/1.5/NA 1.2/1.2/2.5 Up to 20% power saving (1.35/1.35/NA) Clock Frequencies 400-1600MHz 800-1600MHz+ Higher BW CAS Latency 5~14 9~24 Vref VDDQ/2 (Ext) Internal DQ Validation Setup/Hold Data Eye Borrowed from SERDES Data Termination VDDQ/2 (VTT) VDDQ Asymmetric Term. Add/Cmd/Termination VDDQ/2 (VTT) VDDQ/2 I/O Standard SSTL15 POD12 Power savings on 1 bits On Chip Error Detection No Parity (Cmd/Add) Server Class CRC (DQ) Bank Grouping No 4 Ping-Pong for efficient use 2014 6
Overview LPDDR3 vs LPDDR4 LPDDR3 LPDDR4 Comments CLK 400-800MHz 800-1600MHz 2x speed (possibly more) Bandwidth 12.8GB/s (2ch) 25.6GB/s (2ch) Higher BW VDD2/VDDQ/VDD1 1.2/1.2/1.8 1.1/1.1/1.8 Power reduced 10% I/O Interface HSUL LVSTL 40% I/O Power reduction DQ ODT Vtt Term VSSQ Term vs. POD CA ODT No term VSS Term (optional) Vref External Internal 2014 7
Overview Speed related Eye challenges Semicon West: High Performance & Low Power Memory Trends SK Hynix 2014 8
Design Considerations Layout/Route Avoid crossing splits in reference planes (discontinuities) Minimize Inter Symbol Interference (ISI) using matched Impedances Minimize Crosstalk by isolating sensitive bits (ie. Strobes) Match traces within byte lanes (DQ, DM, DQS) to minimize skews Power Supplies Use precision resistors for V REF Short/Wide traces to minimize L and loss 15~25mil clearance from V REF to adjacent traces to minimize coupling Decouple high frequency Power Supply noise w/caps Signaling DQ Driver Impedance Matching with proper drive strengths ODT is a must for better Signal Integrity (if not used then use T-brances or dumping resistor to minimize reflections Choose termination carefully to balance power consumption, signal swing, and reflection Use 2T timing for Address/Command 2014 9
PreLayout SI Simulation From-Concept-to-Chip-to-Board www.edadirect.com 2014 10
Sketch Routing with PADS (video) Constraint-driven routing Designer knows where the traces should be routed Precise location of traces & vias Control style of routing incl serpentines Things to observe while routing DDRx Width & clearance rules Placement intentions Netline organization Layer restrictions Pad/via entry rules (angle, size) Diff pair rules 2014 11 11
Using Sketch routing with DDR3 (video) 2014 12
DDR4 Features 2014 13
New Drive Standards DDR 4 - Pseudo Open Drain Address/Command/Control (SSTL) LPDDR4 - Low Voltage Swing Terminated Logic Both Data and Address New standard only effects IBIS files Effects of new standards visible in simulation results 2014 14
New Drive Standards Difference Termination DDR4 Terminated to VDDQ DDR3 Terminated to VDDQ/2 or VTT 2014 15
New Drive Standards LPDDR4 LVSTL for LPDDR4 terminates to GND 2014 16
New POD Drive Standards Why? Current still flows when driving Low I DDR4 (ie. More) I DDR3 2014 17
New Drive Standards Why? No current draw when driving a High No Current DDR4 I DDR3 2014 18
Power Savings with DBI (Data Bit Inversion) Ensure more 1 s than 0 s with POD If more than 4-bits in a byte are 0, toggle bits and set DBI = 0 DBI shared with DM => only one feature enabled DBI pin is I/O (Reads and Writes) Signal transitioning from one byte to next is less than 4-bits! Minimize SSN! 2014 19
Implications for SI What does Asymmetric Termination mean for reference threshold voltage ( center voltage of eye ) Take average voltage of high and low at receiver Assume a very short trace for sake of simple calculations Assume balanced High and Low drive strength Rt = Term Resistance Zo = Transmission Line Impedance Zd = Driver output Impedance 2014 21
DDR3 Eye Center Voltage When Driving High When Driving Low = = 2 + 2 + + + 2 + = 2 + = + 2 2 = 2 = = 2 2 + + + 2 2 +2 + 2 + 2 2014 22
DDR4 Center Voltage When Driving High When Driving Low = = + = + 2 + = + 2 = 2 + 2 + Threshold value is NOT constant and depends on the setup! 2014 23
DDR3 vs. DDR4 Vcent Comparison DDR3, Vcent = Vdd/2. Constant, regardless of setup DDR4, Vcent = f(rt, Zd) Varies from setup to setup Varies from read to write Varies across access to different DRAMs Always greater than Vdd/2 2014 24
Simple Comparison Controller -> 50 Ohm T-Line -> DRAM Vary DRAM s ODT to see center level of eye 2400Mbps Data rate 2014 25
DDR3 sweeping Rx ODT No change in Center of Eye with ODT 2014 26
DDR4 Eye shifting Eye center shifts with ODT change 2014 27
LPDDR4 Eye shifting Eye center shifts with ODT change 2014 28
Input Threshold Levels DDR3 Absolute Vinh/vinl thresholds stayed constant in DDR3 across designs DDR4 Absolute Vih/Vil threshold levels can change in DDR4 2014 29
Vref for a Device Need Precision programmable reference voltage Center voltage can vary across pins Individual Vref per pin may be too expensive Silicon and power How to determine voltage used for all pins? 2014 30
Determining Vref Avg 8 bit device, one Vref resource (Assume) Eye-center of bit different for each bit 800mV 750mV 730mV 725mV 720mV 710mV 705mV 700mV Option 1: Average of all signals 800 + 750 + 730 + 725 + 720 + 710 + 705 + 700 8 = Option 2: Average of extreme signals 800 + 700 2 = 2014 31
Average Margin Loss Comparison Option 1: Average of all signals Option 2: Average of extreme signals 730+68 = 798mV (Vinh) 730-68 = 662mV (Vinl) 750+68 = 818mV (Vinh) 750-68 = 682mV (Vinl) 2014 36
Average Margin Loss Comparison 800mV 750mV 730mV 725mV 720mV 710mV 705mV 700mV Assume eye height same at all pins, Eye height of 260mV (Vcenter +/- 130mV) (Not strictly true, but 800mV is still worst case) 800mV center signal swings between 670mV to 930mV 930mV 798 730 662 Option 1 730+68 = 798mV (Vinh) 730-68 = 662mV (Vinl) Problems with Low 818 800mV 750 682 670mV Option 2 750+68 = 818mV (Vinh) 750-68 = 682mV (Vinl) Better Margins High/Low 2014 37
Vref for DDR4 DRAM (JEDEC) Single Voltage, regardless of x4, x8 or x16 Ref Voltage is average of Min and Max center-voltages JC-42.3C, Ballot 1727.84A 2014 39
Vref for DDR4 DRAM Programmable Range from 45% of Vdd to 92.5% of Vdd Step size can be between 0.5%Vddq to 0.8%Vddq Exact value will be specific to the DRAM and should be in the datasheet 2014 40
Vref for LPDDR4 DRAM Separate Data and CA reference voltages 2014 41
Vref for Controller (Read) Single Vref for entire channel Inexpensive, easy for controller to implement Greater similarity required in DQ signal characteristics 2014 43
Vref for Controller (Read) Single Vref for each DRAM in the rank Allows for grouping of signals going to each DRAM 2014 44
Vref for Controller (Read) Single Vref for each Lane in the rank Allows for grouping of signals going to each Lane Differs from previous setup only for x16 DRAMs 2014 45
Vref for Controller (Read) Single vs Multi-Rank support Cost vs Layout feasibility 2014 46
In the DDR Wizard Controller Vref Controller can have Vref per lane, DRAM or single Vref for all DQ signals Can have single Vref for all ranks, or separate Vref per rank Overide defaults 2014 47
Eye Generation Concept of Eye borrowed from SerDes No self-clocking, or clock recovery scheme Uses explicit Clock/Strobe (Source Synchronous) In simulation run, cannot just specify frequency Must incorporate clock/strobe 2014 48
DDR4 Data Mask (JEDEC) Eye Mask = Vertically centered at Threshold and Horizontally centered around Strobe zero crossings Data compliance determined by Eye-Mask Address still using setup/hold as with DDR3 (SSTL) Random region definition still being decided (VIHL_AC) o Value is zero for now o May change for higher speeds 2014 52
LPDDR4 Mask Both Data and CA determined by Eye-Mask 2014 53
DDR4 AC Threshold VIHL_AC measured on both high and low sides 2014 54
In the DDR Wizard Results Aggregate results and per-bit results Aggregate contains eye-level results and worst-case results Per-bit results contain measurements for every clock cycle ResAMI_data_aggregated.sds file shows eye-diagrams and aggregate measurements Eye mask centered around Vref and DQS zero-crossing 2014 55
New DDR4 Features 2014 56
New Features - Calibration DDR4 now enables per-bit training/calibration Enables DQ delay calibrations to DQS Help align mismatch signals within a lane DDR4 Continues to support Write leveling LPDDR4 continues to support CA and DQ training 2014 57
New Feature RTT_PARK Used when ODT is de-asserted, or disabled DDR3, this would be a Hi-Z DDR4, this triggers RTT_PARK (Optional) Eg. Writing to rank 1 May need weak termination in rank 3, Hi-Z in Rank2 Useful in multi-rank systems 2014 59
In the DDR Wizard RTT_PARK RTT_NOM, RTT_WR, RTT_PARK not directly mapped ODT Disabled, ODT1 Enabled, ODT2 Enabled Balance between corner case of using all ODTs vs. ease of use 2014 60
New Features DDR4 Par/CRC Reliability Features PAR bit for Address Detect even parity error in Address lines Execute command only if no error CRC for DQ writes (optional) 8-bit CRC for 72-bits of data Controller generates data + CRC block Combined data passed in specified order DRAM executes CRC check, and signals error using ALERT_n pin Useful for Server Market! 2014 61
New Feature Bank Group Feature reinstated since Core DRAM frequency stays relatively fixed and the need to reduce pre-fetch size of DDR4 Pre-fetch delay for consecutive accesses within a Bank Group is greater than between groups Need to store data across groups so that data is ping-ponged across groups to optimize timing More firmware/software task than SI 2014 62
New Feature Gear down Mode For speeds at 2666MT/s and above, reduce DRAM internal clock speed Address/Command runs at ¼ rate of data Help in Addr/Cmd signaling in multi-rank cases 2014 63
Power Integrity Effects 2014 64
Signal Integrity/Power Integrity Co-Simulation Use SI/PI Co-Simulation to see effects of VIA structures and bypassing/decoupling on sensitive signals (ie. DQS/CLK) Coupling between VIAs, stub length effects, etc.. Through-hole, buried, stacked, etc.. Do what-if analysis on PDN to fix problems 2014 65
SI/PI Co-Simulation (video) 2014 66
Power Integrity/Thermal Integrity Co-simulation Analyze Joule Heating effects in traces Identify/localize sensitive area on board Minimize field failures and increase product reliability 2014 67
Power Integrity Effects - Ripples Lane being driven between two components 7-bits driving PRBS, 1-bit at 1 2014 68
Power Integrity Effects - Ripples 150mV noise from switching of other bits in lane 2014 69
Integrated Design Flow - Summary Tight tool integration (no export/import) Single debug environment Minimize errors, increase accuracy Single Support Source SI/PI/Thermal Simulation Pre/Post Architectural/Technology Investigation Performance/Feasibility Analysis Noise/Crosstalk/Termination/EMI Constraints Constraints Stackup/Placements Length/Impedance/via Board DDR Schematic Library/Symbols Design Variants Layout/Route Design Constraints Minimize errors and increase product reliability! 2014 70
Contact Info EDA Direct Inc. (408) 496-5890 (888) 669-9EDA To enroll for our upcoming seminars, workshops, training classes, and view demo videos of our products visit www.edadirect.com 2014 71