High-Performance Memory Interfaces Made Easy

Similar documents
Creating High-Speed Memory Interfaces with Virtex-II and Virtex-II Pro FPGAs Author: Nagesh Gupta, Maria George

Achieving Breakthrough Performance with Virtex-4, the World s Fastest FPGA

Synthesizable CIO DDR RLDRAM II Controller for Virtex-II Pro FPGAs Author: Rodrigo Angel

Achieving High Performance DDR3 Data Rates in Virtex-7 and Kintex-7 FPGAs

Enabling success from the center of technology. Interfacing FPGAs to Memory

Interfacing RLDRAM II with Stratix II, Stratix,& Stratix GX Devices

Control Your QDR Designs

High-Performance DDR2 SDRAM Interface in Virtex-5 Devices Authors: Karthi Palanisamy and Maria George

QDR II SRAM Interface for Virtex-4 Devices Author: Derek Curd

Mixed Signal Verification of an FPGA-Embedded DDR3 SDRAM Memory Controller using ADMS

Interfacing FPGAs with High Speed Memory Devices

SigmaRAM Echo Clocks

High-Performance DDR3 SDRAM Interface in Virtex-5 Devices Author: Adrian Cosoroaba

High Performance DDR4 interfaces with FPGA Flexibility. Adrian Cosoroaba and Terry Magee Xilinx, Inc.

8. Migrating Stratix II Device Resources to HardCopy II Devices

11. Analyzing Timing of Memory IP

Power Solutions for Leading-Edge FPGAs. Vaughn Betz & Paul Ekas

Note: Closed book no notes or other material allowed, no calculators or other electronic devices.

The Next Generation 65-nm FPGA. Steve Douglass, Kees Vissers, Peter Alfke Xilinx August 21, 2006

Analyzing Timing of Memory IP

7 Series FPGAs Memory Interface Solutions (v1.9)

Field Programmable Gate Array (FPGA) Devices

ISSN: [Bilani* et al.,7(2): February, 2018] Impact Factor: 5.164

DDR2 Controller Using Virtex-4 Devices Author: Tze Yi Yeoh

Xilinx Memory Interface Generator (MIG) User Guide

Xilinx Answer MIG 7 Series DDR3/DDR2 - Hardware Debug Guide

Reduce Your System Power Consumption with Altera FPGAs Altera Corporation Public

ML631 U2 DDR3 MIG Design Creation

High-Performance, Lower-Power Memory Interfaces with UltraScale Architecture FPGAs

ML631 U1 DDR3 MIG Design Creation

Advanced FPGA Design Methodologies with Xilinx Vivado

Graduate Institute of Electronics Engineering, NTU FPGA Design with Xilinx ISE

Virtex-II Architecture. Virtex II technical, Design Solutions. Active Interconnect Technology (continued)

Stratix vs. Virtex-II Pro FPGA Performance Analysis

ENGN 1630: CPLD Simulation Fall ENGN 1630 Fall Simulating XC9572XLs on the ENGN1630 CPLD-II Board Using Xilinx ISim

Dynamic Phase Alignment for Networking Applications Author: Tze Yi Yeoh

H100 Series FPGA Application Accelerators

440GX Application Note

BGA Crosstalk. Thank you for inviting me to speak at this tech online forum.

Interfacing QDR -II SRAM with Stratix, Stratix II and Stratix GX Devices

Synthesizable CIO DDR RLDRAM II Controller for Virtex-4 FPGAs Author: Benoit Payette

Optimal Management of System Clock Networks

LatticeSC sysclock PLL/DLL User s Guide

Power Considerations in High Performance FPGAs. Abu Eghan, Principal Engineer Xilinx Inc.

FPGA Implementation and Validation of the Asynchronous Array of simple Processors

The Ultimate System Integration Platform. VIRTEX-5 FPGAs

Multi-Drop LVDS with Virtex-E FPGAs

Gate Estimate. Practical (60% util)* (1000's) Max (100% util)* (1000's)

A Low Power DDR SDRAM Controller Design P.Anup, R.Ramana Reddy

Programmable CMOS LVDS Transmitter/Receiver

Designing and Verifying Future High Speed Busses

Interfacing DDR2 SDRAM with Stratix II, Stratix II GX, and Arria GX Devices

Compact 8 in 1 Multi-Instruments SF Series

08 - Address Generator Unit (AGU)

INTRODUCTION TO FPGA ARCHITECTURE

ECE 574: Modeling and Synthesis of Digital Systems using Verilog and VHDL. Fall 2017 Final Exam (6.00 to 8.30pm) Verilog SOLUTIONS

Introduction to Field Programmable Gate Arrays

Digital Blocks Semiconductor IP

DO-254 AXI 7 Series DDRx (Limited) 1.00a Certifiable Data Package (DAL A) General Description. Features. August 29, 2014, Rev. -

Reduce FPGA Power With Automatic Optimization & Power-Efficient Design. Vaughn Betz & Sanjay Rajput

of Soft Core Processor Clock Synchronization DDR Controller and SDRAM by Using RISC Architecture

Basic FPGA Architectures. Actel FPGAs. PLD Technologies: Antifuse. 3 Digital Systems Implementation Programmable Logic Devices

Hardware Design with VHDL PLDs IV ECE 443

Core Facts. Documentation Design File Formats. Verification Instantiation Templates Reference Designs & Application Notes Additional Items

Integrated Device Technology

Spartan-6 & Virtex-6 FPGA Connectivity Kit FAQ

Lab 3: Xilinx PicoBlaze Flow Lab Targeting Spartan-3E Starter Kit

Introduction to Partial Reconfiguration Methodology

TECHNOLOGY BRIEF. Double Data Rate SDRAM: Fast Performance at an Economical Price EXECUTIVE SUMMARY C ONTENTS

6. I/O Features in Stratix IV Devices

7-Series Architecture Overview

Lab 3 Sequential Logic for Synthesis. FPGA Design Flow.

Memory Challenges. Issues & challenges in memory design: Cost Performance Power Scalability

Agenda. Introduction FPGA DSP platforms Design challenges New programming models for FPGAs

Summary. Introduction. Application Note: Virtex, Virtex-E, Spartan-IIE, Spartan-3, Virtex-II, Virtex-II Pro. XAPP152 (v2.1) September 17, 2003

Synthesis of VHDL Code for FPGA Design Flow Using Xilinx PlanAhead Tool

LatticeSC QDRII/II+ SRAM Memory Interface User s Guide

Performance Evolution of DDR3 SDRAM Controller for Communication Networks

Hardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University

FPGA. Logic Block. Plessey FPGA: basic building block here is 2-input NAND gate which is connected to each other to implement desired function.

Virtex-6 FPGA ML605 Evaluation Kit FAQ June 24, 2009

Mapping Multi-Million Gate SoCs on FPGAs: Industrial Methodology and Experience

2015 Paper E2.1: Digital Electronics II

DDR3 DIMM 1867 Interposer For use with Agilent Logic Analyzers

PSEC-4: Review of Architecture, etc. Eric Oberla 27-oct-2012

A SIMULINK-TO-FPGA MULTI-RATE HIERARCHICAL FIR FILTER DESIGN

Using High-Speed Differential I/O Interfaces

PC-CARD-DAS16/12 Specifications

MAX 10 FPGA Signal Integrity Design Guidelines

External Memory Interface Handbook Volume 2 Section I. Device and Pin Planning

The Challenges of Doing a PCI Design in FPGAs

3.3V ZERO DELAY CLOCK BUFFER

Zero Latency Multiplexing I/O for ASIC Emulation

Structure of Computer Systems. advantage of low latency, read and write operations with auto-precharge are recommended.

Signal Integrity Comparisons Between Stratix II and Virtex-4 FPGAs

ni.com High-Speed Digital I/O

Power Consumption in 65 nm FPGAs

INTRODUCTION TO CATAPULT C

PC-CARD-DAS16/16 Specifications

AN 462: Implementing Multiple Memory Interfaces Using the ALTMEMPHY Megafunction

Transcription:

High-Performance Memory Interfaces Made Easy Xilinx 90nm Design Seminar Series: Part IV Xilinx - #1 in 90 nm

We Asked Our Customers: What are your challenges? Shorter design time, faster obsolescence More competition, increasing cost pressure Demanding complexity and performance Power consumption and thermal issues Signal integrity problems caused by faster I/O Interfacing to high-performance memories Today s seminar addresses Memory Interfaces

Agenda Memory Trends Design Challenges and Solutions DDR2 SDRAM Interface Summary

Mainstream Memory Data Rates 2000 Data Rate (Mbps) 1600 1200 800 400 0 1995 Source: Micron SDRAM DDR DDR2 DDR3 2000 2005 2010 Year Mainstream memory data rates doubling every four years

Shrinking Timing Margins Data-valid window shrinks faster than clock period Faster data rates but similar device and system uncertainties Data Rate 400 Mbps (DDR SDRAM) Data Rate 533 Mbps (DDR2 SDRAM) Uncertainties Uncertainties 2.5 ns 1.9 ns Data-valid window (> 2/3 ns) Data-valid window (< 1/3 ns) Interface timing becomes more demanding

Agenda Memory Trends Design Challenges and Solutions DDR2 SDRAM Interface Summary

Memory Interface Design Can be Challenging 1. Timing-critical physical layer Read-data capture Meeting the timing budget with reliable design margins 2. High bandwidth system requirements Data Rate x Bus Width Resolving signal integrity issues Meeting I/O placements and board routing requirements 3. Complex memory controller design Based on more than 300 customer surveys

Xilinx Makes it Easy Virtex-4 FPGA built-in silicon features Chipsync in every I/O Hardware-verified designs for highest performance interfaces DDR2 SDRAM, DDR SDRAM, QDR II SRAM, RLDRAM II Memory Interface Generator (MIG) Generates your custom memory controller and physical layer interface in minutes using hardware verified designs Memory Interface Generator ML461- Development System

Timing-critical Physical Layer Toughest Design Challenges Centering the clock to the data-valid window for read cycles System conditions change the clock-to-data phase shift Data-valid window (<1/3 ns) Data 533 Mbps Uncertainties Phase shift Clock (DQS) 267 MHz Example: 533 Mbps DDR2 SDRAM memory interface Centering the clock to a shrinking data-valid window

Fixed Phase-Shift Delay Method Phase-shift determined at design time not at run time In-system testing required for verification Phase-shift does not adjust for system variations Delays between different clock (DQS) signals Process, voltage, temperature Phase-shift between DQS signals DQS1 DQS2 DQ FPGA Phase Shift Delay DQS1 DQS2 DQ Phase-shift delay error FPGA inputs FPGA Internal Fixed phase-shift delay erodes design margins

Precise Data-to-Clock Centering Unique Virtex-4 FPGA solution with ChipSync IDELAY Run time centering of data-to-clock 64 tap delays with 78 ps resolution Maximizing design margins for higher system reliability DQ ChipSync IDELAY INC/DEC IDELAY CNTL Virtex-4 FPGA FPGA Fabric State Machine Data-valid window (<1/3 ns) Data 533 Mbps Clock 267 MHz Leading and trailing edge uncertainties 78 ps resolution Not available in any other FPGA, ASIC or ASSP

High Bandwidth System Requirements Maximizing bandwidth : Data Rate x Bus Width Resolving signal integrity issues Meeting I/O placement and board routing requirements

Data Rate x Bus Width Any I/O can be used for Data, Strobe/Clock or Address/Controls Chipsync built in every I/O Any Data to Strobe ratio from x4 to x36 is supported Data rates of 600 Mbps for single-ended I/O standards Up to 259 Gbps data bandwidth using 432 Data bits Superior SSO performance with innovative package design Virtex-4 Device Pkg Max Data width LX25, LX40, LX60 FF668 144 LX40, LX60 FF1148 288 LX80, LX100, LX160 FF1148 360 LX100, LX160, LX200 FF1513 432 3 x higher bandwidth than competing solutions

Achieving Superior SSO Performance Column-based architecture and SparseChevron package Better power and ground distribution Virtex-4 FF1148 Stratix II F1020 Returns spread evenly Many regions devoid of returns Better SSO performance than competing solutions

Accumulating Test Comparison Virtex-4 1.5 volt LVCMOS 4 ma, fast 68 mv p-p (Virtex-4) 100 mv/div 474 mv P-P (Stratix-II) Stratix-II 1.5 volt LVCMOS 4 ma (no speed option) Aggressors building up Waveforms offset for visual clarity 400 ns/div 100 aggressors shown Aggressors shutting down Tek TDS6804B Source: Dr. Howard Johnson 7 x less crosstalk than competing solutions

Meeting I/O Placement Requirements Virtex-4 FF1148 Stratix II F1020 Unrestricted I/O placements Memory I/Os restricted to top and bottom 3 x more data I/Os than competing solutions

Multiple Designs Simulated and Verified in Hardware ML461 Development System 4 x LX25 devices supporting multiple memory interfaces JTAG interface for ChipScope Pro (in circuit logic analyzer) Reference designs verified on it Available now www.xilinx.com/ml461 DDR2 DDR QDR II RLDRAM II FCRAM II Data rate 533 Mbps 400 Mbps 1.2 Gbps 600 Mbps 600 Mbps CLK Rate 267 MHz 200 MHz 300 MHz 300 MHz 300 MHz Data Width 144 bit (DIMM) 144 bit (DIMM) (72+72) bit 36 bit 36 bit

Complex Memory Controller Design Controller state machine varies with memory architecture and system parameters State machine code to maximize performance involves: Architecture (DDR, DDR2, RLDRAM II, QDR II) Number of Banks (external and internal to the memory device) Data Bus Width (32, 36, 72, etc.) Device width (x4, x8, x9, x16, x18, etc.) Data to Strobe ratios Bank and Page access algorithm Costly and time-consuming to implement

Memory Interface Generator Makes Design Easy Generates : HDL code Constraints file Synthesizable test bench FREE Available now www.xilinx.com/memory User-friendly GUI Design your controller with complete flexibility

Generates Your Memory Design in Minutes Design Flow Download and install the Memory Interface Generator Run MIG, choose your memory parameters and generate rtl and ucf files Import MIG files into ISE project Synthesize the design (XST) Place and route the design Customize MIG design Timing simulation Perform functional simulation Verify in hardware Optional HDL customization

Memory Interface Generator Options to select FPGA device, package, speed grade Banks and signals Memory Interface clock freq. Memory architecture and devices (DIMM modules) Data (bus) width and depth

Memory Interface Generator Outputs generated from a library of hardware-verified designs Constraints file (ucf) Modular HDL files (rtl) Physical layer (IDELAY) Controller state machine User Interface Read FIFO Write FIFO Synthesizable test bench Complete visibility to the HDL code Option to further customize the memory controller

Agenda Memory Trends Design Challenges and Solutions DDR2 SDRAM Interface Summary

267MHz(533 Mbps) DDR2 Interface Physical Layer interface Read data and strobe/clock received edge aligned from memory device Write data and strobe/clock transmitted center aligned with each other Controller User Interface System Clock DCM User Interface FPGA_clk FPGA_clk Addrs/ctrl 267 MHz DDR2 Controller State machine Write Data Read Data Virtex-4 FPGA FPGA_clk ctrl Addrs/ctrl CLKs 534 Mbps DDR2 physical layer interface Data(143:0) DQS(13:0) DDR2 SDRAM 267MHz 144 bit Interface DQS DQS DQ DQ DATA READ DATA WRITE

Physical Layer - Read Traditionally challenging read data physical layer is made easier with Virtex-4 Direct Clocking Technique DQS memory strobe used to determine data delay value Read data delayed to center align to FPGA clock, CLK0 Read data captured directly in CLK0 domain using Input DDR Flip Flops Data-to-clock centering for every DQS signal

Data-to-Clock Centering Executed at run time during initialization Memory strobe input to IDELAY block (part of ChipSync) set to 0 taps Output of IDELAY block registered using FPGA clock, CLK0, for edge detection Number of taps required to detect first edge recorded Number of taps required to detect second edge recorded Difference between first edge taps and second edge taps is pulse width Required data delay is sum of first edge taps and pulse center Data-to-clock centering for every DQS signal

Data-to-Clock Centering second edge detected first edge detected Clock / Strobe Read Data first edge taps second edge taps data delay taps data delay taps Center Aligned Delayed Read Data Internal FPGA Clock Run time data-to-clock centering during initialization

Memory Strobe IO Diagram DQS IDELAY 64 taps Absolute Delay line D Q Edge Detection and Control Logic Data Delay Tap Count Data IDELAY Tap Control Logic IO IDELAY Increment/ Decrement Logic Read DQ IDELAY Increment/ Decrement Logic ChipSync FPGA Clock DQS to FPGA clock phase shift detection

Read Data Path Diagram DQ IDELAY 64 taps Absolute Delay line Input DDR Flip Flops Read Data FIFO Rising Edge User Interface IO IDELAY Increment/ Decrement Logic CLK0 Read Data FIFO Falling Edge ChipSync Tri-ctrl Write Data Rising Data IDELAY Tap Control Logic Data Delay Tap Count OBUFT Output DDR Flip Flops Write Data Falling CLK270 Data inputs delayed and captured in FPGA clock domain

Timing Margin Analysis Data-valid window is determined by system and device uncertainties Data 533 Mbps Clock 267 MHz Leading Edge Uncertainties Trailing Edge Uncertainties 0 Start window End Data window Period

DDR2 Read Timing @ 267 MHz Parameters Value(ps) Uncertainities before DQS Uncertainities after DQS Tclock 3750 Clock period Tmemory_dll_duty _cycle_dist 188 188 188 Tdata_period 1687 Memory uncertainties(tac) 500 500 500 Tpackage_skew 30 15 15 Package skew Tsetup - Min 0 0 0 Thold - Max 0 0 0 Meaning Duty cycle distortion from memory DLL is subtracted from half clock period to determine Tdata_period. Data period is half the clock period with 10% duty cycle distortion subtracted from it. This parameter considers the worst of all the memory parameters. Tac = 500ps for 267 MHz. DQS edge detection is performed by registering it in the IO flip flop with a global clock. The final data delay value therefore already accounts for the setup and hold times of the IO flip flops. Hence these parameters are not considered in this Tjitter 100 100 100 Clock jitter that indirectly causes strobe jitter Tclock_tree_skew - Max 50 50 50 Tpcb_layout_skew 20 20 20 Uncertainties 685 685 Window 317 685 1002 Small value considered for Skew on "global clock" line since detection of DQS and associated DQ are placed close to each other Skew between data lines and associated strobe on the board Data-valid window is more than 2 x the 78 ps tap resolution

DDR2 Read Timing @ 267 MHz Clock centering to data-valid window performed at run time with 78 ps resolution Data-valid window is determined by system and device uncertainties Data 533 Mbps Clock 267 MHz Leading Edge Uncertainties Trailing Edge Uncertainties 0 685 1002 1687 Data-valid window is more than 2 x the 78 ps tap resolution

Write Data Path Write data path easy to implement due to DCM and Output DDR Write strobe/clock must be center aligned with data Write strobe/clock generated using Output DDR clocked by CLK0 DCM output Write data transmitted using Output DDR clocked by CLK270 DCM output DQS OBUFT DQ OBUFT Tri-ctrl Tri-ctrl Output DDR Flip Flops Output DDR Flip Flops 1 0 CLK0 Write Data Rising Edge Write Data Falling Edge CLK270 DQS transmitted using CLK0 and its inverse DQ transmitted using CLK270 and its inverse Data outputs center aligned with DQS

Parameters DDR2 Write Timing @ 267 MHz Value(ps) Uncertainities before DQS Uncertainities after DQS Tclock 3750 Clock period Tmemory_dll_ duty_cycle_di st 188 188 188 Tdata_period 1687 Tsetup 100 100 0 Thold 225 0 225 Tpackage_sk ew 30 15 15 Package skew Meaning Duty cycle distortion from memory DLL subtracted from half clock period to determine Tdata_period. Data period is half the clock period with 10% duty cycle distortion subtracted from it. Only the worst case parameter Tac being considered. Only the worst case parameter Tac being considered. Tjitter 0 0 0 Same DCM used to generate DQS and DQ Tclock_tree_s kew - Max 50 50 50 Tclock_out_p hase 140 140 140 Tpcb_layout_ skew 20 20 20 Uncertainties 325 450 Window 912 325 1237 Small value considered for Skew on "global clock" line since detection of DQS and associated DQ are placed close to each other Phase offset error between different clock outputs of the same DCM. Skew between data lines and associated strobe on the board Write data-valid window provides a comfortable margin

Agenda Memory Trends Design Challenges and Solutions DDR2 SDRAM Interface Summary

Memory Interface Design Challenges Solved 1. Timing critical physical layer Chipsync built in every I/O Clock-to-data centering at run time 2. High bandwidth system requirements Column-based architecture and superior packaging 600Mbps x 432 bit wide buses 3. Complex memory controller design Hardware verified solutions for all popular memory types (DDR2, DDR SDRAM, QDR II SRAM, RLDRAM II) Memory Interface Generator (MIG) Generates your design in minutes FREE Virtex-4 Wins! Memory Interfaces Made Easy

How to Get Started Leverage the complete hardware verified solutions to assure first time design success Access latest Virtex-4 memory design solutions on www.xilinx.com/memory Memory Interface Generator (MIG) FREE Application Notes ML461 - Advanced Memory Development System Board level solution including: reference designs, schematic & gerber files DDR2, DDR SDRAM, QDR II SRAM,RLDRAM II, FCRAM II Contact your local FAE for an on-site demo Accelerate Your Design Cycle