Designing 3D Tree-based FPGA: TSV Count Minimization
V. Pangracious, Z. Marrakchi, H. Mehrez
UPMC Sorbonne University, Paris VI, France
13 April 2013
Presentation Outline
Introduction: 3D Tree-based FPGA Architecture
  1. Mesh-based and Tree-based FPGA architecture
  2. Tree-based FPGA interconnect organization
  3. 3D interconnect (TSV): where to add and how many?
3D Tree-based FPGA Design and Optimization
  1. 3D design and TSV count optimization methodology
  2. 3D floorplan development, timing analysis
3D Tree-based FPGA Experimental Analysis
  1. TSV count reduction and performance analysis
  2. 3D Tree-based FPGA architecture optimization
  3. Interconnect power estimation
Industrial FPGA Architecture
Mesh-based FPGA: Industrial Architecture
[Figure: mesh-based FPGA fabric. Configurable Logic Blocks (CLB) sit in a grid of wire segments, with connection blocks (C) and switch blocks (S) between them; insets show the switch block detail and the connection block.]
Most common academic and industrial architecture.
2D FPGA Statistics
2-Dimensional Mesh-based FPGA Statistics
1. Programmable interconnects occupy 90% of the FPGA area.
2. They contribute roughly 80% of the total path delay.
3. They contribute more than 60% of the total dynamic power consumption.
4. As a result, FPGAs are significantly worse than cell-based ASICs in logic density, delay and power consumption.
5. Research studies have estimated FPGAs to be more than 10 times less efficient in logic density, 3 times larger in delay and 3 times higher in power consumption compared to ASICs.
FPGA Architecture
A Novel High Density Tree-based FPGA Architecture
[Figure: Tree-based FPGA with logic blocks (LB) at the leaves, level-0 and level-1 clusters built from downward mini switch blocks (DMSB) and upward mini switch blocks (UMSB), connections to level 2, and input and output pads.]
An integrated upward and downward unidirectional programmable interconnect network using a Butterfly-fat-tree network topology.
Tree-Based FPGA Interconnect Network Organization
Upward interconnection: Upward Mini Switch Blocks (UMSB), full crossbar switches
  $nb_{umsb}(l) = N_{in}(l-1)$
  $N_{switch}(up) = N \, k \, c_{out} \sum_{l=1}^{\log_k(N)} k^{(p-1)(l-1)}$
Downward interconnection: Downward Mini Switch Blocks (DMSB), full crossbar switches
  $nb_{dmsb}(l) = N_{in}(l-1)$
  $N_{switch}(down) = N \, (k^{p} c_{in} + k \, c_{out}) \sum_{l=1}^{\log_k(N)} k^{(p-1)(l-1)}$
$N$ is the total number of logic blocks, $c_{in}$ and $c_{out}$ are the number of inputs and outputs of the logic blocks, $k$ is the arity, $p$ is Rent's parameter and $l$ is the level.
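
The two switch-count formulas above can be evaluated directly. The following is a minimal sketch, not the authors' tool: the function names are illustrative, and the logic block sizes c_in = 4 and c_out = 1 are assumed values chosen only for the example. N = 4^7 and k = 4 match the 7-level, arity-4 architecture used later in the deck.

```python
import math

def level_sum(n_blocks, k, p):
    """Sum over l = 1..log_k(N) of k^((p-1)(l-1))."""
    levels = round(math.log(n_blocks, k))  # number of Tree levels
    return sum(k ** ((p - 1) * (l - 1)) for l in range(1, levels + 1))

def switches_up(n_blocks, k, p, c_out):
    """N_switch(up) = N * k * c_out * level sum."""
    return n_blocks * k * c_out * level_sum(n_blocks, k, p)

def switches_down(n_blocks, k, p, c_in, c_out):
    """N_switch(down) = N * (k^p * c_in + k * c_out) * level sum."""
    return n_blocks * (k ** p * c_in + k * c_out) * level_sum(n_blocks, k, p)

# Example: 7 levels of arity 4; c_in and c_out are assumptions.
N, K, C_IN, C_OUT = 4 ** 7, 4, 4, 1
for p in (1, 0.65):
    up = switches_up(N, K, p, C_OUT)
    down = switches_down(N, K, p, C_IN, C_OUT)
    print(f"p={p}: up={up:.0f}, down={down:.0f}")
```

With p = 1 every level contributes equally to the sum, so the switch count grows with the number of levels; any p below 1 shrinks the contribution of the upper levels.
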
Tree-based Interconnect Length
Wire length estimation of 2D Tree-based FPGA layout
[Figure: 2-dimensional layout analysis, wire length (µm) versus Tree interconnect level.]
Wire length increases exponentially as the Tree grows to higher levels.
3D Stacked Tree-based FPGA
Horizontal Partitioning of Tree Interconnect
[Figure: Tree interconnect with logic blocks and level-0 switch blocks at the bottom and level-1 and level-2 switch blocks above; horizontal tree break points mark where the network is cut between levels.]
The network partitioning and the location of the break point are decided based on interconnect delay optimization.
3-Dimensional Layout Analysis
3D Layout Delay Estimation
[Figure: delay estimation circuit. Logic blocks (LUT and FF) connect through the upward network (UMSB) and the downward network (DMSB with multiplexers MX1 to MX4) at levels 0 and 1, together with the feedback network, IO nets, and input and output pads.]
1. Mentor's SPICE-accurate circuit simulator Eldo.
2. ST Micro's 130 nm technology transistor models.
3-Dimensional Layout Analysis
3D Layout Delay Measurement Setup
[Figure: horizontal break point delay results. Measured delay (ns) versus number of LUTs (10 to 100000) and versus Tree level (L0 to L6), comparing 2D delay with 3D delay and marking the TSV break point level.]
Re-organized active layer 2 (TSV placement) to optimize delay.
3-Dimensional Layout Design
3D compatible Tree-based FPGA floorplan arrangement
[Figure: two-layer floorplan. Layer 1 holds the LBs and the local interconnect network of Tree levels 0 to 3. Layer 2 is divided into section 1 (Tree level 4 interconnect), section 3 (Tree level 5 interconnect) and section 2 (Tree level 6 interconnect), with local interconnection from level 4 to 5 and from level 5 to 6. TSVs at the break-point level and a thermal interface join the two layers.]
3-Dimensional Layout Design
3D compatible Layout Design
[Figure: view of the stack. Layer 2 carries the level 4, level 5 and level 6 interconnects in sections 1, 3 and 2; signal TSVs, thermal TSVs and thermal interface material connect it to layer 1, where a hotspot location is marked.]
2-layer 3D stacked Tree-based FPGA chip: logic units are placed in layer 1 and programmable interconnects are placed in active layer 2.
3D Physical Design, TSV Management
Where to Add and How Many?
1. TSVs are huge and cause coupling.
2. TSV count is crucial (design, manufacturing, cost). How many?
3. TSV location is crucial (design, device, performance). Where to place?
4. TSVs require design-for-testing, power and clock delivery.
5. TSVs require design-for-manufacturability/reliability.
6. TSV area and power consumption optimization.
7. TSV density and impact of TSVs on local vias.
TSV Vs Logic Cells
TSV area comparison with logic cells
[Figure: footprint comparison. TSV: 5 µm; TSV landing pad: 8 µm; TSV keep-out zone: 9.5 µm; basic logic cell: 1.05 µm.]
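
To give a rough sense of scale for the dimensions above, the sketch below compares footprints. It assumes, which the slide does not state, that each quoted dimension is the side length of a square footprint; the constant and function names are illustrative.

```python
# Dimensions quoted on the slide, in micrometres.
TSV_KEEP_OUT_UM = 9.5
TSV_LANDING_PAD_UM = 8.0
TSV_UM = 5.0
LOGIC_CELL_UM = 1.05

def square_area(side_um):
    """Area of a square footprint (assumption: dimensions are side lengths)."""
    return side_um ** 2

keep_out_ratio = square_area(TSV_KEEP_OUT_UM) / square_area(LOGIC_CELL_UM)
tsv_ratio = square_area(TSV_UM) / square_area(LOGIC_CELL_UM)
print(f"keep-out zone / logic cell area: {keep_out_ratio:.1f}")
print(f"TSV body / logic cell area: {tsv_ratio:.1f}")
```

Under this assumption a single keep-out zone covers the area of roughly 80 basic logic cells, which is why the TSV count matters so much for area.
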
TSV & Programmable Interconnect Optimization Flow
[Flowchart, boxes in order:]
1. 3D Tree-based FPGA placement and routing (generalized routing solution)
2. Initialize break-point level: p(l_bp) = 1
3. For each non-break-point level: select Random(l)
4. Adjust Rent value p, then run the 3D router based TSV count optimizer
5. Routing feasible? (Yes / No), repeated in a second loop that adjusts the Rent value p and reruns the 3D router based TSV count optimizer until the minimum TSV count is reached
6. 3D stacked Tree-based FPGA area and power estimation
7. Optimized TSV and architecture solution
8. Timing analysis
9. Bitstream generation
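
The loop structure of this flow can be illustrated with a small greedy sketch. It is an interpretation under stated assumptions, not the authors' optimizer: `minimize_rent`, `is_routable`, the step size, the lower bound and the choice to handle the break-point level last are all hypothetical, and the real flow calls a 3D router where this sketch calls a toy feasibility function.

```python
import random

def minimize_rent(levels, break_level, is_routable, step=0.05, floor=0.5, seed=0):
    """Lower each level's Rent value while routing stays feasible (greedy sketch)."""
    rng = random.Random(seed)
    rent = {level: 1.0 for level in levels}      # start from Rent p = 1 everywhere
    others = [level for level in levels if level != break_level]
    rng.shuffle(others)                          # stands in for "Select Random(l)"
    for level in others + [break_level]:         # assumption: break point handled last
        while rent[level] - step >= floor - 1e-9:
            trial = dict(rent)
            trial[level] = round(rent[level] - step, 2)
            if not is_routable(trial):           # stand-in for the 3D router check
                break
            rent = trial                         # keep the smaller Rent value
    return rent

# Toy feasibility rule (hypothetical): routing succeeds while every p >= 0.6.
def toy_router(rent):
    return all(p >= 0.6 for p in rent.values())

result = minimize_rent(range(7), break_level=3, is_routable=toy_router)
print(result)
```

A per-level greedy descent is the simplest reading of the flowchart; a real implementation would also have to re-check earlier levels, since lowering p at one level can change routability at another.
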
TSV & Architecture Optimization
Optimization Results: Tree Levels = 7, Arity = 4, Arch = 4x4x4x4x4x4x4

Architecture levels   3D chip layer   Optimized Rent p   Int/TSV gain (%)   Optimized area (µm²)
Logic blocks          Layer 1         -                  -                  93635273
Switch level 0        Layer 1         0.67               33 (Int)           2412
Switch level 1        Layer 1         0.54               46 (Int)           10800
Switch level 2        Layer 1         0.66               34 (Int)           37496
Switch level 3        Layer 1         0.59               41 (TSV)           232128
Horizontal break point, level 3 to 4: TSV area = 40192 µm²
Switch level 4        Layer 2         0.67               33 (Int)           6072770
Switch level 5        Layer 2         0.66               34 (Int)           45553499
Switch level 6        Layer 2         0.65               35 (Int)           42139683
Average                               63.42 (p × 100)    36.57
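
The numbers in the table suggest that the gain column equals 100 × (1 − p) for each level. This is an observation from the reported values, not a definition given in the slides. The short check below recomputes the gains and the averages; the variable names are illustrative.

```python
# Optimized Rent value and reported gain (%) per switch level, from the table.
rent = {0: 0.67, 1: 0.54, 2: 0.66, 3: 0.59, 4: 0.67, 5: 0.66, 6: 0.65}
reported_gain = {0: 33, 1: 46, 2: 34, 3: 41, 4: 33, 5: 34, 6: 35}

# Assumed relation: gain (%) = 100 * (1 - p), rounded to an integer.
derived_gain = {level: round(100 * (1 - p)) for level, p in rent.items()}

mean_rent = sum(rent.values()) / len(rent)
mean_gain = sum(reported_gain.values()) / len(reported_gain)

print("gains match:", derived_gain == reported_gain)
print(f"mean Rent p: {mean_rent:.4f}")
print(f"mean gain: {mean_gain:.2f}%")
```
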
Rent = 1: Performance Analysis
Tree Levels = 7, Arity = 4, Arch = 4x4x4x4x4x4x4

                 Delay (10^-9 s)                       Performance gain (%)
MCNC circuits    2D Tree-based   3D Tree with TSV      2D Tree vs 3D with TSV   3D Mesh gain vs 2D
Average (21)     96.06 ns        28.76 ns              68.7%                    32%

[Figures: critical path delay (ns) per MCNC benchmark; delay improvement (%) per MCNC circuit.]
21 MCNC [1] benchmark circuits.
[1] http://er.cs.ucla.edu/benchmarks/ibm-place
TSV Distribution and Placement
3D Tree-based FPGA, TSV Placement
1. Impact of TSV reduction on performance.
2. The count and location of TSVs have a significant impact on the performance of a 3D stacked chip.
3. Trade-off studies were performed with Tree interconnect level partitioning across the dies in the 3D stack.
4. Simulations used regular and non-regular TSV placement.
Speed Degradation
Tree Levels = 7, Arity = 4, Arch = 4x4x4x4x4x4x4

               TSV reduction (%)           Speed degradation (%)
MCNC (21)      Tree-based   Mesh-based     Tree-based   Mesh-based
Average        40.1         30             4.7          6.8

[Figure: speed degradation (%) per MCNC benchmark circuit, for the 3D Mesh-based FPGA with 30% TSV reduction and the 3D Tree-based FPGA with 40.1% TSV reduction.]
Static Power Consumption
3D Tree level Power Optimization
[Figure: power estimation with Rent = 1 and Rent = p. Static power (mW, 0 to 1400) per interconnect level L0 to L6, comparing power with Rent = 1 and power with Rent = p; the break point (TSV interconnect) is marked.]
1. 37% reduction in the programmable interconnect network.
2. 28% reduction in total power consumption.
3D FPGA Statistics
3D Tree-based FPGA Vs 3D Mesh-based FPGA
1. TSV count reduced by 40%.
2. Programmable interconnect area reduced by 37%.
3. Path delay (performance) improved by 53%.
4. Programmable interconnect power reduced by 28%.
Presentation Summary
1. Developed a software-supported design and optimization flow for 3D Tree-based FPGA.
2. Identified the physical design challenges of Tree-based programmable interconnect networks.
3. Proposed a horizontal partitioning methodology for the Tree-based programmable interconnect network to enable 3D integration.
4. 3D integration enables improvements in performance, power consumption and area of the 3D stacked Tree-based FPGA.
5. Introduced an architecture and TSV count optimization flow.
6. 3D Tree-based FPGA demonstrator.
CoolChip: 3D Tree-based FPGA
Vinod Pangracious <vinod.pangracious@etu.upmc.fr>