Designing 3D Tree-based FPGA TSV Count Minimization. V. Pangracious, Z. Marrakchi, H. Mehrez UPMC Sorbonne University Paris VI, France


Designing 3D Tree-based FPGA: TSV Count Minimization
V. Pangracious, Z. Marrakchi, H. Mehrez
UPMC Sorbonne University Paris VI, France
13 April 2013

Presentation Outline

Introduction: 3D Tree-based FPGA Architecture
1. Mesh-based and Tree-based FPGA architecture
2. Tree-based FPGA interconnect organization
3. 3D interconnect (TSV): where to add and how many?

3D Tree-based FPGA Design and Optimization
1. 3D design and TSV count optimization methodology
2. 3D floorplan development, timing analysis

3D Tree-based FPGA Experimental Analysis
1. TSV count reduction and performance analysis
2. 3D Tree-based FPGA architecture optimization
3. Interconnect power estimation

Industrial FPGA Architecture: Mesh-based FPGA

[Figure: industrial mesh-based FPGA, an array of Configurable Logic Blocks (CLB) linked by wire segments through Connection Blocks (C) and Switch Blocks (S), with switch block and connection block details]

The most common academic and industrial architecture.

2D FPGA Statistics

2-Dimensional Mesh-based FPGA Statistics
1. Programmable interconnects occupy 90% of the FPGA area.
2. They contribute roughly 80% of the total path delay.
3. They contribute more than 60% of the total dynamic power consumption.
4. As a result, FPGAs perform significantly worse than cell-based ASICs in logic density, delay and power consumption.
5. Research studies have estimated FPGAs to be more than 10 times less efficient in logic density, with 3 times larger delay and 3 times higher power consumption than ASICs.

FPGA Architecture: A Novel High-Density Tree-based FPGA Architecture

[Figure: logic blocks (LB) grouped into level-0 clusters and level-0 clusters grouped into level-1 clusters, with Downward Mini Switch Blocks (DMSB) and Upward Mini Switch Blocks (UMSB) at each level, links to level 2, and input/output pads]

An integrated upward and downward unidirectional programmable interconnect network using a Butterfly-Fat-Tree network topology.

Tree-based FPGA Interconnect Network Organization

Upward interconnection: Upward Mini Switch Blocks (UMSB), each a full crossbar switch.

$$nb_{UMSB}(l) = N_{in}(l-1)$$

$$N_{switch}(up) = \sum_{l=1}^{\log_k(N)} N \, k \, c_{out} \, k^{(p-1)(l-1)}$$

Downward interconnection: Downward Mini Switch Blocks (DMSB), each a full crossbar switch.

$$nb_{DMSB}(l) = N_{in}(l-1)$$

$$N_{switch}(down) = \sum_{l=1}^{\log_k(N)} N \, (k^{p} c_{in} + k \, c_{out}) \, k^{(p-1)(l-1)}$$

N is the total number of logic blocks, c_in and c_out are the numbers of inputs and outputs of a logic block, k is the arity, and p and l are the Rent's parameter and the tree level.
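The two switch-count sums can be evaluated directly to see how the Rent parameter p shrinks the network. The sketch below is a minimal illustration that transcribes the sums as reconstructed above; the logic block size (c_in = 4, c_out = 1) used in the demo is a hypothetical choice, and N = 4^7 assumes that Arch=4x4x4x4x4x4x4 means arity 4 at each of 7 levels.

```python
def tree_levels(n_lb: int, k: int) -> int:
    """Number of tree levels, log_k(N), computed exactly with integers."""
    levels, size = 0, 1
    while size < n_lb:
        size *= k
        levels += 1
    if size != n_lb:
        raise ValueError("n_lb must be a power of the arity k")
    return levels


def switch_counts(n_lb: int, k: int, p: float, c_in: int, c_out: int):
    """Return (N_switch(up), N_switch(down)) by summing over levels 1..log_k(N)."""
    up = down = 0.0
    for l in range(1, tree_levels(n_lb, k) + 1):
        # k^{(p-1)(l-1)}: equals 1 for p = 1 and shrinks with l when p < 1
        scale = k ** ((p - 1) * (l - 1))
        up += n_lb * k * c_out * scale
        down += n_lb * (k ** p * c_in + k * c_out) * scale
    return up, down


if __name__ == "__main__":
    n = 4 ** 7  # assumption: arity 4 at each of the 7 levels
    for p in (1.0, 0.67):
        # c_in = 4, c_out = 1 is a hypothetical logic block size
        up, down = switch_counts(n, k=4, p=p, c_in=4, c_out=1)
        print(f"p={p}: N_switch(up)={up:.0f}, N_switch(down)={down:.0f}")
```

With p = 1 every level costs the same, so the totals grow linearly with the number of levels; with p < 1 each higher level contributes geometrically fewer switches.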

Tree-based Interconnect Length

Wire length estimation of the 2D Tree-based FPGA layout

[Figure: 2-dimensional layout analysis, wire length (µm) versus tree interconnect level]

Wire length increases exponentially as the tree grows to higher levels.

3D Stacked Tree-based FPGA: Horizontal Partitioning of the Tree Interconnect

[Figure: logic blocks with level-0, level-1 and level-2 switch blocks, cut by a horizontal tree break point between levels]

The network partitioning and the location of the break point are decided based on interconnect delay optimization.

3-Dimensional Layout Analysis: 3D Layout Delay Estimation

[Figure: delay estimation netlist with LUT/FF logic blocks, level-0 and level-1 DMSBs and UMSBs built from multiplexers MX1 to MX4, downward, upward and feedback networks, IO nets, and input/output pads]

1. Mentor's SPICE-accurate circuit simulator Eldo.
2. STMicroelectronics 130nm technology transistor models.

3-Dimensional Layout Analysis: 3D Layout Delay Measurement Setup

[Figure: horizontal break point delay results, measured delay (ns) versus number of LUTs (10 to 100000) for the 2D and 3D layouts across tree levels L0 to L6, with the TSV break point marked]

Active layer 2 was re-organized (TSV placement) to optimize delay.

3-Dimensional Layout Design: 3D-compatible Tree-based FPGA Floorplan Arrangement

[Figure: layer 1 floorplan holding the LBs and the local interconnect network of tree levels 0 to 3; layer 2 divided into section 1 (tree level 4 interconnect), section 2 (tree level 6 interconnect) and section 3 (tree level 5 interconnect), with the level 4 to 5 and level 5 to 6 local interconnections, TSVs at the break point level, and thermal interfaces]

3-Dimensional Layout Design: 3D-compatible Layout Design

[Figure: layer 2 sections 1 to 3 holding the level 4, level 5 and level 6 interconnects, thermal interface material, signal TSVs, thermal TSVs, layer 1, and the hotspot location]

Two-layer 3D stacked Tree-based FPGA chip: logic units are placed in layer 1 and programmable interconnects are placed in active layer 2.

3D Physical Design, TSV Management: Where to Add and How Many?

1. TSVs are large and cause coupling.
2. TSV count is crucial (design, manufacturing, cost): how many?
3. TSV location is crucial (design, device, performance): where to place them?
4. TSVs require design-for-test, power delivery and clock delivery.
5. TSVs require design-for-manufacturability/reliability.
6. TSV area and power consumption must be optimized.
7. TSV density and the impact of TSVs on local vias.

TSV vs. Logic Cells: TSV Area Compared with Logic Cells

[Figure: a 5µm TSV with an 8µm landing pad and a 9.5µm keep-out zone, drawn to scale next to a 1.05µm basic logic cell]
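To put the figure in numbers, the sketch below estimates how many basic logic cells fit in each TSV-related footprint. It assumes that every dimension on the slide is the side of a square footprint; a round TSV or landing pad would be somewhat smaller, so treat the results as rough upper bounds.

```python
# Dimensions read from the slide figure. Treating each one as the side of a
# square footprint is an assumption made only for this estimate.
TSV_UM = 5.0          # TSV
LANDING_PAD_UM = 8.0  # TSV landing pad
KEEP_OUT_UM = 9.5     # TSV keep-out zone
LOGIC_CELL_UM = 1.05  # basic logic cell


def cells_per_footprint(side_um: float, cell_um: float = LOGIC_CELL_UM) -> float:
    """Number of square logic-cell footprints covered by a square of side side_um."""
    return (side_um / cell_um) ** 2


if __name__ == "__main__":
    for name, side in (("TSV", TSV_UM), ("landing pad", LANDING_PAD_UM),
                       ("keep-out zone", KEEP_OUT_UM)):
        print(f"{name}: about {cells_per_footprint(side):.1f} logic cells")
```

Under this assumption a single TSV keep-out zone covers roughly 82 logic-cell footprints, which is why TSV count and placement dominate the 3D area budget.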

TSV & Programmable Interconnect Optimization Flow

1. 3D Tree-based FPGA placement and routing (generalized routing solution).
2. Initialize the break point level: p(l_bp) = 1.
3. For each non-break-point level, select a random level l and adjust its Rent value p; the 3D-router-based TSV count optimizer checks whether routing is still feasible.
4. Minimum TSV count: adjust the Rent value p of the break point level with the 3D-router-based TSV count optimizer while routing remains feasible.
5. Optimized TSV and architecture solution: 3D stacked Tree-based FPGA area and power estimation, timing analysis and bitstream generation.
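The loop in the flow can be sketched as a greedy search that lowers each level's Rent value until the router reports an infeasible routing. This is a minimal illustration, not the authors' tool: `is_routable` is a hypothetical callback standing in for the 3D router, and the step size (0.01), the lower bound `p_min`, and tightening the break-point level last are assumptions read from the flowchart.

```python
import random
from typing import Callable, Dict


def optimize_rent(
    num_levels: int,
    break_point: int,
    is_routable: Callable[[Dict[int, int]], bool],  # hypothetical router check
    p_min: int = 30,
    seed: int = 0,
) -> Dict[int, float]:
    """Greedily lower each level's Rent value while routing stays feasible.

    Rent values are kept in integer hundredths (100 means p = 1.00) so that
    repeated decrements do not accumulate floating-point drift.
    """
    rng = random.Random(seed)
    p = {level: 100 for level in range(num_levels)}  # every level starts at p = 1
    others = [level for level in range(num_levels) if level != break_point]
    rng.shuffle(others)  # "Select Random(l)" among the non-break-point levels
    # The break-point level (which sets the TSV count) is tightened last.
    for level in others + [break_point]:
        while p[level] > p_min:
            p[level] -= 1
            if not is_routable(p):
                p[level] += 1  # undo the step that broke routability
                break
    return {level: value / 100 for level, value in p.items()}


if __name__ == "__main__":
    # Toy oracle: pretend routing fails as soon as any level drops below p = 0.60.
    result = optimize_rent(7, break_point=3,
                           is_routable=lambda p: all(v >= 60 for v in p.values()))
    print(result)
```

A real optimizer would call the router on the modified architecture inside `is_routable`; the greedy, one-level-at-a-time order keeps the number of routing runs linear in the number of steps.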

TSV & Architecture Optimization: Optimization Results

Tree levels = 7, arity = 4, Arch = 4x4x4x4x4x4x4

| Architecture level | 3D chip layer | Optimized Rent p | Int/TSV gain (%) | Optimized area (µm²) |
|---|---|---|---|---|
| Logic blocks | Layer 1 | | | 93635273 |
| Switch level 0 | Layer 1 | 0.67 | 33 (Int) | 2412 |
| Switch level 1 | Layer 1 | 0.54 | 46 (Int) | 10800 |
| Switch level 2 | Layer 1 | 0.66 | 34 (Int) | 37496 |
| Switch level 3 | Layer 1 | 0.59 | 41 (TSV) | 232128 |
| Horizontal break point | Level 3 to 4 | | | TSV area = 40192 |
| Switch level 4 | Layer 2 | 0.67 | 33 (Int) | 6072770 |
| Switch level 5 | Layer 2 | 0.66 | 34 (Int) | 45553499 |
| Switch level 6 | Layer 2 | 0.65 | 35 (Int) | 42139683 |
| Average | | 63.42 | 36.57 | |
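Reading the table, every gain figure equals (1 − p) × 100 for the optimized Rent value of that level, and the Average row is the mean of p expressed as a percentage together with its complement. The sketch below checks that reading against the transcribed values; the interpretation itself is inferred from the numbers, not stated on the slide.

```python
# Optimized Rent values and gains transcribed from the table above.
RENT_P = {0: 0.67, 1: 0.54, 2: 0.66, 3: 0.59, 4: 0.67, 5: 0.66, 6: 0.65}
GAIN_PCT = {0: 33, 1: 46, 2: 34, 3: 41, 4: 33, 5: 34, 6: 35}


def gain_from_rent(p: float) -> int:
    """Gain in percent under the inferred reading Gain = (1 - p) * 100."""
    return round((1.0 - p) * 100)


def table_averages():
    """Mean Rent value as a percentage, and the matching mean gain."""
    mean_p_pct = 100 * sum(RENT_P.values()) / len(RENT_P)
    return mean_p_pct, 100 - mean_p_pct


if __name__ == "__main__":
    assert all(gain_from_rent(p) == GAIN_PCT[l] for l, p in RENT_P.items())
    mean_p, mean_gain = table_averages()
    print(f"mean p = {mean_p:.2f}%, mean gain = {mean_gain:.2f}%")
```

The mean of p is 63.43% when rounded (the slide shows 63.42, i.e. the same value truncated), and the mean gain is 36.57%, matching the Average row.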

Rent = 1: Performance Analysis

Tree levels = 7, arity = 4, Arch = 4x4x4x4x4x4x4

| MCNC circuits | 2D Tree-based delay (ns) | 3D Tree with TSV delay (ns) | Gain, 3D Tree vs. 2D Tree (%) | Gain, 3D Tree vs. 3D Mesh (%) |
|---|---|---|---|---|
| average (21) | 96.06 | 28.76 | 68.7 | 32 |

[Figure: critical path delay (ns) and delay improvement (%) for the 21 MCNC benchmark circuits (1)]

(1) http://er.cs.ucla.edu/benchmarks/ibm-place

TSV Distribution and Placement: 3D Tree-based FPGA TSV Placement

1. Impact of TSV reduction on performance.
2. The count and location of TSVs have a significant impact on the performance of the 3D stacked chip.
3. Tradeoff studies were performed by partitioning the tree interconnect levels across the dies in the 3D stack.
4. Simulations used regular and non-regular TSV placement.

Speed Degradation

Tree levels = 7, arity = 4, Arch = 4x4x4x4x4x4x4

| MCNC (21) | TSV reduction, Tree-based (%) | TSV reduction, Mesh-based (%) | Speed degradation, Tree-based (%) | Speed degradation, Mesh-based (%) |
|---|---|---|---|---|
| average | 40.1 | 30 | 4.7 | 6.8 |

[Figure: speed degradation (%) across the MCNC benchmark circuits for a 3D Mesh-based FPGA with 30% TSV reduction and a 3D Tree-based FPGA with 40.1% TSV reduction]

Static Power Consumption: 3D Tree Level Power Optimization

[Figure: static power (mW) per interconnect level L0 to L6, estimated with Rent = 1 and Rent = p, with the break point (TSV interconnect) marked]

1. 37% reduction in programmable interconnect network power.
2. 28% reduction in total power consumption.
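The two percentages are linked by the share of total power spent in the interconnect. Assuming that only the interconnect power changes between Rent = 1 and Rent = p and that both percentages use the same baseline, the total reduction equals share × interconnect reduction; the sketch below solves for that implied share. The helper name is hypothetical.

```python
def implied_interconnect_share(interconnect_cut: float, total_cut: float) -> float:
    """Share f of total power in the interconnect, assuming only the
    interconnect changed, so that total_cut = f * interconnect_cut."""
    if not 0 < interconnect_cut <= 1:
        raise ValueError("interconnect_cut must be a fraction in (0, 1]")
    return total_cut / interconnect_cut


if __name__ == "__main__":
    share = implied_interconnect_share(0.37, 0.28)
    print(f"implied interconnect share of power: {share:.0%}")
```

Under these assumptions the interconnect would account for about 76% of the power being estimated.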

3D FPGA Statistics: 3D Tree-based FPGA vs. 3D Mesh-based FPGA

1. TSV count reduced by 40%.
2. Programmable interconnect area reduced by 37%.
3. Path delay (performance) improved by 53%.
4. Programmable interconnect power reduced by 28%.

Presentation Summary

1. Developed a software-supported design and optimization flow for 3D Tree-based FPGAs.
2. Identified the physical design challenges of Tree-based programmable interconnect networks.
3. Proposed a horizontal partitioning methodology for Tree-based programmable interconnect networks to enable 3D integration.
4. 3D integration improves the performance, power consumption and area of the 3D stacked Tree-based FPGA.
5. Introduced an architecture and TSV count optimization flow.
6. 3D Tree-based FPGA demonstrator.

CoolChip: 3D Tree-based FPGA
Vinod Pangracious <vinod.pangracious@etu.upmc.fr>