Physical-Aware High Level Synthesis Congestion resolution for the realization of high-density and low-power

Similar documents
Unit 2: High-Level Synthesis

Low-Power Technology for Image-Processing LSIs

Cadence SystemC Design and Verification. NMI FPGA Network Meeting Jan 21, 2015

A Deterministic Flow Combining Virtual Platforms, Emulation, and Hardware Prototypes

High-Level Synthesis (HLS)

Synthesizable FPGA Fabrics Targetable by the VTR CAD Tool

DIGITAL DESIGN TECHNOLOGY & TECHNIQUES

Hardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University

ESL design with the Agility Compiler for SystemC

FPGA Based Digital Design Using Verilog HDL

NetSpeed ORION: A New Approach to Design On-chip Interconnects. August 26 th, 2013

Programmable Logic Devices HDL-Based Design Flows CMPE 415

ASIC, Customer-Owned Tooling, and Processor Design

Silicon Virtual Prototyping: The New Cockpit for Nanometer Chip Design

Linking Layout to Logic Synthesis: A Unification-Based Approach

COE 561 Digital System Design & Synthesis Introduction

ASIC world. Start Specification Design Verification Layout Validation Finish

Best Practices for Implementing ARM Cortex -A12 Processor and Mali TM -T6XX GPUs for Mid-Range Mobile SoCs.

SYSTEMS ON CHIP (SOC) FOR EMBEDDED APPLICATIONS

System Level Design with IBM PowerPC Models

Adaptive Voltage Scaling (AVS) Alex Vainberg October 13, 2010

Introduction to Electronic Design Automation. Model of Computation. Model of Computation. Model of Computation

IMPROVES. Initial Investment is Low Compared to SoC Performance and Cost Benefits

Comprehensive Place-and-Route Platform Olympus-SoC

Eliminating Routing Congestion Issues with Logic Synthesis

C-Based SoC Design Flow and EDA Tools: An ASIC and System Vendor Perspective

MOJTABA MAHDAVI Mojtaba Mahdavi DSP Design Course, EIT Department, Lund University, Sweden

An introduction to CoCentric

Mapping Multi-Million Gate SoCs on FPGAs: Industrial Methodology and Experience

UCLA 3D research started in 2002 under DARPA with CFDRC

Outline. EECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs) FPGA Overview. Why FPGAs?

HIGH-LEVEL SYNTHESIS

EE582 Physical Design Automation of VLSI Circuits and Systems

Early Models in Silicon with SystemC synthesis

A 65 nm Design Tape-Out in 6 Weeks

Concurrent, OA-based Mixed-signal Implementation

Three DIMENSIONAL-CHIPS

Lattice Semiconductor Design Floorplanning

HES-7 ASIC Prototyping

23. Digital Baseband Design

High-Level Synthesis Creating Custom Circuits from High-Level Code

Transaction level modeling of SoC with SystemC 2.0

EECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs)

System-on-Chip Design for Wireless Communications

EEM870 Embedded System and Experiment Lecture 2: Introduction to SoC Design

Agenda. How can we improve productivity? C++ Bit-accurate datatypes and modeling Using C++ for hardware design

Mentor Graphics Solutions Enable Fast, Efficient Designs for Altera s FPGAs. Fall 2004

Lecture 20: High-level Synthesis (1)

Design Tools for 100,000 Gate Programmable Logic Devices

SYNTHESIS FOR ADVANCED NODES

Platform for System LSI Development

Design Methodologies. Kai Huang

Ten Reasons to Optimize a Processor

UNIFIED HARDWARE/SOFTWARE CO-VERIFICATION FRAMEWORK FOR LARGE SCALE SYSTEMS

Hardware Modelling. Design Flow Overview. ECS Group, TU Wien

An overview of standard cell based digital VLSI design

Power Aware Architecture Design for Multicore SoCs

Next-generation Power Aware CDC Verification What have we learned?

Hardware in the Loop Functional Verification Methodology

CAD Technology of the SX-9

Thermal-Aware 3D IC Physical Design and Architecture Exploration

Embedded SRAM Technology for High-End Processors

Performance Verification for ESL Design Methodology from AADL Models

Leveraging Formal Verification Throughout the Entire Design Cycle

Verification Futures The next three years. February 2015 Nick Heaton, Distinguished Engineer

A New Electronic System Level Methodology for Complex Chip Designs

Logic and Physical Synthesis Methodology for High Performance VLIW/SIMD DSP Core

RISECREEK: From RISC-V Spec to 22FFL Silicon

Inspirium HMI-Studio: Authoring System for Creating HMI Scenarios

HVSoCs: A Framework for Rapid Prototyping of 3-D Hybrid Virtual System-on-Chips

100M Gate Designs in FPGAs

ESE Back End 2.0. D. Gajski, S. Abdi. (with contributions from H. Cho, D. Shin, A. Gerstlauer)

EE595. Part VIII Overall Concept on VHDL. EE 595 EDA / ASIC Design Lab

Choosing an Intellectual Property Core

Reducing the cost of FPGA/ASIC Verification with MATLAB and Simulink

ZeBu : A Unified Verification Approach for Hardware Designers and Embedded Software Developers

Overview. Design flow. Principles of logic synthesis. Logic Synthesis with the common tools. Conclusions

Hardware Design and Simulation for Verification

FPGA-Based Rapid Prototyping of Digital Signal Processing Systems

Testability Optimizations for A Time Multiplexed CPLD Implemented on Structured ASIC Technology

Technology Dependent Logic Optimization Prof. Kurt Keutzer EECS University of California Berkeley, CA Thanks to S. Devadas

Optimization of Behavioral IPs in Multi-Processor System-on- Chips

FishTail: The Formal Generation, Verification and Management of Golden Timing Constraints

Appendix SystemC Product Briefs. All product claims contained within are provided by the respective supplying company.

TESTING AND TESTABLE DESIGN OF DIGITAL SYSTES

EE382V-ICS: System-on-a-Chip (SoC) Design

Brief Introduction of Cell-based Design. Ching-Da Chan CIC/DSD

Hardware-Software Codesign. 1. Introduction

On GPU Bus Power Reduction with 3D IC Technologies

Advanced Synthesis Techniques

EE434 ASIC & Digital Systems Testing

Beyond Soft IP Quality to Predictable Soft IP Reuse TSMC 2013 Open Innovation Platform Presented at Ecosystem Forum, 2013

Laboratory 6. - Using Encounter for Automatic Place and Route. By Mulong Li, 2013

System Synthesis of Digital Systems

Design Methodologies

An Overview of a Compiler for Mapping MATLAB Programs onto FPGAs

VHDL simulation and synthesis

Programmable Logic Devices II

Introduction to ASICs. ni logic Pvt. Ltd., Pune

Park Sung Chul. AE MentorGraphics Korea

Transcription:

CDN Live Japan on July 18, 2014 Physical-Aware High Level Synthesis Congestion resolution for the realization of high-density and low-power Masato Tatsuoka SoC Front-End Design Dept. SoC Design Center System LSI Company Fujitsu Semiconductor Limited

Contents Fujitsu Semiconductor s Custom SoC (ASIC) Solutions Background Current Challenges of High-level Design Flow (such as Routing Congestion) Physical-aware High-level Design Flow Results of Detection and Reduction of the Routing Congestion Summary 1

Fujitsu Semiconductor s Custom SoC (ASIC) Solutions 2

Introduction Fujitsu Semiconductor offers "Custom SoC Solutions" for customer's SoC development This allows the SoC development of high performance, high quality and low power consumption in a short term Custom SoC Solutions Offer Development Style: - Full custom - Platform base 3

Custom SoC Solutions : Platform-base Development Shorter development time by customizing from the base platform Shortened development time and reduced re-spin risk by evaluation and verification in advance 4

Design Service Cedar TM for SoC Development Cedar-SPEC Cedar-Specification Service to realize better quality of LSI specification and extraction of validation items by development process using UML improved quality of specification document extraction of verification items Cedar-ESL Cedar-Electronic System Level LSI verification service by development process with the combination of ESL and Emulation Architecture design by performance evaluation model Shorter time for S/W development by high-speed model Cedar-PROT Cedar-Prototyping Proto-type board development service which enables Hardware performance verification Hardware performance check System verification including S/W Cedar-EMU Cedar-Emulation Emulation verification service to shorten verification time to 1/1000 Performance verification and dissipation power measurement System verification including S/W Cedar-HLS Cedar-High Level Synthesis High level synthesis service Optimization of high level description ASIC Hand-Off for high level description + Σ Cedar-SIM Cedar-Simulation Verification service of simulation base that uses random technique and assertion technique Interface verification by inserting the assertion Encompassing verification by random method Function verification by Cedar-SPEC verification items Cedar-SPEC Cedar-ESL Cedar-PROT System design Architecture design System design Block design Specification Draft software design Architecture Exploration S/W advanced development Performance 性能 Verification Assets (reuse/inherit) Analysis 評価 high level synthesis System Verification Block verification Product verification Co-verification Block implementation ES development Prototype board development System building up Cedar : C-based Effective Design-Flow Apply to Real Design Cedar-HLS Cedar-SIM Cedar-EMU 5

Background 6

Motivation ( Background ) Routing congestion issue becomes remarkable with the recent process technology scaling Routing congestion tends to occur in advanced process node PPA (Performance / Power / Area) is degraded when routing congestion occurs Establishment of the synthesis tool and design flow which takes routing congestion into account is important High-level synthesis (HLS) technology becomes widely used Application range expands from a single module to sub-system, from several hundred thousand gates to several tens of millions gates The QoR is not guaranteed, since the algorithm level description doesn't care the generated circuit Routing congestion tends to happen in HLS compared with the RTL design High-level synthesis technology that takes into account the routing congestion has not been established yet Establishment of the high-level design flow that takes into account the routing congestion is desired We tried the high-level design flow that takes into account the routing congestion in cooperation with Cadence We applied it into the actual design 7

Causes of the Routing Congestion 1. Routing congestion caused from the floor plan For instance, improper placement of memory block causes routing congestion This could be eliminated by adjusting the memory block placement under the discussion with the logic design team However, it is difficult to identify this issue until the actual layout is executed 2. Routing congestion caused from the logical structures For instance, the logic design such as big data selector of the size causes the routing congestion It is possible to identify this issue even without the actual floorplan information This issue should be solved during the logic design phase Multiplexer(MUX)/Demultiplexer(DMUX) 8

Current Challenges of High-level Design Flow 9

Conventional High-level Design Flow (1) Equivalence Algorithm development C/C++ Code rewrite C/C++ SystemC (1) Algorithm C (2) Synthesizable C Since front-end and backend design are completely separately performed, the routing congestion is not detected until the back-end process started (2) Equivalence High-Level Synthesis HLS Tech Lib RTL (3) RTL from High Level Synthesis RTL Style Check RTL Rule Set Front-end Back-end (3) Equivalence and Performance verification RTL-Sim/(FPGA) Logic synthesis Netlist Congestion Place and Route Logic Synth. Tech Lib (4) Netlist from logic synthesis Quality degradation 10

Benefits of High-Level Synthesis (HLS) SystemC and C++ Behavioral Models The high abstraction description of SystemC/C++ enables the model similar to the algorithm Micro-Architecture Design Without changing the source code, it can select hardware implementation options by commands and constraints Schedule & Allocation Realization of excellent PPA (Performance /Power/Area) by automatic- insertion of latency, generation of state machine, and resource/register sharing. RTL Compared to conventional RTL, HLS shorten the time of logic design of hardware very much, but... 11

Challenges in HLS and Routing Congestion QoR is strongly dependent on the coding style HLS tends to generate route congested circuit than RTL design Especially for the beginner of HLS (even the expert of RTL design) Architecture selection is also a cause of the routing congestion If the circuit image is unclear, the problem becomes more serious Automatic architecture selection of the tool also promotes the complexity of the problem Aggressive resource & register sharing may also cause routing congestion In order to reduce the area, the designer tends to seek maximize the resource sharing However, the current design flow does not address the routing congestion 12

A Simple Case sc_in<int> in; sc_in<int> in2[16]; int array [16]; int a, b, c; void init() { a = 0; b = 1; c = in.read(); LOOP: for (i = 0; i < 16; i++) { array[i] = in2[i].read(); } } SystemC code CtoS command: Directed to break the loop. So, the input values are substituted for the array element by taking 16 cycles. Routing congestion occurs in this example from an actual design Well, what is the problem? 13

Index Access of Array 32bits sc_in<int> in; sc_in<int> in2[16]; int array [16]; int a, b, c; 32bits Substituted for 16 elements of the array[i] void init() { a = 0; b = 1; c = in.read(); LOOP: for (i = 0; i < 16; i++) { array[i] = in2[i].read(); } (1) (2) } 4bits (1) 16x32 DEMUX generated 32 bits 32 bits 16 Inputs Large DEMUX / MUX generated for accessing the array HLS tends to generate route congested circuit than RTL design 14 4 bits (2) 16x32 MUX generated

Physical-aware High-level Design Flow (1) Equivalence (2) Equivalence Algorithm development C/C++ Code rewrite C/C++ SystemC High Level Synthesis (1) Algorithm C (2) Synthesizable C Introduce the routing congestion detection and the improvement phase into the front end design, and eliminate the routing congestion caused from the logic design in early stage. HLS Tech Lib RTL (3) RTL from High Level Synthesis Improvement RTL Style Check RTL Rule Set RTL-Sim/(FPGA) Front end Back end (3) Equivalence & Performance verification Logic synthesis Routing congestion detection Netlist Place and Route Logic Synth. Tech Lib (4) Netlist from logic synthesis 15 Physical Tech Lib Quality improvement

Results of Detection and Reduction of the Routing Congestion Reduced both Routing Congestion and Area 16

Improvement case 1: detection of routing congestion Parameters for synthesis Parameters Value Utilization 70% CtoS Clock Freq. 200MHz Ratio 1.0 (square) RCP Congestion Map Item Execution result Value Routing congestion H: 8.55% V: 4.28% Long Path violations (50 stages) 4,786 Routing congestion MUX number 8,620 Total Length (um) 15,816,331 RC Area (CA) 42,697,674 Too Large! Circuit quality isn t good Routing congestion in the vertical direction: V Routing congestion in the horizontal direction: H 17

Identify the Location of Congestion Congestion MUX Reports 4.'memwrite_STR_A_ln102 : : : 6.'memwrite_STR_B_ln103' : : : 8.'memwrite_STR_A_ln104' : : : Identify the Location switch(idx){ case 0: start = 0; end = 1; break; case 1: start = 1; end = 5; break; } sc_uint<5> i=start; BREAK_LOOP: do{ 102: STR.A[i] =0; 103: STR.B[i] =0; 104: STR.C[i] =0; i++; }while(i<end); Loop Break 18

Code Modification for Congestion Reduction switch(idx){ case 0: start = 0; end = 1; break; case 1: start = 1; end = 5; break; } sc_uint<5> i=start; BREAK_LOOP: do{ STR.A[i] =0; STR.B[i] =0; STR.C[i] =0; i++; }while(i<end); Loop Break 32 4 Before Remove Demux for array accesses 32 16 switch(idx){ case 0: start = 0; end = 1; UNROLL1: for (i = start; i <= end_pu; i++) { STR.A[i] =0; STR.B[i] =0; STR.C[i] =0; } break; case 1: start = 1; Loop Unroll end = 5; UNROLL1: for (i = start; i <= end; i++) { STR.A[i] =0; STR.B[i] =0; STR.C[i] =0; } break; After Loop Unroll 32 16 19

Congestion Improvement Results Improvement 1: Only 3 places corrected with low effort (Quick correction) Improvement 2: Several places are corrected with high effort (Deep knowledge of the circuit & correction strategy required) Original After improvement 1 After improvement 2 Item Original After improvement 1 After improvement 2 Routing Congestion H: 8.55% V: 4.28% H: 1.79% V: 2.95% H:017% V:0.81% Long Path Violations 4,786 76 472 Number of MUXs 8,620 5,158 (-40% ) 4,982 (-42% ) Total Length 15,816,331 15,415,309 (-2% ) 13,594,980 (-14% ) RC Area (CA) 42,697,674 29,955,645 (-29% ) 33,149,970 (-22% ) 20

Summary Routing congestion causes the quality loss of the design Routing congestion wasn't taken into account in the conventional High-level Design Flow We proposed a new High-level Design Flow which takes routing congestion into account Introduced the detection and improvement steps of the routing congestion into the front end design phase We applied this technology into the actual SoC design Routing congestions detected as expected Confirmed the correlation of quality degradation in pre- and post-layout We'll continue the improvement of the routing congestion, and confirm the result of the post-layout 21

Expectations for the tool Productization of Physical-aware High-level Synthesis tool as soon as possible More comprehensible report for identifying the location of the routing congestion Introduce the command for the improvement strategy of the routing congestion Registers sharing, resources sharing, and scheduling should be performed, with taking the routing congestion into account as well Applicable without any change in the SystemC code 22

23 Copyright 2010 FUJITSU LIMITED