Multicycle-Path Challenges in Multi-Synchronous Systems

Similar documents
Stratix vs. Virtex-II Pro FPGA Performance Analysis

Chronos Latency - Pole Position Performance

Towards Optimal Custom Instruction Processors

Runtime Adaptation of Application Execution under Thermal and Power Constraints in Massively Parallel Processor Arrays

An overview of standard cell based digital VLSI design

Cognitive Radio Platform Research at WINLAB

Design and Implementation of a Packet Switched Dynamic Buffer Resize Router on FPGA Vivek Raj.K 1 Prasad Kumar 2 Shashi Raj.K 3

High Level Synthesis Re-usable model of AMBA AXI4 communication protocol for HLS based design flow developed using SystemC Synthesis subset

EECS150 - Digital Design Lecture 17 Memory 2

An Overview of Standard Cell Based Digital VLSI Design

Re-Examining Conventional Wisdom for Networks-on-Chip in the Context of FPGAs

Evolution of CAD Tools & Verilog HDL Definition

The Design of MCU's Communication Interface

NETWORK ON CHIP TO IMPLEMENT THE SYSTEM-LEVEL COMMUNICATION SIMPLIFIES THE DISTRIBUTION OF I/O DATA THROUGHOUT THE CHIP, AND IS ALWAYS

Lecture 41: Introduction to Reconfigurable Computing

FPGA Implementation and Validation of the Asynchronous Array of simple Processors

A VARIETY OF ICS ARE POSSIBLE DESIGNING FPGAS & ASICS. APPLICATIONS MAY USE STANDARD ICs or FPGAs/ASICs FAB FOUNDRIES COST BILLIONS

Clocked and Asynchronous FIFO Characterization and Comparison

Processor Architectures At A Glance: M.I.T. Raw vs. UC Davis AsAP

EE 8217 *Reconfigurable Computing Systems Engineering* Sample of Final Examination

The Benefits of FPGA-Enabled Instruments in RF and Communications Test. Johan Olsson National Instruments Sweden AB

Core Facts. Documentation Design File Formats. Verification Instantiation Templates Reference Designs & Application Notes Additional Items

8. Best Practices for Incremental Compilation Partitions and Floorplan Assignments

Beyond Soft IP Quality to Predictable Soft IP Reuse TSMC 2013 Open Innovation Platform Presented at Ecosystem Forum, 2013

Design and Implementation of Low Complexity Router for 2D Mesh Topology using FPGA

FPGA-Based Rapid Prototyping of Digital Signal Processing Systems

Basic FPGA Architectures. Actel FPGAs. PLD Technologies: Antifuse. 3 Digital Systems Implementation Programmable Logic Devices

Ten Reasons to Optimize a Processor

EITF35: Introduction to Structured VLSI Design

SOCS BASED OPENRISC AND MICROBLAZE SOFT PROCESSORS COMPARISON STUDY CASES: AUDIO IMPLEMENTATION AND NETWORK IMPLEMENTATION BASED SOCS

Custom Design Formal Equivalence Checking Based on Symbolic Simulation. Overview. Verification Scope. Create Verilog model. Behavioral Verilog

FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC)

Employing Multi-FPGA Debug Techniques

Ultra-Fast NoC Emulation on a Single FPGA

Digital Systems Design. System on a Programmable Chip

Fast Flexible FPGA-Tuned Networks-on-Chip

ECEN 449 Microprocessor System Design. FPGAs and Reconfigurable Computing

Mapping Multi-Million Gate SoCs on FPGAs: Industrial Methodology and Experience

Designing of DDR SDRAM Controller with User Interface

Workspace for '4-FPGA' Page 1 (row 1, column 1)

SPART - A Special Purpose Asynchronous Receiver/Transmitter

A Reconfigurable Crossbar Switch with Adaptive Bandwidth Control for Networks-on

FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow

Design of Embedded Hardware and Firmware

Stratix II vs. Virtex-4 Performance Comparison

Design of a System-on-Chip Switched Network and its Design Support Λ

An Introduction to Programmable Logic

A Low-Power Field Programmable VLSI Based on Autonomous Fine-Grain Power Gating Technique

Embedded Systems. 8. Hardware Components. Lothar Thiele. Computer Engineering and Networks Laboratory

ECE 459/559 Secure & Trustworthy Computer Hardware Design

Timing Constraints Editor User Guide

Benefits of Embedded RAM in FLEX 10K Devices

Best Practices for Incremental Compilation Partitions and Floorplan Assignments

Scalable and Dynamically Updatable Lookup Engine for Decision-trees on FPGA

Design of Asynchronous Interconnect Network for SoC

1 Beta extension of the 1394a PHY Link Interface Specification

INTRODUCTION TO FPGA ARCHITECTURE

CHAPTER 6 FPGA IMPLEMENTATION OF ARBITERS ALGORITHM FOR NETWORK-ON-CHIP

Networks-on-Chip Router: Configuration and Implementation

ECE 448 Lecture 5. FPGA Devices

A Time-Multiplexed FPGA

A 3-D CPU-FPGA-DRAM Hybrid Architecture for Low-Power Computation

PowerPC on NetFPGA CSE 237B. Erik Rubow

Synthesizable FPGA Fabrics Targetable by the VTR CAD Tool

Hardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University

High Performance Embedded Applications. Raja Pillai Applications Engineering Specialist

A Hardware Task-Graph Scheduler for Reconfigurable Multi-tasking Systems

Signal Integrity Comparisons Between Stratix II and Virtex-4 FPGAs

Natalie Enright Jerger, Jason Anderson, University of Toronto November 5, 2010

Midterm Exam. Solutions

Am2901 Completion and Integration with Am9080a

Accelerating CDC Verification Closure on Gate-Level Designs

Adapter Modules for FlexRIO

DoCD IP Core. DCD on Chip Debug System v. 6.02

Core Facts. Documentation Design File Formats. Verification Instantiation Templates Reference Designs & Application Notes Additional Items

Lecture 11 Logic Synthesis, Part 2

Line-rate packet processing in hardware: the evolution towards 400 Gbit/s

A Method To Derive Application-Specific Embedded Processing Cores Olivier Hébert 1, Ivan C. Kraljic 2, Yvon Savaria 1,2 1

MOJTABA MAHDAVI Mojtaba Mahdavi DSP Design Course, EIT Department, Lund University, Sweden

Kyoung Hwan Lim and Taewhan Kim Seoul National University

FPGA Polyphase Filter Bank Study & Implementation

KiloCore: A 32 nm 1000-Processor Array

ECE 551 System on Chip Design

Cadence SystemC Design and Verification. NMI FPGA Network Meeting Jan 21, 2015

Prof. Steven Nowick. Chair, Computer Engineering Program

Block Diagram. mast_sel. mast_inst. mast_data. mast_val mast_rdy. clk. slv_sel. slv_inst. slv_data. slv_val slv_rdy. rfifo_depth_log2.

White Paper Low-Cost FPGA Solution for PCI Express Implementation

System Verification of Hardware Optimization Based on Edge Detection

Core Facts. Documentation Design File Formats. Verification Instantiation Templates Reference Designs & Application Notes Additional Items

Parallel graph traversal for FPGA

David Merodio Codinachs Filomena Decuzzi (Until Dec 2010), Javier Galindo Guarch TEC EDM

LX4180. LMI: Local Memory Interface CI: Coprocessor Interface CEI: Custom Engine Interface LBC: Lexra Bus Controller

CPU Project in Western Digital: From Embedded Cores for Flash Controllers to Vision of Datacenter Processors with Open Interfaces

FPGAs: FAST TRACK TO DSP

Ultra Depedable VLSI by Collaboration of Formal Verifications and Architectural Technologies

Place Your Logo Here. K. Charles Janac

High-Performance Memory Interfaces Made Easy

Extreme Low Latency 10G Ethernet IP Solution Product Brief (HTK-ELL10G-ETH-FPGA)

Stacked Silicon Interconnect Technology (SSIT)

Mixed Signal Verification of an FPGA-Embedded DDR3 SDRAM Memory Controller using ADMS

Transcription:

Multicycle-Path Challenges in Multi-Synchronous Systems G. Engel 1, J. Ziebold 1, J. Cox 2, T. Chaney 2, M. Burke 2, and Mike Gulotta 3 1 Department of Electrical and Computer Engineering, IC Design Research Laboratory Southern Illinois University Edwardsville, Il, USA, 62026-1801 2 Blendics, Inc. 10176 Corporate Square Drive St. Louis, MO 63132-2924 3 Independent FPGA Consultant ICCAD 2012 (MSCAS Workshop) San Jose, CA November 8, 2012 1

Presentation Overview Background Description of DANI DANI for FPGA applications DANI for ASIC SoCs Summary 2

The Problem In a recent survey conducted by Real Intent at the 2012 DAC Conference, 315 attendees were asked the question: "How many clock domains do you expect it (i.e your next design) will have? Here were the responses: under 50 : ################################## (68%) 50-100 : ########## (20%) 100-500 : #### (8%) over 500 : ## (4%) 1 in 3 of all new designs are expected to possess more than 50 clock domains! http://www10.edacafe.com/blogs/realintent/2012/09/27/dac-survey-on-cdc-bugs-x-propagation-constraints/# 3

Our Solution The IC Design Research Laboratory at Southern Illinois University Edwardsville (SIUE) has been working closely with Blendics, Inc. over the past several years. This collaboration has produced an ANoC innovation called DANI. Delay-tolerant Asynchronous Network Interface 4

DANI 5

DANI Source 6

DANI Destination 7

DANI in FGPA Designs C2C Route Delay (ns) 18 16 14 12 10 8 6 4 2 0 28 nm parts 40 nm parts Floor-planned bus delay 65 nm parts Designs evaluated 45 nm parts A data bus route, belonging to a design supplied to us from a potential customer, was floor-planned in a Virtex LX155 Xilinx FGPA. The route was stretched to 10 ns in order to emulate the delays expected in larger and more modern chips. The bus was 256 bits wide. 8

Impact of DANI While the initial design using a single-cycle, synchronous interconnect was limited to 100 MHz, both the design employing DANI and a comparable single register pipeline version of the design operated at the specified 175 MHz rate. Moreover, the design employing DANI was capable of operating at 300 MHz while the pipelined version was able to operate at 300 MHz only if a second pipeline register were added. DANI latency was measured at 2 clock cycles while library asynchronous FIFOs (Xilinx and Altera) displayed latencies from 4 to 8 clock cycles. And, the design utilizing DANI required 30% less time to build (MAP, PAR) compared to a single register pipeline version of the design. 9

DANI in ASIC Designs Currently exploring the use of DANI in SoC designs. Attempting to implement DANI using only standard cells and currently available clocked-cad design tools/flows. Using Cadence s RTL Compiler and Silicon Encounter tools along with a generic 45 nm open-source standard cell library from Nangate. Using traditional SDCs augmented by a collection of Tcl scripts. 10

Max-skew Constraint 11

ASIC Test Designs Able to reliably control skew on extended data bus between a DANI source and DANI destination while meeting signal integrity (SI) constraints. Bus widths have ranged from 16 to 1024 bits. The DANI source and DANI destination modules operate in excess of 1 GHz. We have successfully controlled skew in a design containing multiple (i.e. 3) DANI sources and destinations. 12

ASIC Test Designs We are currently working on a design which will contain 100 copies of a wrapped processor. The wrapper contains a DANI destination and DANI source. The wrapped processors are connected in a ring and the datapath between processors is 16 bits wide. Each wrapped processor occupies 0.35 mm 2. 13

Summary An FPGA circuit with DANI met required timing at 175MHz, but a comparable design employing a single-cycle, synchronous route failed to implement because it could not meet timing. Bus and clock skew controlled in DANI version so as to make possible operation up to 300 MHz. DANI latency in the FPGA design was measured at 2 clock cycles while library asynchronous FIFOs displayed latencies from 4 to 8 clock cycles. The build time for the DANI-based circuit was 30% less than that of a comparable register-pipelined circuit. DANI on ASICs appears viable. Able to reliably control skew on data bus between a DANI source and DANI destination while meeting signal integrity (SI) constraints. Bus widths have ranged from 16 to 1024 bits. Additional ASIC experiments planned in near future. 14

Acknowledgements This material is based upon work supported by the National Science Foundation under Grant No. 0924010. Additional support was provided by the National Innovation Fund, Omaha, NE. 15

THANK YOU! QUESTIONS? 16