High-Level Synthesis Techniques for In-Circuit Assertion-Based Verification

Similar documents
A Deterministic Flow Combining Virtual Platforms, Emulation, and Hardware Prototypes

Assertions: Too good to be reserved for verification only.

Multi-Gigahertz Parallel FFTs for FPGA and ASIC Implementation

Developing Dynamic Profiling and Debugging Support in OpenCL for FPGAs

An Overview of a Compiler for Mapping MATLAB Programs onto FPGAs

FPGA for Software Engineers

Synthesis of VHDL Code for FPGA Design Flow Using Xilinx PlanAhead Tool

Source-Level Debugging for FPGA High-Level Synthesis

Hardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University

ESE Back End 2.0. D. Gajski, S. Abdi. (with contributions from H. Cho, D. Shin, A. Gerstlauer)

NEW FPGA DESIGN AND VERIFICATION TECHNIQUES MICHAL HUSEJKO IT-PES-ES

Assertion Checker Synthesis for FPGA Emulation

SystemC Synthesis Standard: Which Topics for Next Round? Frederic Doucet Qualcomm Atheros, Inc

Outline Introduction System development Video capture Image processing Results Application Conclusion Bibliography

Configurable and Extensible Processors Change System Design. Ricardo E. Gonzalez Tensilica, Inc.

Mentor Graphics Solutions Enable Fast, Efficient Designs for Altera s FPGAs. Fall 2004

Bibliography. Measuring Software Reuse, Jeffrey S. Poulin, Addison-Wesley, Practical Software Reuse, Donald J. Reifer, Wiley, 1997.

Cover TBD. intel Quartus prime Design software

FPGA design with National Instuments

Overview of ROCCC 2.0

ECC1 Core. Elliptic Curve Point Multiply and Verify Core. General Description. Key Features. Applications. Symbol

ON THE EFFECTIVENESS OF ASSERTION-BASED VERIFICATION

Evolution of CAD Tools & Verilog HDL Definition

Post processing techniques to accelerate assertion development Ajay Sharma

FPGA Design Flow 1. All About FPGA

Flexible Architecture Research Machine (FARM)

Cover TBD. intel Quartus prime Design software

Source-Level Debugging Framework Design for FPGA High-Level Synthesis

ALTERA FPGAs Architecture & Design

Choosing an Intellectual Property Core

AS part of my MPhil course Advanced Computer Design, I was required to choose or define a challenging hardware project

Synthesis of Synchronous Assertions with Guarded Atomic Actions

Accelerating DSP Applications in Embedded Systems with a Coprocessor Data-Path

VHDL for Synthesis. Course Description. Course Duration. Goals

Esterel Studio Update

LegUp: Accelerating Memcached on Cloud FPGAs

Debug enhancements in assertion-checker generation

ECE332, Week 2, Lecture 3. September 5, 2007

ECE332, Week 2, Lecture 3

Overview. Design flow. Principles of logic synthesis. Logic Synthesis with the common tools. Conclusions

Design Space Exploration Using Parameterized Cores

2.5G Reed-Solomon II MegaCore Function Reference Design

MOJTABA MAHDAVI Mojtaba Mahdavi DSP Design Course, EIT Department, Lund University, Sweden

An End-to-End Tool Flow for FPGA-Accelerated Scientific Computing

Comparison of the Hardware Performance of the AES Candidates Using Reconfigurable Hardware

outline Reliable State Machines MER Mission example

SYSTEMS ON CHIP (SOC) FOR EMBEDDED APPLICATIONS

Performance and Overhead in a Hybrid Reconfigurable Computer

Optimizing Impulse C Code for Performance

COPROCESSOR APPROACH TO ACCELERATING MULTIMEDIA APPLICATION [CLAUDIO BRUNELLI, JARI NURMI ] Processor Design

System Debugging Tools Overview

Ultra-Fast NoC Emulation on a Single FPGA

Employing Multi-FPGA Debug Techniques

Revolutioni W zi h Wn e hgn e n F a Mi i s liu lsir u e ro e Cri I ti s Ic N al o t V A e n ri n O fi p c ti a o ti n oo

High Speed SPI Slave Implementation in FPGA using Verilog HDL

LIQUID METAL Taming Heterogeneity

Advanced FPGA Design Methodologies with Xilinx Vivado

FPGAs: FAST TRACK TO DSP

Tools for Reconfigurable Supercomputing. Kris Gaj George Mason University

Early Models in Silicon with SystemC synthesis

Fast implementation and fair comparison of the final candidates for Advanced Encryption Standard using Field Programmable Gate Arrays

Intel HLS Compiler: Fast Design, Coding, and Hardware

Co-synthesis and Accelerator based Embedded System Design

Energy scalability and the RESUME scalable video codec

A Framework to Improve IP Portability on Reconfigurable Computers

Hybrid Threading: A New Approach for Performance and Productivity

Generation of Multigrid-based Numerical Solvers for FPGA Accelerators

Lab 1: Using the LegUp High-level Synthesis Framework

SDACCEL DEVELOPMENT ENVIRONMENT. The Xilinx SDAccel Development Environment. Bringing The Best Performance/Watt to the Data Center

Complex Signal Processing Verification under DO-254 Constraints by François Cerisier, AEDVICES Consulting

Design Once with Design Compiler FPGA

A Hardware / Software Co-Design System using Configurable Computing Technology

Park Sung Chul. AE MentorGraphics Korea

Verilog for High Performance

Design and Implementation of Low Complexity Router for 2D Mesh Topology using FPGA

Partial Reconfiguration User Guide

ECE1387 Exercise 3: Using the LegUp High-level Synthesis Framework

ISim Hardware Co-Simulation Tutorial: Accelerating Floating Point Fast Fourier Transform Simulation

FPGA BASED ACCELERATION OF THE LINPACK BENCHMARK: A HIGH LEVEL CODE TRANSFORMATION APPROACH

Digital Systems Design. System on a Programmable Chip

FPGA Solutions: Modular Architecture for Peak Performance

Cadence SystemC Design and Verification. NMI FPGA Network Meeting Jan 21, 2015

AES1. Ultra-Compact Advanced Encryption Standard Core AES1. General Description. Base Core Features. Symbol. Applications

Introduction to Field Programmable Gate Arrays

Efficient Hardware Debugging using Parameterized FPGA Reconfiguration

VHDL Essentials Simulation & Synthesis

High Level Synthesis of Cryptographic Hardware. Jeremy Trimble ECE 646

HIERARCHICAL DESIGN. RTL Hardware Design by P. Chu. Chapter 13 1

Towards a Dynamically Reconfigurable System-on-Chip Platform for Video Signal Processing

Outline HIERARCHICAL DESIGN. 1. Introduction. Benefits of hierarchical design

Reconfigurable Hardware Implementation of Mesh Routing in the Number Field Sieve Factorization

Qualification of Verification Environments Using Formal Techniques

Benchmarking of Cryptographic Algorithms in Hardware. Ekawat Homsirikamol & Kris Gaj George Mason University USA

INTRODUCTION TO FPGA ARCHITECTURE

DDR and DDR2 SDRAM Controller Compiler User Guide

Model-Based Design for effective HW/SW Co-Design Alexander Schreiber Senior Application Engineer MathWorks, Germany

Verification of Clock Domain Crossing Jitter and Metastability Tolerance using Emulation

ECE 699: Lecture 12. Introduction to High-Level Synthesis

FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC)

Hardware Software Co-design and SoC. Neeraj Goel IIT Delhi

Transcription:

High-Level Synthesis Techniques for In-Circuit Assertion-Based Verification John Curreri Ph.D. Candidate of ECE, University of Florida Dr. Greg Stitt Assistant Professor of ECE, University of Florida April 19, 2010 Name=speaker Dr. Alan D. George Professor of ECE, University of Florida

High-Level Synthesis Ease of programming No HDL coding required for application acceleration Abstraction of communication function Provides built-in support Methods to gain speedup Pipelining of loops High-performance library functions Mitigation of race conditions Signals Semaphores Communication Buffered streaming transfers DMA transfers e.g. Impulse-C tool flow 2

High-Level Synthesis Verification HLS simulation Executes source on CPU Does not provide accurate verification HDL simulation and analyzers Provide accurate verification Productivity lowered by the need to understand machine-generated HDL Verification framework needed Maintains high abstraction level and provides accurate verification Programmers are not required to understand HDL or HDL tools HLS Simulation HDL Simulation C Source HDL Logic Analyzer Framework Presentation Framework Instrumentation

ANSI-C Assertion Debugging Error checking Used to check if variables are in an acceptable range Example usage int num,i,x[10]; while(num==0) { num=x[i]; i++ assert(i<10); } Failure actions Failure information is printed to stderr assert_test.c:7: main: Assertion `i==0' failed. W:X: Y: Assertion ' Z' failed. Assertion Code assert(i == 0); Assertion Output assert_test.c:7: main: Assertion 'i == 0' failed. W = file name; X = line number; Y = function name; Z= expression Program terminated using abort() Assertion checking switch #define NDEBUG Disables assertion checking 4

Related Research Assertion languages and libraries VHDL assertion statements SystemVerilog Assertions (SVA) Open Verification Library (OVL) Property Specification Language (PSL) Commercial assertion tools Temento s DiaLite Academic debugging tools Camera s debugging environment Sea Cucumber synthesizing compiler Logic analyzers Xilinx s ChipScope Altera s SignalTap 5

Verification Framework Overview Assertion-based verification usage Document and check for conditions that should never occur during execution In-circuit verification process Open application files in GUI Single-click instrumentation Converts assertions to if statements Generates communication channels Creates software function to display errors and program abort if failure detected Use standard tool flow to compile/execute Assertion failure output during execution Seamlessly transfer assertions from simulation to runtime HLS Assert Assertion Code assert(address >= 0); Assertion Output memtest_hw.c:14: Assertion 'address >= 0' failed. CPU FPGA 6

Standard Assertion Assertion conversion FPGA side Assert statement changed to if statement False evaluation FPGA side Sends a message with a unique identifier Assertion notification CPU side Function to receive, decode, and display failed assertions ANSI-C output format 7

Assertion Optimizations Parallelization Assertion checking can slow down the application Move assertion checkers to a separate parallel process Communication can slow down pipelined assertions Move communication calls to a third process Clock cycles 1 2 3 4 Standard App. line 1 Check assertion Failure communication App. line 2 App. line 1 App. line 2 App. line 3 App. line 4 Optimized Check assertion Failure communication 8

State-machine Comparison assert((j <= 0 a[0] == i) && (b[0] == 2 i > 0)); Original Standard assertion Optimized assertion Boolean operators require many additional states Array data retrieval requires an extra state 9

Assertion Optimizations Resource replication Application and assertion are competing for data access Replicate data structure (e.g., duplicated block RAM that is dedicated for assertion read access) Standard Optimized Clock cycles 1 2 3 4 App. read a[0] App. read a[1] Application Assert read a[1] Communication App. read a[0] App. read a[1] Application Application Assert read a[1] Communication 10

Assertion Optimizations Resource sharing Minimize FPGA resources usage of assertions Reuse assertion checking and communication resources amongst all assertion calls Standard P P P P P P P P A A A A A A A A C C C C 1 1 1 1 32 32 32 32 32 32 32 4 C 32 32 P A C Optimized Application process Assertion checker Failure communication Impulse C wrapper Impulse C wrapper 11

In-Circuit Verification Case Study Assertion in Line 6 shows Impulse-C translation mistake Simulation 64-bit comparison of 4294967286 > 4294967296 evaluates to false Execution on target platform 5-bit comparison of 22 > 0 evaluates to true Assertion in Line 8 shows user translation mistake Impulse-C simulation requires C code for HDL function Behaviors of C code and HDL may not be the same Assertions can be used to check that behaviors match 1 co_uint64 c2, c1; 2 co_int32 address, array[20], out; 3 c2 = 4294967286; c1 = 4294967296; 4 if (c2 > c1) address = c2 - c1; 5 else address = 0; 6 assert(address >= 0); 7 out = user_hdl(address); 8 assert((30 > out) && (out > 20)); 9 array[address] = out; Impulse-C design, XD1000, Stratix-II (EP2S180) 12

Debugging Case Study assert(0); Used to trace execution To find when an application fails to complete (hangs) Positive indicator rather than negative indicator NABORT Stops application from aborting Output comparison Line numbers of the failed assertions Software simulation vs. platform execution Hang occurred at a memory read at end of loop Solution Memory read replaced with memory write Correction allowed the process to complete execution 263 for ( i = 1; i < status; i++ ) { 264 IF_SIM(printf("HW:DE:%i\n",i);) 265 assert(0); 392 assert(0); 393 co_memory_readblock( ); 394 assert(0); 395 } 396 assert(0); 297 co_signal_post(done, status); Impulse-C design, XD1000, Stratix-II (EP2S180) 13

Application Case Studies Triple-DES Optimized assertions No latency overhead FPGA overhead to the right Standard assertions More ALUT (+0.125%) Higher freq. (144.74MHz) Edge detection Optimized assertions No performance overhead FPGA overhead to the right Standard assertions Less ALUTs (+0.03%) EP2S180 Original Assert Overhead Logic Used (out of 143520) Comb. ALUT (out of 143520) Registers (out of 143520) Block RAM (9383040 bits) Frequency (MHz) 13677 (9.53%) 7929 (5.52%) 10019 (6.98%) 222912 (2.37%) 13851 (9.65%) 8025 (5.59%) 10055 (7.01%) 223488 (2.38%) +174 (+0.12%) +96 (+0.07%) +36 (+0.03%) +221 (+0.04%) 145.71 141.98-3.73 (-2.56%) EP2S180 Original Assert Overhead Logic Used (out of 143520) Comb. ALUT (out of 143520) Registers (out of 143520) Block RAM (9383040 bits) Frequency (MHz) 12250 (8.54%) 6726 (4.69%) 9371 (6.53%) 141120 (1.50%) 12273 (8.56%) 6809 (4.75%) 9417 (6.56%) 141696 (1.51%) +23 (+0.02%) +83 (+0.06%) +46 (+0.03%) +576 (+0.01%) 77.52 79.31 +1.79 (+2.31%) Impulse-C design, XD1000, Stratix-II (EP2S180) 14

Scalability Case Study Resource overhead Optimized shown to right 128 processes 4.07% ALUTs standard 1.34% of ALUTs optimized Over a 3x improvement Frequency overhead Shown in graph to right 128 processes 154MHz standard 18.8% overhead 189MHz optimized 18.5% improvement Resource Overhead Frequency (MHz) 1.6% 1.4% 1.2% 1.0% 0.8% 0.6% 0.4% 0.2% 0.0% 210 200 190 180 170 160 Logic Used Comb ALUT Registers Block RAM Routing 0 16 32 48 64 80 96 112 128 Processes with Assertion Orignal Unoptimized Optimized 150 0 16 32 48 64 80 96 112 128 Processes with Assertion Impulse-C design, XD1000, Stratix-II (EP2S180) 15

Performance Overhead Case Study Single-comparison assertion Lower bound on optimization improvements Scalar variable Optimized overhead reduced to zero Array Optimized overhead Rate reduced to zero Latency reduced Impulse-C design, XD1000, Stratix-II (EP2S180) 16

Conclusions Created first framework/tool (to our knowledge) for HLS in-circuit assertion-based verification Familiar and easy to use ANSI-C assertions Automated conversion for Impulse C Application case studies performed Low area and frequency overhead Highly scalable Minimal to no change of application s state machine Future work Fully automate generation of optimized assertions Add capability to check timing via assertions 17

Questions 18

References 1. Deepchip, Mindshare vs. marketshare, http://www.deepchip.com/items/snug07-01.html, March 2008. 2. D. Pellerin and Thibault, Practical FPGA Programming in C. Prentice Hall PTR, 2005. 3. D. Poznanovic, Application development on the SRC Computers, Inc. systems, in Parallel and Distributed Processing Symposium, 2005. Proceedings. 19th IEEE International, April 2005, pp. 78a 78a. 4. Impulse Accelerated Technologies, Codeveloper s users guide, 2008. 5. GNU, The GNU C library reference manual, http://www.gnu.org/software/libc/manual, March 2009. 6. Accellera, SystemVerilog 3.1a language reference manual, http://www.eda.org/sv/systemverilog 3.1a.pdf, May 2004. 7. Accellera, OVL open verification library manual, ver. 2.4, http://www.accellera.org/activities/ovl, March 2009. 8. Accellera, PSL language reference manual, ver. 1.1, http://www.eda.org/vfv/docs/psl-v1.1.pdf, June 2004. 9. M. Pellauer, M. Lis, D. Baltus, and R. Nikhil, Synthesis of synchronous assertions with guarded atomic actions, in Formal Methods and Models for Co-Design, 2005. MEMOCODE 05. Proceedings. Third ACM and IEEE International Conference on, July 2005, pp. 15 24. 10. M. Boule, J.-S. Chenard, and Z. Zilic, Assertion checkers in verification, silicon debug and in-field diagnosis, in Quality Electronic Design, 2007. ISQED 07. 8th International Symposium on, March 2007, pp. 613 620. 19

References 11. M. Kakoee, M. Riazati, and S. Mohammadi, Enhancing the testability of RTL designs using efficiently synthesized assertions, in Quality Electronic Design, 2008. ISQED 2008. 9th International Symposium on, March 2008, pp. 230 235. 12. K. Camera and R. Brodersen, An integrated debugging environment for fpga computing platforms, in Field Programmable Logic and Applications, 2008. FPL 2008. International Conference on, Sept. 2008, pp. 311 316. 13. Xilinx, ChipScope pro 10.1 software and cores user guide, http://www.xilinx.com/ise/verification/ chipscope pro sw cores 10 1 ug029.pdf, March 2008. 14. Altera, Design debugging using the SignalTap ii embedded logic analyzer, http://www.altera.com/literature/hb/qts/qts qii53009.pdf, March 2009. 15. K. Hemmert, J. Tripp, B. Hutchings, and P. Jackson, Source level debugger for the sea cucumber synthesizing compiler, in Field-Programmable Custom Computing Machines, 2003. FCCM 2003. 11 th Annual IEEE Symposium on, April 2003, pp. 228 237. 16. G. D. Micheli, Synthesis and Optimization of Digital Circuits. McGraw-Hill, 1994. 17. XtremeData Inc., XD1000 FPGA coprocessor module for socket 940, http://www.xtremedatainc.com/pdf/xd1000 Brief.pdf. 18. NIST, Data encryption standard (DES), http://csrc.nist.gov/publications/fips/fips46-3/fips46-3.pdf, October 1999. 20