Platform-based SW/HW Synthesis

Zhong Chen, Ph.D. (Visiting Prof. from Peking University)
zchen@cs.ucla.edu
SOC Group, UCLA, led by Jason Cong
ICSOC Workshop, Taiwan, 03/28/2002

Contents
- Overview
- HW/SW Co-design Flow
- System Data Model
  - Capability
  - SIR
  - MOC
  - SDM-API
- Jpeg Example
- Further Research Topics

Overview
- Platform-based Synthesis
  - Starts from a system-level design description
  - Targets an FPSoC platform
  - Automates the process as much as possible
- System Data Model
  - MOC: Model of Computation
  - System-level synthesis algorithms
  - Incorporates models such as the FunState model
  - Internal representation covers the whole life cycle of the flow
  - SDM-API supports interoperability of CAD tools

Proposed Platform-based HW/SW Synthesis System
[Figure: flow diagram. Inputs: Design Specification, Platform Information. Blocks: Profiling, Spec & Implementation Simulation, System Data Model, Hardware Estimation, Software Estimation, Partitioning, Scheduling, SW Code Gen, VHDL HW Code Gen, System Synthesis, System P.E., Interface Synthesis, HW synthesis, SW synthesis. Intermediate outputs: C Code, VHDL. Final outputs: Target SW, Target PLD.]

System Level Description to FPSoC Platform
- SLD: support for the concepts needed in system design
  - Structural and behavioral hierarchy
  - Concurrency
  - State transitions
  - Communication
  - Exception handling
  - Timing
- SpecC language selected as the input
  - Superset of ANSI-C: ANSI-C plus extensions for HW design
  - Leverages the large set of existing programs
  - Software requirements are fully covered
- SpecC model
  - PSM MOC
  - Separation of communication and computation
  - Hierarchical network of behaviors and channels
  - Plug-and-play
Source: System Design: A Practical Guide with SpecC, Andreas Gerstlauer et al., Kluwer Academic Publishers

Our Sample Target FPSoC Platform
- Excalibur™ Platform
- High-performance embedded processor: Nios CPU
- Up to 150K gates available for customization
- EP20K200E Programmable Logic Device

Other optional versions of Excalibur
[Figure: Excalibur variants. Labels: Nios CPU, 75K gates available for customization, EP20K100E; Embedded System Blocks (ESBs); several Nios and ESB blocks, 500K gates available for customization; Multi-Processor; Micro-Coded System.]

Components in our FPSoC Platform
- PLD: APEX™ 20K200E (8320 LEs)
- Processor: Nios
  - 16-bit or 32-bit, configurable
  - 5-stage pipeline architecture
  - One instruction per cycle
  - Optimized for APEX PLD efficiency
  - 20% of the APEX EP20K200E device in 32-bit configuration
  - Up to 50 MIPS and 50 MHz
- Memory: on-chip 256K
  - Supports on-chip and off-chip memories
- I/O: customizable on-chip peripherals
  - JTAG, PCI, user-definable

JPEG Encoder: An Example
[Figure: encoder pipeline. BMP Image File → Image Fragmentation → DCT → Quantization → Entropy Coding → JPG Image File]
- JPEG: a standard for image compression
- DCT: Discrete Cosine Transform (ChenDCT)
- Four modes of operation in the JPEG standard
  - Sequential DCT-based mode
  - Progressive DCT-based mode
  - Lossless mode
  - Hierarchical mode
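To make the DCT stage concrete, here is a minimal sketch of the textbook 2-D DCT-II on one 8x8 block, using the scaling commonly used for JPEG. It is a direct, unoptimized evaluation of the definition, not the fast ChenDCT algorithm the slide refers to; the function name `dct_2d` and the test block are illustrative assumptions.

```python
import math

N = 8  # JPEG works on 8x8 blocks


def dct_2d(block):
    """Naive 2-D DCT-II of an N x N block (direct evaluation of the definition).

    This is NOT the fast ChenDCT used in the slides; it only illustrates
    what the DCT stage computes.
    """
    def c(k):
        # Normalization factor for the k-th basis function.
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = 0.25 * c(u) * c(v) * s
    return out


# A flat block has energy only in the DC coefficient out[0][0].
flat = [[100] * N for _ in range(N)]
coeffs = dct_2d(flat)
```

The direct form costs on the order of N^4 multiply-adds per block, which is consistent with the DCT dominating the profile later in the deck and being the natural candidate for hardware.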

Jpeg in SpecC: Source Code Files
[Figure: source file dependency chart. Files, with the markers shown on the slide: Global, Chann, Adapter+, Header, Huff+, Dct++#, Quant+, Handle+, Default+, Encode+-, Jpeg+-, Io, Design+-, Tb]
Source: SpecC: Specification Language and Methodology, Daniel D. Gajski et al., CECS, UC Irvine

Jpeg in SpecC Program Structure

SDM MoC: FunState-based MoC
[Figure: FunState example. Function labels: F_SW, F_S, F_R, F_HW; edge labels: b, c1; states: M1, M2; transitions: CE(M1)/F_S/F_R and CE(M2)/F_HW]
Source: FunState: An Internal Design Representation for Codesign, Karsten Strehl et al., IEEE Transactions on VLSI Systems, Vol. 9, No. 4, August 2001

Jpeg: From SpecC to SDM Representation
[Figure: two block diagrams. Labels: Header, Input, Jpeg, Output, Data, Pixel; Input, Jpeg, Output]
Tb.sc with fixed control flow

Jpeg: Its MoC Representation in SDM (Jpeg.sc)
[Figure: Jpeg.sc components JpegInit, JpegEncode, JpegEnd; data items ImageWidth, ImageHeight, DCEhuff, ACEhuff, Data; state sequence JpegInit → JpegEncode → JpegEnd]
JpegInit and JpegEnd are functions; JpegEncode is an InnerComponent.

Jpeg: Its MoC Representation in SDM (JpegEncode.sc)
[Figure: JpegEncode.sc components ReceiveData, JpegEncodeStripe; data items Data, stripe, MDUWide, ImageWidth, ImageHeight, mduhigh, DCEhuff, ACEhuff, MDUHigh]
- Initialization: mduhigh = 0, MDUHigh = (ImageHeight+7)>>3
- Transition on Cond: ReceiveData, JpegEncodeStripe, mduhigh++
- Transition on ~Cond: exit
- Cond is (mduhigh < MDUHigh)

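Jpeg: Its MoC Representation in SDM (JpegEncodeStripe.sc)
[Figure: JpegEncodeStripe.sc components HandleData, dct, Quantization, HuffmanEncode, with intermediate data A, B, C; data items mduhigh, stripe, MDUWide, DCEhuff, ACEhuff, mduwide]
- Initialization: mduwide = 0
- Transition on Cond: HandleData, dct, quantization, huffmancode, mduwide++
- Transition on ~Cond: exit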
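The control flow of the JpegEncode and JpegEncodeStripe slides can be sketched as two nested loops. This is a hedged illustration, not the SpecC source: the stage functions are passed in as placeholders, the inner condition `mduwide < MDUWide` and the formula `MDUWide = (ImageWidth+7)>>3` are assumed by analogy with the outer loop, and only `MDUHigh = (ImageHeight+7)>>3` is taken from the slides.

```python
def encode_image(image_width, image_height,
                 receive_data, handle_data, dct, quantize, huffman_encode):
    """Nested stripe/block loops mirroring JpegEncode and JpegEncodeStripe."""
    mdu_high_total = (image_height + 7) >> 3  # from the slide: rows of 8x8 blocks
    mdu_wide_total = (image_width + 7) >> 3   # ASSUMPTION: same rounding for columns

    encoded = []
    mduhigh = 0
    while mduhigh < mdu_high_total:           # Cond on the JpegEncode slide
        stripe = receive_data(mduhigh)        # ReceiveData
        mduwide = 0
        while mduwide < mdu_wide_total:       # ASSUMED Cond for the inner loop
            block = handle_data(stripe, mduwide)             # HandleData
            encoded.append(huffman_encode(quantize(dct(block))))
            mduwide += 1
        mduhigh += 1
    return encoded


# Identity stages, only to exercise the control flow on the 116x96 example image.
blocks = encode_image(
    116, 96,
    receive_data=lambda row: row,
    handle_data=lambda stripe, col: (stripe, col),
    dct=lambda b: b,
    quantize=lambda b: b,
    huffman_encode=lambda b: b,
)
```

Under these assumptions the 116x96 image yields 12 stripes of 15 blocks, so the four stages each run 180 times per image.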

Partitioning and Scheduling (currently manual)
[Figure: SW/HW partition. SW side: Input, JPEG (ReceiveData, JpegEncodeStripe), Data, Output, with Send and Recv. HW side: DCT, with Recv and Send. Also shown: Input, Jpeg, Output]
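The Send/Recv pairing in this partition can be illustrated with a small software model: one thread plays the HW-side DCT and two queues play the channels. The names `hw_dct_worker` and `run_partitioned`, the `None` shutdown marker, and the blocking one-block-at-a-time handshake are assumptions made for this sketch, not details taken from the slides.

```python
import queue
import threading


def hw_dct_worker(to_hw, from_hw, dct):
    """Model of the HW partition: Recv a block, run the DCT, Send the result."""
    while True:
        block = to_hw.get()          # HW-side Recv
        if block is None:            # hypothetical shutdown marker
            break
        from_hw.put(dct(block))      # HW-side Send


def run_partitioned(blocks, dct):
    """Model of the SW partition driving the HW DCT one block at a time."""
    to_hw, from_hw = queue.Queue(), queue.Queue()
    hw = threading.Thread(target=hw_dct_worker, args=(to_hw, from_hw, dct))
    hw.start()
    results = []
    for block in blocks:
        to_hw.put(block)                 # SW-side Send
        results.append(from_hw.get())    # SW-side Recv, blocks until HW answers
    to_hw.put(None)
    hw.join()
    return results


# Stand-in "DCT" so the exchange can be checked without real image data.
results = run_partitioned([1, 2, 3], lambda b: b * 10)
```

Because the software side waits for each result before sending the next block, this models a non-overlapped schedule; overlapping SW and HW work would need a different handshake.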

Current Flow: Where Are We Today
[Figure: flow diagram linking the Designer to the artifacts MyDesign.sc, Design.sc, Design.sir, Design.sdm, Design.cc, Design.c, Design.vhdl, Simulator.exe, Profiling.exe, Design.exe and HW.srec, plus the "Simulate act" and "Profile rpt" outputs; edges are numbered (1) to (13) to match the steps below]
1) A system designer writes a system-level design application in SpecC.
2) Rewrite it to go through our flow, using a subset of SpecC with modified semantics.
3) Use scc to create the .sir file.
4) Use psm2fs to convert .sir to .sdm.
5) Use simgen to generate the .cc file for the simulator.
6) Compile the simulator using the CC compiler.
7) Execute the simulator.
8) Compile Profiling.exe using CC with profiling options.
9) Execute it to generate the profile report.
10) Use hwcgen to generate .vhdl.
11) Use Altera's tools to generate the circuit (.srec).
12) Use sccgen to generate .c.
13) Use the target C compiler to generate executable code.

Intermediate Research Achievements
- SDM-Converter
- SDM-Simulator
- SDM-C Code Generation tool
- SDM-SW Profiling tool
- SDM-HW Code Generation tool (partial)

Jpeg Implementation on the Excalibur™ Platform
[Figure: EP20K200E Programmable Logic Device containing the Nios CPU (Jpeg Software) and the DCT Circuit]

Jpeg Compression Results
- 116x96x8 image in BMP format (12214 bytes)
- 116x96x8 image in JPEG format (1704 bytes)
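As a worked check on these two sizes, the following sketch computes the compression ratio and also shows that 12214 bytes is consistent with an uncompressed 8-bit BMP of 116x96 pixels. The BMP layout used (14-byte file header, 40-byte info header, 256-entry palette of 4 bytes each, rows padded to 4 bytes) is the common Windows layout and is an assumption about the test image, not something stated on the slide.

```python
width, height = 116, 96
bmp_bytes = 12214   # from the slide
jpeg_bytes = 1704   # from the slide

# ASSUMED layout: standard 8-bit Windows BMP with a full 256-color palette.
row_bytes = (width + 3) // 4 * 4               # rows are padded to 4 bytes
expected_bmp = 14 + 40 + 256 * 4 + row_bytes * height

ratio = bmp_bytes / jpeg_bytes                 # about 7.17 : 1
saving = 1 - jpeg_bytes / bmp_bytes            # about 86% smaller

print(f"expected BMP size: {expected_bmp} bytes")
print(f"compression ratio: {ratio:.2f}:1, size reduction: {saving:.1%}")
```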

JPEG Encoder Profiling
[Figure: encoder pipeline annotated with each stage's share of execution time. BMP Image File → Image Fragmentation (1.72%) → DCT (77.47%) → Quantization (4.84%) → Entropy Coding (15.97%) → JPG Image File]

Run-time Profiling of Jpeg Program

Module Name     PC (PIII 650MHz)                 NIOS (SW)                        NIOS (SW+HW)
                exec/s*      time (µs)  rate     exec/s      time (µs)  rate     exec/s      time (µs)  rate
HandleData      391259.70    2.56       1.72%    19878.67    50.31      1.22%    19878.67    50.31      4.45%
DCT             8659.61      115.48     77.47%   316.4       3160.56    76.46%   6328        158.03     13.97%
Quantization    138533.91    7.22       4.84%    5668.41     176.42     4.27%    5668.41     176.42     15.60%
HuffmanEncode   42010.25     23.8       15.97%   1339.96     746.29     18.05%   1339.96     746.29     65.98%

*Units: exec/s is executions per second; time is the duration of one execution in microseconds (µs); rate is the module's share of the time for one execution. One execution processes one 8x8 image block of 256 colors.
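The three columns per platform are related by simple arithmetic: the time per execution is the reciprocal of the execution rate, and each module's share is its time divided by the sum over the four modules. A minimal sketch using the PC column (the dictionary names are illustrative):

```python
# Executions per second on the PC (PIII 650MHz), from the table.
rates_per_s = {
    "HandleData": 391259.70,
    "DCT": 8659.61,
    "Quantization": 138533.91,
    "HuffmanEncode": 42010.25,
}

# Time of one execution in microseconds: 1e6 / (executions per second).
times_us = {name: 1e6 / rate for name, rate in rates_per_s.items()}

# Share of each module in the time to process one 8x8 block.
total_us = sum(times_us.values())
shares_pct = {name: 100 * t / total_us for name, t in times_us.items()}

for name in rates_per_s:
    print(f"{name:14s} {times_us[name]:8.2f} us  {shares_pct[name]:6.2f}%")
```

Small differences in the last digit against the table (for example HandleData) come from the table computing shares from already rounded times.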

Run-time Results of Jpeg Example

Time is in µs (10^-6 s) per execution; executions per second are given in parentheses.

Module Name     PC (PIII 650MHz)                NIOS (SW)                      NIOS (SW+HW) [1]               NIOS (SW+HW) [2]
                time     rate     (exec/s)      time      rate     (exec/s)    time      rate     (exec/s)    time      rate     (exec/s)
HandleData      2.56     1.72%    (391259.7)    50.31     1.22%    (19878.67)  50.31     1.92%    (19878.67)  50.31     4.45%    (19878.67)
DCT             115.48   77.47%   (8659.61)     3160.56   76.46%   (316.4)     1641.04   62.78%   (609.37)    158.03    13.97%   (6328.00)
Quantization    7.22     4.84%    (138533.91)   176.42    4.27%    (5668.41)   176.42    6.75%    (5668.41)   176.42    15.60%   (5668.41)
HuffmanEncode   23.80    15.97%   (42010.25)    746.29    18.05%   (1339.96)   746.29    28.55%   (1339.96)   746.29    65.98%   (1339.96)
Total           149.06   100.00%                4133.57   100.00%              2614.05   100.00%              1131.04   100.00%

Notes: one execution processes one 8x8 image block of 256 colors.
[1] With a half-DCT implementation in order to fit in the available area; Nios 1.1 runs at 33 MHz.
[2] Optimized full-DCT implementation, simulation only (with ModelSim).
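The overall gain from moving only the DCT to hardware can be read off the totals, and it matches what Amdahl's law predicts from the DCT's share of the software-only run. The sketch below recomputes both from the per-module times in the table; the variable names are illustrative, and the totals differ from the table by 0.01 µs because they are summed from rounded entries.

```python
# Per-block times in microseconds on Nios, software only (from the table).
sw_us = {"HandleData": 50.31, "DCT": 3160.56,
         "Quantization": 176.42, "HuffmanEncode": 746.29}
dct_half_us = 1641.04   # configuration [1]: half DCT in hardware
dct_full_us = 158.03    # configuration [2]: full DCT, simulation only


def total_with_dct(dct_us):
    """Total per-block time when only the DCT time changes."""
    return sum(t for name, t in sw_us.items() if name != "DCT") + dct_us


base = total_with_dct(sw_us["DCT"])
speedup_half = base / total_with_dct(dct_half_us)   # about 1.58x
speedup_full = base / total_with_dct(dct_full_us)   # about 3.65x

# Amdahl's law: fraction f of the run is accelerated by a factor s.
f = sw_us["DCT"] / base
s = sw_us["DCT"] / dct_full_us
amdahl = 1 / ((1 - f) + f / s)

print(f"half DCT: {speedup_half:.2f}x, full DCT: {speedup_full:.2f}x, "
      f"Amdahl prediction: {amdahl:.2f}x")
```

With the DCT accelerated about 20x, the remaining software stages bound the overall gain, which is why HuffmanEncode becomes the largest share (65.98%) in configuration [2].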

Research Topics
- System-Level Synthesis Algorithm
  - Partitioning and Scheduling
  - Performance Estimation
  - Architecture Exploration
- Hardware Interface Synthesis
  - Architecture Exploration
  - Platform Resource Information
- Software Synthesis
  - Code Optimization with Resource Constraints
- Support for Polymorphism: Description of Channel and Interface(?)

HW Implementations of DCT

Implementation               Performance   LEs    Rate     PINs   Rate   ESBs     Rate   Clock Frequency
EP20K200EFC484-2X                          8320   100%     376    100%   106496   100%   Max. 33M
(1) Nios+H_dct+recv+send     609.37/s      6797   81%      111    29%    26496    24%    33.33
    Half_dct only                          4004   48%      30     7%     0        0      34.37
    (recv+send)
(2) NIOS+dct+Memory          569.26/s      4555   54%      111    29%    27840    26%    33.68
    (Dct only)                             1762   21.18%   30     7%     0        0      33.68

(1) Half-DCT implementation + interface with the PIO of Nios (through send + recv)
(2) Full-DCT implementation + memory as interface
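The Rate columns are each design's resource usage as a percentage of the EP20K200E totals in the first row. A short sketch of that calculation follows; the helper name `utilization_pct` is illustrative, and the observation that the whole-number percentages look truncated rather than rounded is an inference from the figures, not something the slide states.

```python
# Device totals from the first table row (EP20K200EFC484-2X).
DEVICE = {"LEs": 8320, "PINs": 376, "ESBs": 106496}

# Resource usage of the two complete designs, from the table.
designs = {
    "(1) Nios+H_dct+recv+send": {"LEs": 6797, "PINs": 111, "ESBs": 26496},
    "(2) NIOS+dct+Memory": {"LEs": 4555, "PINs": 111, "ESBs": 27840},
}


def utilization_pct(used, resource):
    """Percentage of the device's total for one resource type."""
    return 100 * used / DEVICE[resource]


for name, usage in designs.items():
    cells = ", ".join(f"{res} {utilization_pct(n, res):.2f}%"
                      for res, n in usage.items())
    print(f"{name}: {cells}")

# The DCT block alone, from the "(Dct only)" row.
dct_only_les = utilization_pct(1762, "LEs")
```

For example, design (1) uses 81.69% of the logic elements, shown as 81% in the table, which leaves little headroom and is consistent with the note that only a half DCT fits alongside the Nios core.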

Open Discussion