Platform-based SW/HW Synthesis
Zhong Chen, Ph.D. (Visiting Prof. from Peking University)
zchen@cs.ucla.edu
SOC Group, UCLA, led by Jason Cong
ICSOC Workshop, Taiwan, 03/28/2002

Contents
- Overview
- HW/SW Co-design Flow
- System Data Model
  - Capability
  - SIR
  - MoC
  - SDM-API
- Jpeg Example
- Further Research Topics

Overview
- Platform-based Synthesis
  - Starts from a system-level design description
  - Targets an FPSoC platform
  - Automates the process as much as possible
- System Data Model
  - MoC (Model of Computation)
  - System-level synthesis algorithms
  - Incorporates models such as the FunState model
  - Internal representation covers the whole life cycle of the flow
  - SDM-API supports interoperability of CAD tools

Proposed Platform-based HW/SW Synthesis System
[Figure: flow diagram. Inputs: System Design Specification, Platform Information. Blocks: Profiling, Spec & Implementation Simulation, System Data Model, Hardware Estimation, Software Estimation, Partitioning, Scheduling, System Synthesis, System P.E., Interface Synthesis, SW Code Gen (C Code), VHDL HW Code Gen (VHDL), SW synthesis, HW synthesis. Outputs: Target SW, Target PLD.]

System Level Description to FPSoC Platform
- SLD: support of the concepts needed in system design
  - Structural and behavioral hierarchy
  - Concurrency
  - State transitions
  - Communication
  - Exception handling
  - Timing
- SpecC language selected as the input
  - Superset of ANSI-C: ANSI-C plus extensions for HW design
  - Leverages the large set of existing programs
  - Software requirements are fully covered
- SpecC model
  - PSM MoC
  - Separation of communication and computation
  - Hierarchical network of behaviors and channels
  - Plug-and-play
Source: System Design: A Practical Guide with SpecC, Andreas Gerstlauer et al., Kluwer Academic Publishers

Our Sample Target FPSoC Platform
Excalibur™ Platform
- High-performance embedded processor: Nios CPU
- Up to 150K gates available for customization
- EP20K200E programmable logic device

Other Optional Versions of Excalibur
[Figure: alternative Excalibur configurations. Labels: Nios CPU, 75K gates available for customization, EP20K100E; Embedded System Blocks (ESBs); several Nios CPUs and ESBs, 500K gates available for customization; Multi-Processor Micro-Coded System.]

Components in Our FPSoC Platform
- PLD: APEX™ 20K200E (8320 LEs)
- Processor: Nios
  - 16-bit or 32-bit, configurable
  - 5-stage pipeline architecture
  - One instruction per cycle
  - Optimized for APEX PLD efficiency
  - 20% of the APEX EP20K200E device in 32-bit configuration
  - Up to 50 MIPS and 50 MHz
- Memory: on-chip 256K
  - Supports on-chip and off-chip memories
- I/O: customizable, on-chip peripherals
  - JTAG, PCI, user-definable

JPEG Encoder: An Example
Pipeline: BMP Image File -> Image Fragmentation -> DCT -> Quantization -> Entropy Coding -> JPG Image File
- JPEG: a standard for image compression
- DCT: Discrete Cosine Transform (ChenDCT)
- Four modes of operation in the JPEG standard
  - Sequential DCT-based mode
  - Progressive DCT-based mode
  - Lossless mode
  - Hierarchical mode

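The DCT and quantization stages of the pipeline above can be illustrated with a minimal Python sketch. It is not the ChenDCT used in the design: it evaluates the textbook 2-D DCT-II formula directly, which is far slower but easier to check. The function names `dct_8x8` and `quantize` are hypothetical, the single quantization step size is an assumption (JPEG uses an 8x8 table), and the level shift that JPEG applies before the transform is omitted.

```python
import math

N = 8  # JPEG transforms the image in 8x8 blocks


def dct_8x8(block):
    """Direct 2-D DCT-II of an 8x8 block with the usual JPEG scaling.

    This is the naive O(N^4) definition, used here only for illustration;
    a fast factorization such as Chen's computes the same coefficients.
    """
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            acc = 0.0
            for x in range(N):
                for y in range(N):
                    acc += (block[x][y]
                            * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                            * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = 0.25 * c(u) * c(v) * acc
    return out


def quantize(coeffs, step):
    """Uniform quantization with one step size (an assumption for brevity)."""
    return [[round(value / step) for value in row] for row in coeffs]


# A constant block has all its energy in the DC coefficient.
flat = [[8] * N for _ in range(N)]
coeffs = dct_8x8(flat)
quantized = quantize(coeffs, 16)
print(round(coeffs[0][0], 6), quantized[0][0])
```

For a constant block of value v the DC coefficient is 8v and every AC coefficient is zero up to floating-point error, which is why smooth image regions compress well after quantization.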
Jpeg in SpecC: Source Code Files
[Figure: source file dependency diagram. Files, with their markers as shown on the slide: Global, Chann, Adapter+, Header, Huff+, Dct++#, Quant+, Handle+, Default+, Encode+-, Jpeg+-, Io, Design+-, Tb.]
Source: SpecC: Specification Language and Methodology, Daniel D. Gajski et al., CECS, UC Irvine

Jpeg in SpecC Program Structure
SDM MoC: FunState-based MoC
[Figure: FunState example. Elements labeled F_SW, F_S, F_R, F_HW, c1, b, M1, M2; transitions labeled CE(M1)/F_S/F_R and CE(M2)/F_HW.]
Source: FunState - An Internal Design Representation for Codesign, Karsten Strehl et al., IEEE Transactions on VLSI Systems, Vol. 9, No. 4, August 2001

Jpeg: From SpecC to SDM Representation
[Figure: Tb.sc with fixed control flow. Blocks: Input, Jpeg, Output; data items: Header, Data, Pixel.]

Jpeg: Its MoC Representation in SDM (Jpeg.sc)
[Figure: components JpegInit, JpegEncode, JpegEnd; data items ImageWidth, ImageHeight, DCEhuff, ACEhuff, Data; control sequence JpegInit -> JpegEncode -> JpegEnd.]
JpegInit and JpegEnd are functions; JpegEncode is an InnerComponent.

Jpeg: Its MoC Representation in SDM (JpegEncode.sc)
[Figure: components ReceiveData, JpegEncodeStripe; data items Data, stripe, MDUWide, MDUHigh, mduhigh, ImageWidth, ImageHeight, DCEhuff, ACEhuff.]
- Initialization: mduhigh = 0, MDUHigh = (ImageHeight+7)>>3
- Transition on Cond: ReceiveData, JpegEncodeStripe, mduhigh++
- Transition on ~Cond: exit
- Cond is (mduhigh < MDUHigh)

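The control flow of JpegEncode described on this slide can be written out as an ordinary loop. The following is a sketch in Python rather than SpecC; `mdu_count`, `jpeg_encode`, and the two callbacks are hypothetical stand-ins for the components named on the slide, and only the loop structure is taken from the source.

```python
def mdu_count(pixels: int) -> int:
    """Number of 8-pixel units needed to cover `pixels`: ceil(pixels / 8)."""
    return (pixels + 7) >> 3


def jpeg_encode(image_height: int, receive_data, jpeg_encode_stripe) -> int:
    """Outer loop of JpegEncode: one stripe of 8 image rows per iteration."""
    mdu_high_total = mdu_count(image_height)   # MDUHigh on the slide
    mduhigh = 0                                # loop counter on the slide
    while mduhigh < mdu_high_total:            # Cond
        stripe = receive_data()                # Cond / ReceiveData
        jpeg_encode_stripe(stripe, mduhigh)    # then JpegEncodeStripe
        mduhigh += 1                           # mduhigh++
    return mduhigh                             # ~Cond: leave the loop


# Usage with placeholder callbacks (illustration only).
calls = []
stripes_done = jpeg_encode(
    96,
    receive_data=lambda: "stripe",
    jpeg_encode_stripe=lambda stripe, row: calls.append(row),
)
print(stripes_done)  # 12 stripes for a 96-row image
```

Adding 7 before the shift rounds up, so an image whose height is not a multiple of 8 still gets a final, partially filled stripe.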
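Jpeg: Its MoC Representation in SDM (JpegEncodeStripe.sc)
[Figure: components HandleData, dct, Quantization, HuffmanEncode, connected through A, B, C; data items mduhigh, stripe, MDUWide, mduwide, DCEhuff, ACEhuff.]
- Initialization: mduwide = 0
- Transition on Cond: HandleData, dct, quantization, huffmancode, mduwide++
- Transition on ~Cond: exit
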
Partitioning and Scheduling (now manual)
[Figure: SW/HW partition. SW side: Input, JPEG (ReceiveData, JpegEncodeStripe), Data, Output, with Recv and Send. HW side: DCT, with Send and Recv. Overall: Input -> Jpeg -> Output.]

Current Flow: Where Are We Today
[Figure: tool flow used by the designer, with arrows numbered (1) to (13) as in the steps below. Artifacts: MyDesign.sc, Design.sc, Design.sir, Design.sdm, Design.cc, Simulator.exe, Profiling.exe, Design.vhdl, Design.c, Design.exe, HW.srec. Designer-facing labels: Simulate act, Profile rpt.]
1) A system designer writes a system-level design application in SpecC.
2) Rewrite it so that it can go through our flow, using a subset of SpecC with modified semantics.
3) Use scc to create the .sir file.
4) Use psm2fs to convert .sir to .sdm.
5) Use simgen to generate the .cc file for the simulator.
6) Compile the simulator using the CC compiler.
7) Execute the simulator.
8) Compile to Profiling.exe using CC with profile options.
9) Execute it to generate the profile report.
10) Use hwcgen to generate .vhdl.
11) Use Altera's tools to generate the circuit (.srec).
12) Use sccgen to generate .c.
13) Use the target C compiler to generate the executable code.

Intermediate Research Achievements
- SDM converter
- SDM simulator
- SDM C code generation tool
- SDM SW profiling tool
- SDM HW code generation tool (partial)

Jpeg Implementation on Excalibur™ Platform
[Figure: Jpeg software running on the Nios CPU and the DCT circuit, both on the EP20K200E programmable logic device.]

Jpeg Compression Results
- 116x96x8 image in bmp format (12214 bytes)
- 116x96x8 image in jpeg format (1704 bytes)

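The two file sizes above imply a compression ratio of roughly 7:1. The short Python calculation below derives it from the numbers on the slide; the variable names are illustrative, and the remark about the BMP overhead is an observation on the arithmetic, not something stated in the source.

```python
bmp_bytes = 12214            # from the slide
jpeg_bytes = 1704            # from the slide
width, height, bits_per_pixel = 116, 96, 8

compression_ratio = bmp_bytes / jpeg_bytes
size_reduction = 1 - jpeg_bytes / bmp_bytes
raw_pixel_bytes = width * height * bits_per_pixel // 8
bmp_overhead_bytes = bmp_bytes - raw_pixel_bytes
jpeg_bits_per_pixel = jpeg_bytes * 8 / (width * height)

print(f"compression ratio: {compression_ratio:.2f}:1")      # 7.17:1
print(f"size reduction:    {size_reduction:.1%}")           # 86.0%
print(f"raw pixel data:    {raw_pixel_bytes} bytes")        # 11136 bytes
# 1078 bytes of overhead matches a 54-byte header plus a 256-entry,
# 4-byte-per-entry palette, which is the common layout of 8-bit BMP files.
print(f"BMP overhead:      {bmp_overhead_bytes} bytes")     # 1078 bytes
print(f"JPEG bits/pixel:   {jpeg_bits_per_pixel:.2f}")      # 1.22
```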
JPEG Encoder Profiling
BMP Image File -> Image Fragmentation (1.72%) -> DCT (77.47%) -> Quantization (4.84%) -> Entropy Coding (15.97%) -> JPG Image File

Run-time Profiling of Jpeg Program

Module Name    PC (PIII 650MHz)                 NIOS (SW)                         NIOS (SW+HW)
               exec/s*      time       share    exec/s*     time         share    exec/s*     time        share
HandleData     391259.70/s  2.56 µs    1.72%    19878.67/s  50.31 µs     1.22%    19878.67/s  50.31 µs    4.45%
DCT            8659.61/s    115.48 µs  77.47%   316.4/s     3160.56 µs   76.46%   6328/s      158.03 µs   13.97%
Quantization   138533.91/s  7.22 µs    4.84%    5668.41/s   176.42 µs    4.27%    5668.41/s   176.42 µs   15.60%
HuffmanEncode  42010.25/s   23.8 µs    15.97%   1339.96/s   746.29 µs    18.05%   1339.96/s   746.29 µs   65.98%

* Units: exec/s is the number of executions per second; time is the duration of one execution in microseconds (µs); share is the module's fraction of the total time for processing one 8x8 image block of 256 colors.

Run-time Results of Jpeg Example

Time per execution (10^-6 s) and share of total time (%):

Module Name    PC (PIII 650MHz)    NIOS (SW)           NIOS (SW+HW) 1      NIOS (SW+HW) 2
               time     rate       time     rate       time     rate       time     rate
HandleData     2.56     1.72%      50.31    1.22%      50.31    1.92%      50.31    4.45%
DCT            115.48   77.47%     3160.56  76.46%     1641.04  62.78%     158.03   13.97%
Quantization   7.22     4.84%      176.42   4.27%      176.42   6.75%      176.42   15.60%
HuffmanEncode  23.80    15.97%     746.29   18.05%     746.29   28.55%     746.29   65.98%
Total          149.06   100.00%    4133.57  100.00%    2614.05  100.00%    1131.04  100.00%

Executions per second (the parenthesized values on the slide):

Module Name    PC (PIII 650MHz)    NIOS (SW)    NIOS (SW+HW) 1    NIOS (SW+HW) 2
HandleData     (391259.7)          (19878.67)   (19878.67)        (19878.67)
DCT            (8659.61)           (316.4)      (609.37)          (6328.00)
Quantization   (138533.91)         (5668.41)    (5668.41)         (5668.41)
HuffmanEncode  (42010.25)          (1339.96)    (1339.96)         (1339.96)

* Notes: each value is for one execution processing one 8x8 image block of 256 colors.
1: with a half-DCT implementation in order to fit in the available area; Nios 1.1 running at 33 MHz.
2: optimized full DCT implementation, simulation only (by ModelSim).

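The overall gain from moving the DCT into hardware follows directly from these timings. The Python sketch below recomputes totals, shares, and speedups from the per-module times on the slide and compares them with Amdahl's law; the dictionary layout and function names are illustrative. Summing the rounded module times gives totals 0.01 µs above the slide's totals, presumably because the slide summed unrounded values.

```python
# Time per 8x8 block in microseconds, copied from the slide.
times_us = {
    "PC": {"HandleData": 2.56, "DCT": 115.48, "Quantization": 7.22, "HuffmanEncode": 23.80},
    "NIOS SW": {"HandleData": 50.31, "DCT": 3160.56, "Quantization": 176.42, "HuffmanEncode": 746.29},
    "NIOS SW+HW 1": {"HandleData": 50.31, "DCT": 1641.04, "Quantization": 176.42, "HuffmanEncode": 746.29},
    "NIOS SW+HW 2": {"HandleData": 50.31, "DCT": 158.03, "Quantization": 176.42, "HuffmanEncode": 746.29},
}


def total(config: str) -> float:
    """Total time for one block in the given configuration."""
    return sum(times_us[config].values())


def executions_per_second(time_us: float) -> float:
    """Convert a per-execution time in microseconds to executions per second."""
    return 1e6 / time_us


def speedup(baseline: str, improved: str) -> float:
    """Overall speedup of `improved` relative to `baseline`."""
    return total(baseline) / total(improved)


def amdahl(fraction: float, local_speedup: float) -> float:
    """Amdahl's law: overall speedup when `fraction` of the time is sped up."""
    return 1 / ((1 - fraction) + fraction / local_speedup)


baseline = "NIOS SW"
dct_fraction = times_us[baseline]["DCT"] / total(baseline)
for config in ("NIOS SW+HW 1", "NIOS SW+HW 2"):
    dct_speedup = times_us[baseline]["DCT"] / times_us[config]["DCT"]
    print(config,
          f"DCT x{dct_speedup:.2f}",
          f"overall x{speedup(baseline, config):.2f}",
          f"Amdahl x{amdahl(dct_fraction, dct_speedup):.2f}")
```

With the full DCT in hardware the DCT itself runs about 20 times faster, yet the whole encoder gains only about 3.65 times, because HuffmanEncode and Quantization remain in software and then dominate the run time.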
Research Topics
- System-Level Synthesis Algorithm
  - Partitioning and Scheduling
  - Performance Estimation
  - Architecture Exploration
- Hardware Interface Synthesis
  - Architecture Exploration
  - Platform Resource Information
- Software Synthesis
  - Code Optimization with Resource Constraints
  - Support Polymorphism Description of Channel and Interface(?)

HW Implementations of DCT

Implementation                  Performance  LEs   Rate    PINs  Rate  ESBs    Rate  Clock Frequency
EP20K200EFC484-2X                            8320  100%    376   100%  106496  100%  Max. 33M
(1) Nios+H_dct+recv+send        609.37/s     6797  81%     111   29%   26496   24%   33.33
    Half_dct only (recv+send)                4004  48%     30    7%    0       0     34.37
(2) NIOS+dct+Memory             569.26/s     4555  54%     111   29%   27840   26%   33.68
    (Dct only)                               1762  21.18%  30    7%    0       0     33.68

(1) Half-DCT implementation + interface with the PIO of Nios (through send + recv)
(2) Full-DCT implementation + memory as interface

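The Rate columns are each design's resource usage divided by the device capacity. The Python sketch below recomputes them from the counts on the slide; the dictionary keys and function name are illustrative. Most of the slide's percentages appear to be truncated rather than rounded, so the computed values can be slightly higher.

```python
# Device capacity and per-design usage, copied from the slide.
DEVICE = {"LEs": 8320, "PINs": 376, "ESBs": 106496}
designs = {
    "(1) Nios + half DCT + recv/send": {"LEs": 6797, "PINs": 111, "ESBs": 26496},
    "(1) half DCT only": {"LEs": 4004, "PINs": 30, "ESBs": 0},
    "(2) Nios + full DCT + memory": {"LEs": 4555, "PINs": 111, "ESBs": 27840},
    "(2) full DCT only": {"LEs": 1762, "PINs": 30, "ESBs": 0},
}


def utilization(used: int, resource: str) -> float:
    """Percentage of the device's `resource` consumed by `used` units."""
    return 100.0 * used / DEVICE[resource]


for name, usage in designs.items():
    rates = ", ".join(f"{res} {utilization(n, res):.2f}%" for res, n in usage.items())
    print(f"{name}: {rates}")

# Logic elements left for everything except the DCT block in each design.
non_dct_1 = designs["(1) Nios + half DCT + recv/send"]["LEs"] - designs["(1) half DCT only"]["LEs"]
non_dct_2 = designs["(2) Nios + full DCT + memory"]["LEs"] - designs["(2) full DCT only"]["LEs"]
print(non_dct_1, non_dct_2)  # 2793 2793
```

Both designs leave the same 2793 LEs outside the DCT block, so the difference in total area comes entirely from the two DCT implementations.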
Open Discussion