Enabling the design of multicore SoCs with ARM cores and programmable accelerators
  Target Compiler Technologies, www.retarget.com
  Sol Bergen-Bartel, China Business Development
  2013 Target Compiler Technologies
Target Compiler Technologies
  Pioneer and leading provider of EDA tools for application-specific processors (ASIPs)
  Now expanding its reach to EDA tools for IP subsystems
  Worldwide activities
    HQ in Leuven, Belgium
    US office in Boulder, Colorado
    Representation in China, Japan and Korea
  Incorporated in 1996, spin-off of IMEC
  Independently owned, profitable company
ASIPs in Multi-Core SoC
  ASIP: Application-Specific Processor
    Anything between a general-purpose µP and a hardwired data-path
    Flexibility through programmability and design-time reconfigurability
    High throughput and low energy through parallelism and specialization
  The ASIP is the foundation of a heterogeneous multi-core SoC
    A balanced SoC architecture offers the best performance at the lowest energy and lowest cost
ASIP Benefits
  Maximise performance
    Architectural specialisation
    Parallelism: VLIW, SIMD, multi-core
  Minimise power dissipation
    Architectural specialisation
    Parallelism: VLIW, SIMD, multi-core
    Power-optimised RTL generation
  Leverage the benefits of programmable cores
    React to changing requirements & product differentiation
      Ship first for evolving standards
      Remedy defects
      Extend products to new markets without an SoC re-spin
    Major differentiator against RTL design and high-level synthesis
No MPSoC Design Without Tools
  Tools at IP level (ASIP cores): IP Designer
    Architectural exploration
    SDK generation: C compiler, ISS, debugger
    RTL generation
  Tools at IP subsystem level (multicore): MP Designer
    Code parallelisation
    Communication and synchronization
    Multicore platform generation
IP Designer Tool Suite
  Typical users: ASIC/SoC design teams
Broad Market Adoption
  Medical
  Audio
  Video & imaging
  Graphics
  Wireless
  Wireline
  Network processing
  High-perf. computing
  Automotive
  Crypto & identification
  Industrial
  Shown are publicly announced IP Designer customers only
  Estimate: more than 50 unique SoC products based on IP Designer in the market today
Graph-based C Compilation
  [Figure: compilation flow. The application (C) passes through the C front-end into a CDFG; the processor model (nML) passes through the nML front-end into an ISG. The compilation engine, with phase coupling, runs source-level transformations, code selection, register allocation, scheduling and code emission, producing machine code (ELF/DWARF).]
  Front end
    C to Control-Data Flow Graph (CDFG)
    nML to Instruction-Set Graph (ISG)
  Compilation phases
    Map CDFG onto ISG
    Graph algorithms
  ISG contains structural info
    HW resources, data types, connectivity, instruction encoding, instruction-level parallelism, instruction pipeline
    Closer to HW than other compilers
    Enables efficient compilation for irregular architectures
  Patented
Graph-based C compilation
  DSPstone benchmark on TI C55x
  NO ASSEMBLY REQUIRED

                                 Target's compiler     TI's compiler         Gain (Target vs TI)
  Benchmark         C code       Cycles   Code size    Cycles   Code size    Cycles   Code size
  Small-scale C-code examples
  FIR               restrict     45       39           6        37           6%       -5%
  Convolution       repeat       6        7            6        3            0%       -3%
  LMS               original     98       56           7        64           6%       3%
  Matrix            repeat       49       46           54       53           44%      3%
  IIR, N=4          restrict     53       5            66       6            0%       8%
  IIR, N=6          restrict     49       5            98       6            5%       8%
                                                                             %        4%
  Large-scale C-code examples
  FFT bit reverse   original     4636     9            49387    9            6%       0%
  FFT butterfly     original     79374    67           876      77           4%       6%
  ADPCM             original     446      3067         6978     3367         %        9%
                                                                             7%       5%

  Graph-based C compiler technology offers retargetability and efficiency at the same time
  Compilable sub-set of TI C55x modelled in nML in .5 person-months
  Only a few C code modifications made: repeat loop, restrict pointers
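The "restrict pointers" modification mentioned above can be illustrated with a minimal C99 sketch. The function name `fir_q15`, the Q15 fixed-point format and the buffer layout are assumptions made for illustration; this is not the DSPstone source or Target's benchmark code. The `restrict` qualifier promises the compiler that the three buffers do not overlap, which gives a scheduler more freedom to overlap loads with multiply-accumulate operations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical Q15 FIR kernel (illustrative only).
 * x must hold n_out + n_taps - 1 samples; h holds the taps in the
 * order they are applied to x[i], x[i+1], ...
 * restrict: caller guarantees x, h and y do not alias. */
void fir_q15(const int16_t *restrict x, const int16_t *restrict h,
             int16_t *restrict y, size_t n_out, size_t n_taps)
{
    assert(x != NULL && h != NULL && y != NULL);
    for (size_t i = 0; i < n_out; i++) {
        int64_t acc = 0;                      /* wide accumulator, no overflow */
        for (size_t k = 0; k < n_taps; k++)
            acc += (int32_t)x[i + k] * (int32_t)h[k];
        y[i] = (int16_t)(acc >> 15);          /* back to Q15, no saturation */
    }
}
```

Without `restrict`, the compiler has to assume that a store to `y[i]` might change `x` or `h`, which limits how far it can reorder memory accesses across loop iterations.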
Hardware Generator
  Example: audio DSP (90 nm, clock 0 MHz, 0.9 V)
  [Figure: bar charts of area (gates) and power (µW/MHz) for configurations A to E]
  IP Designer configuration options
    A: Standard RTL generation
    B: Clock gating + operand isolation for functional units
    C: Operand isolation for multiplexers
    D: Latching of register addresses in instruction decoder
    E: Manual design by customer
  Low-power optimisations yield 60% savings
  Low-power optimisations have small area cost
  Area and power within a few percent of the hand-optimized design
CoolFlux DSP: Low-Power Audio
  Ultra-low power DSP, optimised for audio coding
  Used in hearing instruments and portable audio players
  24-bit precision
  Dual Harvard
  ILP: 8 parallel operations, exploited by compiler
  43K gates
  Power: 5 µW/MHz @ 0.9 V, 65 nm CMOS
  Rich library of audio codecs programmed with Target's tools
  2006 NXP Semiconductors. Reproduced with permission.
WLAN-MIMO ASIP
  Design by Motorola Labs [Medea+ Project A0 Uppermost]
  802.11n channel estimation and equalisation
    Matrix calculations
    Special operators in complex domain
    Multiple dataflow patterns to compute equalisation matrix G, depending on supported MIMO schemes
      SDM*
      Symmetric SDM+STBC**
      SDM+STBC
    * Spatial Division Multiplexing
    ** Space-Time Block Coding
  Received-signal model per OFDM sub-carrier, with N_Tx transmit antennas and N_Rx receive antennas:
    $$\begin{pmatrix} Y_1 \\ \vdots \\ Y_{N_{Rx}} \end{pmatrix} = H \begin{pmatrix} S_1 \\ \vdots \\ S_{N_{Tx}} \end{pmatrix} + \begin{pmatrix} \eta_1 \\ \vdots \\ \eta_{N_{Rx}} \end{pmatrix}$$
    Y: MIMO received signals for the OFDM sub-carrier
    H: N_Rx x N_Tx channel matrix
    S: OFDM symbols in the frequency domain
    η: additive noise
  Estimate of the transmitted OFDM sub-carrier for the N_Tx transmit antennas:
    $$\hat{S} = G \, Y$$
  [Figure: dataflow patterns for computing G for each MIMO scheme, built from matrix inversion, address computations, complex conjugate and square modulus operations]
  2006 Motorola Labs. Reproduced with permission.
WLAN-MIMO ASIP Architecture
  [Figure: channel-estimation matrices indexed by sub-carrier, held in four dual-port memories that feed GMAC 0 to GMAC 3 under common program control, with example GMAC accumulation patterns of the form a = a + ...]
  4-way SIMD: vector processing of sub-carriers
  Complex arithmetic: cmpy, cadd, cconjugate
  Programmable datapath: specific data-flow patterns
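The combination of 4-way SIMD and complex arithmetic described above can be sketched in plain C. The names `cmpy`, `cadd` and `cconj` echo the operations listed on the slide, but their signatures, the `cplx` integer type and the `gmac4` helper are assumptions made for illustration; the actual instruction semantics (word width, rounding, saturation) are not given in the source.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 4  /* one lane per GMAC unit, one sub-carrier per lane */

/* Hypothetical complex integer type (illustrative only). */
typedef struct { int32_t re, im; } cplx;

/* Complex multiply: (a.re + j a.im) * (b.re + j b.im). */
cplx cmpy(cplx a, cplx b)
{
    cplx r = { a.re * b.re - a.im * b.im, a.re * b.im + a.im * b.re };
    return r;
}

/* Complex add. */
cplx cadd(cplx a, cplx b)
{
    cplx r = { a.re + b.re, a.im + b.im };
    return r;
}

/* Complex conjugate. */
cplx cconj(cplx a)
{
    cplx r = { a.re, -a.im };
    return r;
}

/* One SIMD-style step: acc[l] += x[l] * conj(y[l]) on all four lanes.
 * On the ASIP the lanes would run in parallel; here they run in a loop. */
void gmac4(cplx acc[LANES], const cplx x[LANES], const cplx y[LANES])
{
    assert(acc != 0 && x != 0 && y != 0);
    for (int l = 0; l < LANES; l++)
        acc[l] = cadd(acc[l], cmpy(x[l], cconj(y[l])));
}
```

Multiplying a value by its own conjugate, `cmpy(a, cconj(a))`, gives the square modulus with a zero imaginary part, which is one way the listed operators can be composed.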
IP Designer's Strengths
  Wide architectural scope
    From microprocessors over data-plane processors (DSPs) to programmable data-paths
    Enables IP development for any vertical market (see Broad Market Adoption)
    Next to the ASIP architecture, the user can model the ASIP's periphery
  Unique retargetable compilation technology
    Recognized for code efficiency
    Recognized for instantaneous retargetability
    Enables rapid and efficient architectural exploration with compiler-in-the-loop
    Enables compiler-based software development by ASIP users
  Low-power RTL generation technology
    Low power confirmed by wide adoption in hearing instrument, audio and wireless markets
  Flexible multicore debugging technology
    Connects to ISSs and to on-chip debug hardware
MP Designer Tool Suite
  Typical users: multicore SoC design teams, system SW design teams
MP Designer Example: FM Receiver on multi-CoolFlux
  [Figure: task graph]
MP Designer Highlights
  Homogeneous and heterogeneous* multicore SoCs
  User-guided parallelization (pragmas)
  C source-to-source transformation
  Global dataflow analysis to check correctness of chosen parallelization
  Software code for communication and synchronization inserted automatically, using FIFO model
  Graphical feedback (task graphs) enables exploration for efficient load balancing
  Communication fabric (platform) generated automatically, if needed
  * Heterogeneous: planned
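The FIFO model for inter-task communication mentioned above can be pictured with a minimal single-producer, single-consumer ring buffer in C. Everything here (`fifo_t`, `fifo_init`, `fifo_push`, `fifo_pop`, `FIFO_CAPACITY`) is hypothetical and does not represent the code that MP Designer inserts. The sketch also runs on a single thread; a real multicore implementation would need atomic accesses, memory barriers or a hardware FIFO.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FIFO_CAPACITY 8u  /* power of two, so indices wrap with a mask */

/* Hypothetical FIFO channel between a producer task and a consumer task. */
typedef struct {
    int32_t  data[FIFO_CAPACITY];
    uint32_t head;  /* items written so far; advanced only by the producer */
    uint32_t tail;  /* items read so far; advanced only by the consumer */
} fifo_t;

void fifo_init(fifo_t *f)
{
    assert(f != 0);
    f->head = 0;
    f->tail = 0;
}

/* Producer side: returns false when the FIFO is full. */
bool fifo_push(fifo_t *f, int32_t v)
{
    if (f->head - f->tail == FIFO_CAPACITY)
        return false;
    f->data[f->head & (FIFO_CAPACITY - 1u)] = v;
    f->head++;
    return true;
}

/* Consumer side: returns false when the FIFO is empty. */
bool fifo_pop(fifo_t *f, int32_t *out)
{
    if (f->head == f->tail)
        return false;
    *out = f->data[f->tail & (FIFO_CAPACITY - 1u)];
    f->tail++;
    return true;
}
```

Because each counter is written by only one side, the producer and consumer never update the same field, which is the property that makes this structure a common starting point for core-to-core channels.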
Conclusion
  ASIPs enable low power, acceleration and programmability in multicore SoCs
  No efficient multicore SoC design without tools
    Design and programming of individual ASIP cores
    Multicore parallelisation and platform generation
  Target can be your ASIP and multicore tools partner
  sol.bergenbartel@retarget.com
  patrick.verbist@retarget.com