Enabling the design of multicore SoCs with ARM cores and programmable accelerators


Target Compiler Technologies, www.retarget.com
Sol Bergen-Bartel, China Business Development
© 2013 Target Compiler Technologies

Target Compiler Technologies
- Pioneer and leading provider of EDA tools for application-specific processors (ASIPs)
- Now expanding its reach to EDA tools for IP subsystems
- Worldwide activities
  - HQ in Leuven, Belgium
  - US office in Boulder, Colorado
  - Representation in China, Japan and Korea
- Incorporated in 1996, spin-off of IMEC
- Independently owned, profitable company

ASIPs in Multi-Core SoC
- ASIP: Application-Specific Processor
  - Anything between general-purpose µP and hardwired data-path
  - Flexibility through programmability and design-time reconfigurability
  - High throughput, low energy through parallelism and specialization
- ASIP is the foundation of the heterogeneous multi-core SoC
  - Balanced SoC architecture offers best performance at lowest energy and lowest cost

ASIP Benefits
- Maximise performance
  - Architectural specialisation
  - Parallelism: VLIW, SIMD, multi-core
- Minimise power dissipation
  - Architectural specialisation
  - Parallelism: VLIW, SIMD, multi-core
  - Power-optimised RTL generation
- Leverage the benefits of programmable cores
  - React to changing requirements and product differentiation
    - Ship first for evolving standards
    - Remedy defects
    - Extend products to new markets without an SoC re-spin
  - Major differentiator against RTL design and high-level synthesis

No MPSoC Design Without Tools
- Tools at IP level (ASIP cores): IP Designer
  - Architectural exploration
  - SDK generation: C compiler, ISS, debugger
  - RTL generation
- Tools at IP subsystem level (multicore): MP Designer
  - Code parallelisation
  - Communication and synchronization
  - Multicore platform generation


IP Designer Tool Suite
- Typical users: ASIC/SoC design teams

Broad Market Adoption
- Medical
- Audio
- Video & imaging
- Graphics
- Wireless
- Wireline
- Network processing
- High-perf. computing
- Automotive
- Crypto & identification
- Industrial
Shown are publicly announced IP Designer customers only.
Estimate: more than 50 unique SoC products based on IP Designer in the market today.

Graph-based C Compilation
[Figure: compilation flow. The application C code goes through the C front-end to a CDFG, and the nML processor model goes through the nML front-end to an ISG. The compilation engine, with phase coupling, runs source-level transformations, code selection, register allocation, scheduling and code emission, and produces machine code (ELF/DWARF).]
- Front end
  - C to Control-Data Flow Graph (CDFG)
  - nML to Instruction-Set Graph (ISG)
- Compilation phases
  - Map CDFG onto ISG
  - Graph algorithms
- ISG contains structural info
  - HW resources, data types, connectivity, instruction encoding, instruction-level parallelism, instruction pipeline
  - Closer to HW than other compilers
  - Enables efficient compilation for irregular architectures
- Patented

Graph-based C Compilation
DSPstone benchmark on TI C55x: no assembly required

                             Target's compiler   TI's compiler      Gain (Target vs TI)
Example          Variant     Cycles  Code size   Cycles  Code size  Cycles  Code size
Small-scale C-code examples
FIR              restrict    45      39          6       37         6%      -5%
Convolution      repeat      6       7           6       3          0%      -3%
LMS              original    98      56          7       64         6%      3%
Matrix           repeat      49      46          54      53         44%     3%
IIR, N=4         restrict    53      5           66      6          0%      8%
IIR, N=6         restrict    49      5           98      6          5%      8%
                                                                    %       4%
Large-scale C-code examples
FFT bit reverse  original    4636    9           49387   9          6%      0%
FFT butterfly    original    79374   67          876     77         4%      6%
ADPCM            original    446     3067        6978    3367       %       9%
                                                                    7%      5%

- Graph-based C compiler technology offers retargetability and efficiency at the same time
- Compilable sub-set of TI C55x modelled in nML in .5 person-months
- Only a few C code modifications made: repeat loop, restrict pointers
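The "restrict pointers" modification mentioned on this slide can be illustrated with a minimal FIR kernel in standard C99. This is only a sketch, not the DSPstone source: the function name `fir_sample`, the 16-bit sample type and the 32-bit accumulator are assumptions chosen for illustration. The `restrict` qualifier promises the compiler that the two arrays do not overlap, which lets it issue both loads of each tap in parallel on a machine with two data memories.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical FIR kernel: one output sample from num_taps taps.
 * The restrict qualifiers state that coeff and delay never alias,
 * so the compiler may schedule both loads in the same cycle. */
int32_t fir_sample(const int16_t *restrict coeff,
                   const int16_t *restrict delay,
                   size_t num_taps)
{
    assert(coeff != NULL && delay != NULL);

    int32_t acc = 0;
    for (size_t i = 0; i < num_taps; i++) {
        /* Widen before multiplying so the product is computed in 32 bits. */
        acc += (int32_t)coeff[i] * (int32_t)delay[i];
    }
    return acc;
}
```

Calling `fir_sample` with coefficients {1, 2, 3} and delay-line values {4, 5, 6} returns 32. Without `restrict`, the compiler would have to assume that the two pointers may refer to overlapping memory.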

Hardware Generator
Example: audio DSP, 90 nm, clock 0 MHz, 0.9 V
[Figure: bar charts of area (gates) and power (µW/MHz) for configurations A to E]
IP Designer configuration options
- A: Standard RTL generation
- B: Clock gating + operand isolation for functional units
- C: Operand isolation for multiplexers
- D: Latching of register addresses in instruction decoder
- E: Manual design by customer
Low-power optimisations yield 60% savings
Low-power optimisations have small area cost
Area and power within percentages of the hand-optimized design

CoolFlux DSP: Low-Power Audio
- Ultra-low power DSP, optimised for audio coding
- Used in hearing instruments and portable audio players
- 24-bit precision
- Dual Harvard
- ILP: 8 parallel operations, exploited by compiler
- 43K gates
- Power: 5 µW/MHz @ 0.9 V, 65 nm CMOS
- Rich library of audio codecs programmed with Target's tools
© 2006 NXP Semiconductors. Reproduced with permission.

WLAN-MIMO ASIP
Design by Motorola Labs [Medea+ Project A0 Uppermost]
- 802.11n channel estimation and equalisation
- Matrix calculations
- Special operators in complex domain
- Multiple dataflow patterns to compute equalisation matrix G, depending on supported MIMO schemes
  - SDM*
  - Symmetric SDM+STBC**
  - SDM+STBC
* Spatial Division Multiplexing
** Space-Time Block Coding
[Figure: signal model and equaliser. Per OFDM sub-carrier in the frequency domain, Y = H S + η, where Y holds the MIMO received signals for the N_Rx receive antennas, S the OFDM symbols for the N_Tx transmit antennas, and η the additive noise. The estimate of the transmitted OFDM sub-carrier for the N_Tx transmit antennas is Ŝ = G Y. Computing G involves matrix inversion, address computations, complex conjugate and square modulus.]
© 2006 Motorola Labs. Reproduced with permission.

WLAN-MIMO ASIP Architecture
[Figure: channel-estimation matrices indexed by sub-carrier, feeding an ASIP with four dual-port memories, common program control and four GMAC units (GMAC 0 to GMAC 3)]
- 4-way SIMD: vector processing of sub-carriers
- Complex arithmetic: cmpy, cadd, cconjugate
- Programmable datapath: specific data-flow patterns
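The combination of 4-way SIMD and complex arithmetic can be pictured with a small scalar C model. This is a sketch under assumptions, not the ASIP's instruction set: the operation names `cmpy`, `cadd` and `cconjugate` come from the slide, but their signatures, the integer `cplx` type and the helper `vec_cmac_conj` are hypothetical. Each loop iteration stands for one of the four lanes (one sub-carrier each) that the hardware would process in parallel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 4 /* one lane per sub-carrier, as in a 4-way SIMD datapath */

/* Hypothetical complex integer type; the real datapath widths are not given. */
typedef struct { int32_t re, im; } cplx;

static cplx cadd(cplx a, cplx b)
{
    return (cplx){ a.re + b.re, a.im + b.im };
}

static cplx cmpy(cplx a, cplx b)
{
    return (cplx){ a.re * b.re - a.im * b.im,
                   a.re * b.im + a.im * b.re };
}

static cplx cconjugate(cplx a)
{
    return (cplx){ a.re, -a.im };
}

/* Scalar model of a vector complex multiply-accumulate:
 * acc[k] += x[k] * conj(h[k]) for each of the four lanes. */
void vec_cmac_conj(cplx acc[LANES], const cplx x[LANES], const cplx h[LANES])
{
    assert(acc != NULL && x != NULL && h != NULL);

    for (int k = 0; k < LANES; k++) {
        acc[k] = cadd(acc[k], cmpy(x[k], cconjugate(h[k])));
    }
}
```

On a SIMD datapath the loop body would be a single vector operation; the scalar form here only fixes the arithmetic that each lane performs.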

IP Designer's Strengths
- Wide architectural scope
  - From microprocessors over data-plane processors (DSPs) to programmable data-paths
  - Enables IP development for any vertical market (see Broad Market Adoption)
  - Next to the ASIP architecture, the user can model the ASIP's periphery
- Unique retargetable compilation technology
  - Recognized for code efficiency
  - Recognized for instantaneous retargetability
  - Enables rapid and efficient architectural exploration with compiler-in-the-loop
  - Enables compiler-based software development by ASIP users
- Low-power RTL generation technology
  - Low power confirmed by wide adoption in hearing instrument, audio and wireless markets
- Flexible multicore debugging technology
  - Connects to ISSs and to on-chip debug hardware


MP Designer Tool Suite
- Typical users:
  - Multicore SoC design teams
  - System SW design teams

MP Designer Example: FM Receiver on multi-CoolFlux
[Figure: task graph]

MP Designer Highlights
- Homogeneous and heterogeneous* multicore SoCs
- User-guided parallelization (pragmas)
- C source-to-source transformation
- Global dataflow analysis to check correctness of chosen parallelization
- Software code for communication and synchronization inserted automatically, using FIFO model
- Graphical feedback (task graphs) enables exploration for efficient load balancing
- Communication fabric (platform) generated automatically, if needed
* Heterogeneous: planned
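The FIFO model used for inter-core communication can be sketched as a bounded ring buffer in plain C. This is an illustrative assumption, not the code MP Designer generates: the type `fifo_t`, the functions `fifo_init`, `fifo_push` and `fifo_pop`, and the depth of 4 are all hypothetical. The sketch is single-threaded; a real multicore channel would also need hardware or atomic synchronisation between producer and consumer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FIFO_DEPTH 4u /* hypothetical channel depth */

typedef struct {
    int32_t data[FIFO_DEPTH];
    size_t head;  /* index of the oldest stored token */
    size_t count; /* number of tokens currently stored */
} fifo_t;

void fifo_init(fifo_t *f)
{
    assert(f != NULL);
    f->head = 0;
    f->count = 0;
}

/* Producer side: returns false when the channel is full,
 * which is the point where a producer task would have to wait. */
bool fifo_push(fifo_t *f, int32_t value)
{
    assert(f != NULL);
    if (f->count == FIFO_DEPTH) {
        return false;
    }
    f->data[(f->head + f->count) % FIFO_DEPTH] = value;
    f->count++;
    return true;
}

/* Consumer side: returns false when the channel is empty,
 * which is the point where a consumer task would have to wait. */
bool fifo_pop(fifo_t *f, int32_t *out)
{
    assert(f != NULL && out != NULL);
    if (f->count == 0) {
        return false;
    }
    *out = f->data[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return true;
}
```

The full and empty conditions are what provide synchronisation in such a model: a producer task stalls on a full channel and a consumer task stalls on an empty one, so tokens are delivered in order without any further handshaking in the application code.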

Conclusion
- ASIPs enable low power, acceleration and programmability in multicore SoCs
- No efficient multicore SoC design without tools
  - Design and programming of individual ASIP cores
  - Multicore parallelisation and platform generation
- Target can be your ASIP and multicore tools partner
sol.bergenbartel@retarget.com
patrick.verbist@retarget.com