The Use Of Virtual Platforms In MP-SoC Design. Eshel Haritan, VP Engineering CoWare Inc. MPSoC 2006

Similar documents
EEM870 Embedded System and Experiment Lecture 4: SoC Design Flow and Tools

System-on-Chip Architecture for Mobile Applications. Sabyasachi Dey

ECE 111 ECE 111. Advanced Digital Design. Advanced Digital Design Winter, Sujit Dey. Sujit Dey. ECE Department UC San Diego

Modeling and Simulation of System-on. Platorms. Politecnico di Milano. Donatella Sciuto. Piazza Leonardo da Vinci 32, 20131, Milano

SCope: Efficient HdS simulation for MpSoC with NoC

The Challenges of System Design. Raising Performance and Reducing Power Consumption

MPSOC Design examples

The Architects View Framework: A Modeling Environment for Architectural Exploration and HW/SW Partitioning

ESE Back End 2.0. D. Gajski, S. Abdi. (with contributions from H. Cho, D. Shin, A. Gerstlauer)

Fujitsu System Applications Support. Fujitsu Microelectronics America, Inc. 02/02

Software Defined Modem A commercial platform for wireless handsets

Hardware Software Bring-Up Solutions for ARM v7/v8-based Designs. August 2015

Software Driven Verification at SoC Level. Perspec System Verifier Overview

Embedded HW/SW Co-Development

Optimizing ARM SoC s with Carbon Performance Analysis Kits. ARM Technical Symposia, Fall 2014 Andy Ladd

System Level Design with IBM PowerPC Models

Yafit Snir Arindam Guha Cadence Design Systems, Inc. Accelerating System level Verification of SOC Designs with MIPI Interfaces

Performance Verification for ESL Design Methodology from AADL Models

Hardware/Software Co-design

An H.264/AVC Main Profile Video Decoder Accelerator in a Multimedia SOC Platform

A 1-GHz Configurable Processor Core MeP-h1

Co-synthesis and Accelerator based Embedded System Design

SYSTEMS ON CHIP (SOC) FOR EMBEDDED APPLICATIONS

Embedded System Design

IP CORE Design 矽智產設計. C. W. Jen 任建葳.

HotChips An innovative HD video and digital image processor for low-cost digital entertainment products. Deepu Talla.

Combining Arm & RISC-V in Heterogeneous Designs

Embedded Systems: Hardware Components (part I) Todor Stefanov

The Nios II Family of Configurable Soft-core Processors

Effective System Design with ARM System IP

Introduction to Embedded Systems

Fujitsu SOC Fujitsu Microelectronics America, Inc.

Implementing Flexible Interconnect Topologies for Machine Learning Acceleration

Generating TLM Bus Models from Formal Protocol Specification. Tom Michiels CoWare

MPSoC Design Space Exploration Framework

Embedded System Design Modeling, Synthesis, Verification

A Unified HW/SW Interface Model to Remove Discontinuities between HW and SW Design

Platform-based Design

System Design Paradigms. Electronics and the Car

Building High Performance, Power Efficient Cortex and Mali systems with ARM CoreLink. Robert Kaye

Tim Kogel. June 13, 2010

Design and Test Solutions for Networks-on-Chip. Jin-Ho Ahn Hoseo University

SPACE: SystemC Partitioning of Architectures for Co-design of real-time Embedded systems

Copyright 2016 Xilinx

Efficient use of Virtual Prototypes in HW/SW Development and Verification

Simulink -based Programming Environment for Heterogeneous MPSoC

ARM Processors for Embedded Applications

Energy scalability and the RESUME scalable video codec

S2C K7 Prodigy Logic Module Series

Bringing it all together: The challenge in delivering a complete graphics system architecture. Chris Porthouse

Hardware-Software Codesign. 1. Introduction

Integrated Development Environment

Next Generation Verification Process for Automotive and Mobile Designs with MIPI CSI-2 SM Interface

Analyzing and Debugging Performance Issues with Advanced ARM CoreLink System IP Components

100M Gate Designs in FPGAs

Digital Systems Design. System on a Programmable Chip

Introduction to Embedded Systems

Multi-core microcontroller design with Cortex-M processors and CoreSight SoC

Model-Based Design for effective HW/SW Co-Design Alexander Schreiber Senior Application Engineer MathWorks, Germany

Verification Futures Nick Heaton, Distinguished Engineer, Cadence Design Systems

CEVA-X1 Lightweight Multi-Purpose Processor for IoT

ASIC Logic. Speaker: Juin-Nan Liu. Adopted from National Chiao-Tung University IP Core Design

Embedded Hardware and Software

Adaptable Intelligence The Next Computing Era

HW / SW Implementation Overview (0A) Young Won Lim 7/16/16

SPEAr: an HW/SW reconfigurable multi processor architecture

Modeling Performance Use Cases with Traffic Profiles Over ARM AMBA Interfaces

Design Space Exploration for Memory Subsystems of VLIW Architectures

COMPLEX EMBEDDED SYSTEMS

Hardware Design and Simulation for Verification

ECE 1160/2160 Embedded Systems Design. Midterm Review. Wei Gao. ECE 1160/2160 Embedded Systems Design

A Methodology for NoC

Reminder. Course project team forming deadline. Course project ideas. Friday 9/8 11:59pm You will be randomly assigned to a team after the deadline

Combining TLM & RTL Techniques:

Validation Strategies with pre-silicon platforms

Over the SoC Verification Hurdles

Graduate Institute of Electronics Engineering, NTU. ASIC Logic. Speaker: Lung-Hao Chang 張龍豪 Advisor: Prof. Andy Wu 吳安宇教授.

Embedded Systems. 7. System Components

Energy Estimation Based on Hierarchical Bus Models for Power-Aware Smart Cards

Codesign Framework. Parts of this lecture are borrowed from lectures of Johan Lilius of TUCS and ASV/LL of UC Berkeley available in their web.

SH-Mobile3: Application Processor for 3G Cellular Phones on a Low-Power SoC Design Platform

NETWORKS on CHIP A NEW PARADIGM for SYSTEMS on CHIPS DESIGN

An Introduction to Programmable Logic

Scalable Multi-DM642-based MPEG-2 to H.264 Transcoder. Arvind Raman, Sriram Sethuraman Ittiam Systems (Pvt.) Ltd. Bangalore, India

Hardware-Software Codesign

Communication Oriented Design Flow

Intel Research mote. Ralph Kling Intel Corporation Research Santa Clara, CA

Chapter 2 M3-SCoPE: Performance Modeling of Multi-Processor Embedded Systems for Fast Design Space Exploration

Virtual PLATFORMS for complex IP within system context

Multimedia in Mobile Phones. Architectures and Trends Lund

SoC Design Lecture 9: Platform Based Design. Shaahin Hessabi Department of Computer Engineering

DSP Solutions For High Quality Video Systems. Todd Hiers Texas Instruments

MATLAB/Simulink 기반의프로그래머블 SoC 설계및검증

Low-Power Processor Solutions for Always-on Devices

Transaction level modeling of SoC with SystemC 2.0

Age nda. Intel PXA27x Processor Family: An Applications Processor for Phone and PDA applications

Design and Implementation of an AHB SRAM Memory Controller

EE382V: System-on-a-Chip (SoC) Design

Platform-based SoC Design

Mapping applications into MPSoC

Transcription:

The Use Of Virtual Platforms In MP-SoC Design Eshel Haritan, VP Engineering CoWare Inc. MPSoC 2006 1

MPSoC Is MP SoC design happening? Why? Consumer Electronics Complexity Cost of ASIC Increased SW Content Power consumption MP SoC..and it comes in many flavors 2

Agenda MPSoC Solution Space Virtual HW Platforms What are Virtual Platforms? Virtual Platform usage for Architectural Exploration Virtual Platform usage for SW development Requirements from Virtual Platforms The ESL Solution Pyramid 3

Solution Space: HW Implementation Standard 1 Task 7 Standard 2 Task 7 Standard 3 Task 7 Task 1 Task 2 Task 3 Task 4 Task 5 Task 6 Task 8 Task 1 Task 2 Task 3 Task 4 Task 5 Task 6 Task 8 Task 1 Task 2 Task 3 Task 4 Task 5 Task 6 Task 8 High performance Low power HW Mapping Limited reuse No flexibility separate design for each platform Processor HW accelerators Memory 4 ASIC + Processor for global control

Solution Space: Programmable Accelerators Standard 1 Task 7 Standard 2 Task 7 Standard 3 Task Task 7 7 Task 1 Task 2 Task 3 Task 4 Task 5 Task 6 Task 1 Task 2 Task 3 Task 4 Task 5 Task 6 Task Task 1 Task Task Task 11 Task Task 1 2 2 Task Task 13 3 Task Task 4 4 Task Task 5 5 Task Task 6 6 Task Task 8 Task 81 Task 8 high performance high flexibility SW Mapping high design effort Low SW development productivity high differentiation Task 1 Task 2/3 Task 3/5/6/7 Accelerator Accelerator Accelerator Task 8 Processor + Ctrl µc IP 5 Mem Mem Mem Heterogeneous Custom Processor SoC

Homogeneous MP-SoC Mesh Standard 1 Task 7 Standard 2 Task 7 Standard 3 Task 7 Task 1 Task 2 Task 3 Task 4 Task 5 Task 6 Task 8 Task 1 Task 2 Task 3 Task 4 Task 5 Task 6 Task 8 Task 1 Task 2 Task 3 Task 4 Task 5 Task 6 Task 8 Design flexibility Reuse for multiple apps Redundancy Processor SW Mapping Interconnect Difficult to program Communication becomes bottleneck Load Balancing High chip/royalty cost Memory 6 Homogeneous Multi-Processor Mesh

Solution Space: Heterogeneous SoC Standard 1 Task 7 Standard 2 Task 7 Standard 3 Task Task 7 7 Task 1 Task 2 Task 3 Task 4 Task 5 Task 6 Task 1 Task 2 Task 3 Task 4 Task 5 Task 6 Task Task 1 Task Task Task 11 Task Task 1 2 2 Task Task 13 3 Task Task 4 4 Task Task 5 5 Task Task 6 6 Task Task 8 Task 81 Task 8 Reuse of legacy design Reuse of legacy SW SW Mapping Load balancing Memory sub-system design Task 1 Task 6/7 HW Accelerator Task 3/4/5 Task 8 Accelerator Task 2 Accelerator Processor Accelerator + Ctrl Mem Mem Mem 7 Heterogeneous Multi-Processor SoC

The ESW View User Interface, User level functions Large ESW stack >10 Million lines of code C, C++, Java Soft real time - Defined by user perception Cache, MMU Use generic components for data movement Generic bus arbitration + peripherals MOPS GOPS KOPS User-Level Functions (Applications, UI) Middleware (Algorithm) Board Support Package Algorithm rich Small ESW stack ~1 Million lines of code C, C++ (no Java), or assembly, or Matlab Hard Real Time Cache off, MMU off Smart DMA, guaranteed Quality of Service Data locality NoC,, guaranteed QoS,, predictable latency Dedicated communication channels Hardware abstraction Layer + OS Support Thin layer ~100K lines of code C with inline Assembly, maybe C++ Accurate Response Time 8

What is a Virtual HW Platform? A SW model of the SoC HW Processors, accelerators, peripherals and interconnect HDS HAL, Drivers, O.S., Enables architecture exploration and optimization Enables ESW development, debugging and optimization Virtual HW Platform Custom DSP CorXTend Different abstraction levels for different use models FPGA FPGA Periph emem MEM 9

Virtual HW Platform Value Fast Allows accuracy-speed trade offs Flexible Cost effective Scaleable Observable and Controllable Available early in the design cycle 10

Virtual HW Platform Value SW Arch SW SW Arch HW/SW SW Arch Implementation Verification Implementation HW/SW Verification Implementation HW/SW Verification Virtual Platform Virtual Platform Emulation OR FPGA board Development board HW Arch HW (RTL) Design & RTL2GDSII Fab 4m 6m 6m 3m Early SW Development Real HW and SW Co-Design 11

Virtual HW Platform for Architecture Exploration Product Specification Traffic Generator Instruction Accurate Processor Memory Memory Controller DMA Baseband Accelerator accelerator (SPD) Custom Peripheral Processor (PD) 12

Platform Architecture Cycle Accurate TLM RAM RAM ROM ROM ARM926 ARM926 INT INT AHB AHB Bus Bus LCD LCD Loop Loop Filter Filter Input Input H.264 H.264 Decoder Decoder CABAC CABAC OCP OCP Bus Bus OCP OCP Bus Bus FRAME FRAME BUFFER BUFFER AV abstraction OCP TL2 13

Virtual Platforms for Architecture Exploration I need to constantly read and decode H.264 bitstreams. Virtual Platform helped me to find out that pixel data can be best transferred via DMAs from SRAMs. Control data can be directly put into the Entropy Encoder SRAM shared memory via my bus interface. Input Stream DMA I need fast an parallel access to reference and predicted frame data. Early architecture exploration helped me to find out that two SRAMs with DMA removes DMA this data bottleneck. SRAM Reference Pixels Motion Estimation Accelerator SRAM Output Pixel Data DMA DMA SRAM Predicted Pixels I am responsible for the control and synchronization of all cores. I do not require fast access to the pixel data. The access to the shared e.g. control and parameter data eases programming of this accelerator. Controller DMA SRAM Input Macroblock Deblocking Filter Accelerator 14 SDRAM Memory Controller DMA SRAM Output Macroblock I need to process the entire uncompressed 6MB frame 30 times per second! Architecture exploration helped me find out that DMA assisted input and output Macroblock memories are the best option.

Virtual HW Platform for ESW Development Product Specification Virtual Platform Instruction Traffic Accurate Generator ISS Instruction Accurate Instruction Baseband Processor Accurate Sub System DSP ISS Memory Memory Controller DMA Baseband Accelerator Peripheral (SPD) Custom Peripheral Processor (PD) 15

Virtual Platforms for ESW Development Virtual Cell Phone App s SW App s SW App s SW App s SW GUI App s SW App s SW App s SW App s SW SWAPI SWAPI SW Development Tools gdb DDD Ext SWAPI AP-RTOS Int SWAPI BP OS Int Ext AP-RTOS Int BP OS Int IP Library HAL System Simulation HAL HAL HAL Display Key Board Chip Set Base Station Platform Architect Simulator Display Key Board USB Host HW (PC, EWS) JTAG Cell Phone HW JTAG USB RF 16

17 Virtual HW Platform for ESW development

Virtual Platform What is Required? Ultra fast processor simulator ISS Accuracy-speed trade-offs Library of fast processor models Explore different communication paradigms Strong analysis solutions Library of communication protocols Integrate DSP algorithm Library of popular Wireless and MM standards Ultra fast platform Accuracy-speed trade-offs Multi core and platform debugging Standard based 18

Why Standards Based? Device Manufacturer Develop your own Processor Designer Signal Processor Designer Platform Creator Custom DSP CorXTend Virtual Platforms Model Designer IP Providers FPGA FPGA Periph emem MEM Platform Architect 19

Solutions Pyramid ESL Assessment ESL Methodology Transition Vertical Markets Video Office Wireless Product Platform Capture & Analysis for Architecture Development Product Platform Capture & Distribution for SW Development Processor Modeling Interconnect Modeling & Design Peripheral Modeling Custom Processor Design Algorithm Design Block IP Design and Verification 20

Summary ESL Assessment ESL Methodology Transition Vertical Markets Video Office Wireless Custom processor design Complex algorithm design SoC architecture exploration SoC verification Modeling for speed Virtual Platforms for ESW Integrated solution Pyramid Product Platform Capture & Analysis for Architecture Development Product Platform Capture & Distribution for SW Development Processor Modeling Interconnect Modeling & Design Peripheral Modeling Custom Processor Design Algorithm Design Block IP Design and Verification 21

So what s the next level of abstraction? Results (transistors per engineering effort) Processor is the nand gate of the future 1999 Level 1985 1992 Register transfer level 1978 a 0 d q Gate/logic level b 1 s clk Transistor level 10-100X on Simulation 10-100X on design productivity Impose design restrictions Effort 22

The Next Level Challenge The next level is (maybe): Processors and HW accelerators SW tasks running on these processors Transaction level modeling to model communication What are the design restrictions we need to impose to gain 100X productivity?...and what are the benefits to the users? Processors Simplify? No Cache? No MMU? Speculative execution? Branch prediction? SW tasks Threads? Locks? R/W to/from ports only? Programming models to abstract the SoC (message passing, SMP, streaming)? What do we do with all the legacy code? Communication Memory sub system design, NoC design 23