Effective System Design with ARM System IP

Effective System Design with ARM System IP
Mentor Technical Forum 2009
Serge Poublan, Product Marketing Manager, ARM

Higher level of integration
[Figure: mobile handset feature labels: WiFi, Platform OS, Graphic, 13 days standby, Bluetooth, MP3, Camera, Flash 9, 128 MB DDR, H.264, Skype]

Processors are evolving, e.g. MP
World-class market-proven technology
- 20+ processors for every application
- 200+ silicon partners
- 500+ licenses
- 15Bu shipped
[Figure: processor roadmap across ARMv4, ARMv5, ARMv6 and ARMv7 (Cortex), with x1-4 core-count labels on the multiprocessor parts. Processors shown: ARM7TDMI(S), ARM920T, ARM922T, SC100, ARM7EJ-S, ARM926EJ-S, ARM946E-S, ARM966E-S, ARM968E-S, ARM1026EJ-S, SC200, ARM1136J(F)-S, ARM1156T2(F)-S, ARM1176JZ(F)-S, ARM11 MPCore, Cortex-A8, Cortex-A9, Cortex-R4, Cortex-R4F, Cortex-M0, Cortex-M1, Cortex-M3, SC300]

ARM Mali GPU - Scalable Performance to over 1G Pixel/s
[Figure: visual complexity versus screen resolution for Mali-55, Mali-200 and Mali-400 MP. Use cases plotted: Web Browsing, Flash Lite, Java Gaming, Next Generation Navigation, Mobile Gaming, 3D Navigation, Flash 10, TV HD UI, Video Post Processing, HD 3D Gaming, Console 3D Gaming, 2D/3D Presentations, HD Video Post Processing]

Higher Mobile Device Resolution
Requirements of next generation Mobile platform
 - Increasing bandwidth requirements simply to refresh the display
 - Ignoring Fill rate, Input Vertex Data and Texture bandwidth

Display Refresh Bandwidth (MB/s)
Mode      Resolution   Frame rate   Bandwidth
1080p60   1920x1080    60fps        475
1080p30   1920x1080    30fps        237
720p      1280x720     30fps        105
WVGA      800x480      30fps        44
VGA       640x480      30fps        35

[Figure: display resolutions (QVGA 320x240, VGA 640x480, WVGA 800x480, WSVGA 1024x600, WXGA 1280x800, 1080p30 1920x1080, 1080p60 1920x1080) plotted against the years 2007 to 2013]
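
The display refresh figures on this slide can be reproduced with a short calculation. The sketch below assumes 4 bytes per pixel (32-bit colour) and 1 MB = 2**20 bytes; neither is stated on the slide, but with these assumptions the results round to the slide's numbers. The function name is illustrative only.

```python
# Assumptions (not stated on the slide): 32-bit pixels and 1 MB = 2**20 bytes.
BYTES_PER_PIXEL = 4
MB = 2 ** 20


def refresh_bandwidth_mb_s(width, height, fps, bytes_per_pixel=BYTES_PER_PIXEL):
    """Bandwidth needed only to scan the frame buffer out to the display."""
    return width * height * fps * bytes_per_pixel / MB


# Modes listed on the slide: (name, width, height, frames per second).
MODES = [
    ("1080p60", 1920, 1080, 60),
    ("1080p30", 1920, 1080, 30),
    ("720p", 1280, 720, 30),
    ("WVGA", 800, 480, 30),
    ("VGA", 640, 480, 30),
]

for name, w, h, fps in MODES:
    bw = refresh_bandwidth_mb_s(w, h, fps)
    print(f"{name:8s} {w}x{h} @ {fps}fps: {bw:.0f} MB/s")
```

As the slide notes, this covers display refresh only; fill rate, vertex data and texture traffic come on top of it.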

Example SoC Mobile Platform
[Block diagram: CPU with L2 cache controller (L2CC), Media, Graphic, Video, DMA and LCD masters connected over 64- or 128-bit ports to the AMBA Interconnect, which serves the Dynamic Memory Controller (LPDDR2 SDRAM), the Static Memory Controller (NAND Flash), the Interrupt Controller and the peripherals UART0, UART1, SPI, WDT, Timer0, Timer1, RTC and GPIO. Annotations: Bandwidth requirement, Latency requirement]

Example SoC Mobile Platform
[Block diagram: the same platform, with the AMBA Interconnect, Dynamic Memory Controller (LPDDR2 SDRAM), Static Memory Controller (NAND Flash) and Interrupt Controller grouped under the label Digital Highway]

ARM Design Flow for Digital Highway
Design Your Intelligent Digital Highway
- Configure and connect your RTL: AMBA Designer
- Verification & performance exploration in simulation: AVIP
- Improve your software: CoreSight

AMBA Ecosystem: The on-chip infrastructure is critical to system performance
- Increased focus on processor memory performance
- Different types of processors have different requirements
ARM has grown the AMBA architecture eco-system to help accelerate SoC design:
- 70+ Connected Community partners have AMBA compatible products
- 10+ AMBA specification downloads a day
"the de facto standard is of course the ARM bus architecture, AMBA." Ron Wilson, EETimes

Design to Minimise Latency
Each path must be designed to minimise the inherent pipeline latency
[Figure: round trip memory latency through Processor sub-system, AXI Interconnect, Dynamic Mem, DDR2 PHY and DDR2 SDRAM. Contributors labelled: Address format and arbitration, DDR2 SDRAM CAS latency, De-skew and capture, Data FIFO and bus interface]
- Next generation AXI Interconnect halves the interconnect latency
- Masters which issue multiple AXI requests effectively hide latency
- PrimeCell Cache Controllers: trade an increase in minimum latency for dramatically reduced average latency
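
Two of the claims on this slide can be illustrated with simple arithmetic. The sketch below is a first-order model only, not ARM's own analysis: it uses Little's law to show how multiple outstanding requests hide round-trip latency, and a weighted average to show how a cache raises the minimum (miss) latency while lowering the average. All function names and numbers are hypothetical.

```python
def sustained_bandwidth(outstanding, burst_bytes, round_trip_ns, peak_bytes_per_ns):
    """Little's law: bytes in flight divided by round-trip time, capped at the link peak."""
    return min(outstanding * burst_bytes / round_trip_ns, peak_bytes_per_ns)


def average_latency(hit_rate, l2_hit_ns, l2_lookup_ns, memory_ns):
    """A miss pays the L2 lookup on top of the memory round trip."""
    return hit_rate * l2_hit_ns + (1.0 - hit_rate) * (l2_lookup_ns + memory_ns)


# Hypothetical system: 100 ns round trip, 64-byte bursts, 3.2 bytes/ns peak.
for n in (1, 4, 8):
    print(n, "outstanding:", sustained_bandwidth(n, 64, 100, 3.2), "bytes/ns")

# Hypothetical cache: 10 ns hit, 5 ns lookup penalty on a miss, 100 ns memory.
print("miss latency:", 5 + 100, "ns (was 100 ns without L2)")
print("average latency at 90% hits:", average_latency(0.9, 10, 5, 100), "ns")
```

With one request in flight the master sees only a fraction of the peak bandwidth; with enough requests in flight the link, not the latency, becomes the limit.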

Design to Maximise Throughput
Effective on-chip Quality of Service depends on the cooperation of the interconnect and memory controller
- Support for multiple outstanding requests
- The best use of memory pages by scanning the list of requests
- Controlling the order of queued transactions to:
  - Meet maximum latency targets
  - Ensure throughput-dependent processors are well serviced
  - Provide low latency paths
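
The ordering policy described on this slide can be sketched as a small arbiter. This is an illustrative model under assumed rules, not the algorithm of any ARM memory controller: the `Request` fields, the per-request `max_wait` target and the `pick_next` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Request:
    master: str
    bank: int
    row: int
    arrival: int    # cycle at which the request entered the queue
    max_wait: int   # hypothetical latency target for this master, in cycles


def pick_next(queue, open_rows, now):
    """Choose the next request to issue from a non-empty queue."""
    # 1. Meet maximum latency targets: the request furthest past its deadline wins.
    overdue = [r for r in queue if now - r.arrival >= r.max_wait]
    if overdue:
        return min(overdue, key=lambda r: r.arrival + r.max_wait)
    # 2. Best use of memory pages: prefer a request that hits an already open row.
    hits = [r for r in queue if open_rows.get(r.bank) == r.row]
    if hits:
        return min(hits, key=lambda r: r.arrival)
    # 3. Otherwise serve the oldest request.
    return min(queue, key=lambda r: r.arrival)


queue = [
    Request("cpu", bank=0, row=5, arrival=0, max_wait=100),
    Request("video", bank=1, row=7, arrival=2, max_wait=100),
    Request("lcd", bank=0, row=9, arrival=4, max_wait=10),
]
print(pick_next(queue, {1: 7}, now=5).master)   # open-row hit is preferred
print(pick_next(queue, {1: 7}, now=20).master)  # the overdue request overrides it
```

Scanning the whole queue, rather than only its head, is what allows the controller to trade strict ordering for page hits while still bounding worst-case latency.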

ARM Level2 Cache Controllers
[Block diagram: the example SoC mobile platform with the L2 Cache block between the CPU and the AMBA Interconnect; the Digital Highway comprises the AMBA Interconnect, Dynamic Memory Controller (LPDDR2 SDRAM), Static Memory Controller (NAND Flash) and Interrupt Controller]

L2CC Increases Processor Performance
[Figure: MPEG4 Decode on ARM1136EJ-S, relative performance (axis 0 to 2.2) for No L2, 128K L2, 256K L2 and 512K L2; bar labels +74%, +102% and +104%]
Benchmark: MPEG4 decode
System: ARM PrimeXsys Platform for ARM1136J-S
CPU: 400MHz ARM1136J-S, 16K I & D caches
Memory: 100MHz 32 bit SDRAM
L2 cache: L210 128K unified L2 cache
[Figure: Web Page Render Time as a function of L2 Cache Size; L2 cache sizes 0, 128, 256 and 512 KB; series First Time and Subsequent; axis Speed Up Compared to 0K L2 (0.0 to 4.0)]
Benchmark: Linux + Mozilla (5 html pages from I-Bench looped 4 times)
CPU: Cortex-A8 (speed, L1 cache), L2 part of Cortex-A8
Results may vary for system configuration and web content

L2CC Increases System Performance
Reduced System Power Consumption
- External memory access uses ~10x more energy than on-chip access
- External memory accesses are reduced with an L2 cache
- Enables use of a lower-power and lower-cost memory sub-system
  - E.g. 16-bit instead of 32-bit external interface
  - Or LPDDR instead of DDR2
Reduced On-Chip traffic & contention
- Only cache misses are propagated to the interconnect
- Improves overall system performance
- Provides more bandwidth to other SoC components
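
The energy argument can be put into numbers with a first-order model. The sketch below takes the slide's ~10x ratio at face value and assumes, as a simplification not made on the slide, that every access pays one on-chip L2 lookup and that each L2 miss adds one external access. The function name is illustrative.

```python
EXTERNAL_COST = 10.0  # slide: external access uses ~10x the energy of an on-chip access
ON_CHIP_COST = 1.0    # normalised energy of one on-chip (L2) access


def relative_memory_energy(l2_hit_rate):
    """Memory energy per access with an L2, relative to a system with no L2."""
    with_l2 = ON_CHIP_COST + (1.0 - l2_hit_rate) * EXTERNAL_COST
    without_l2 = EXTERNAL_COST  # every access goes off-chip
    return with_l2 / without_l2


for hit_rate in (0.0, 0.5, 0.8, 0.95):
    print(f"L2 hit rate {hit_rate:.0%}: {relative_memory_energy(hit_rate):.2f}x")
```

Under this model the L2 only costs energy when it almost never hits; at realistic hit rates the saving in external accesses dominates.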

ARM AMBA Interconnect
[Block diagram: Cortex-A8 with L2CC, Media, Graphic, Video, DMA and LCD masters connected over 64- or 128-bit ports to the NIC-301 interconnect, which serves the Dynamic Memory Controller (LPDDR2 SDRAM), Static Memory Controller (NAND Flash) and Interrupt Controller; together labelled Digital Highway]

AMBA Interconnect (NIC-301)
- Low latency communication for ARM CPUs
- High bandwidth for ARM Graphics and Video
- Supporting:
  - AXI, AHB & APB
  - Data widths from 32- to 128-bit
  - Both synchronous & GALS implementations
  - Quality of service
- Configurable through AMBA Designer
  - For minimum area & maximum frequency

Optimise your Interconnect Topology
- High connectivity and an increasing number of IP cores do not scale with a single interconnect
- Use properties of the traffic to influence the topology
[Figure: two topologies compared. Both connect Cortex-A9 and real-time masters to RAM, SMC, DMC and low bandwidth peripherals; clock labels Freq F and Fx2.5]

Topology Optimisation with ARM Interconnect
[Block diagram: Cortex with L2CC and Neon, Graphic, Video, DMA and LCD masters on 64- or 128-bit ports; a NIC-301 at 400MHz forms the Low Latency Interconnect and a second NIC-301 runs at 200MHz; targets are the Dynamic Memory Controller (LPDDR2 SDRAM), Static Memory Controller (NAND Flash) and Interrupt Controller]

ARM Memory Controllers
[Block diagram: Cortex with L2CC and Neon, Graphic, Video, DMA and LCD masters on 64- or 128-bit ports connected through the Low Latency Interconnect to DMC-34x (LPDDR2 SDRAM), SMC-35x (NAND Flash) and the Interrupt Controller]

ARM Memory Controllers
Synthesizable, configurable soft cores
Wide range of memory types, silicon processes & target applications
- AXI Dynamic Memory Controllers for SDR, DDR, LPDDR, DDR2 and LPDDR2 (DMC-34x)
  - Over 20 licensees to date
- AXI Static Memory Controllers for NOR Flash, NAND Flash and SRAM (SMC-35x)
  - Over 40 licensees to date
- AHB Memory Controllers for Dynamic and Static Memories (PL24x)
  - Over 60 licensees to date

What is AMBA Designer?
- Topology
- Configure
- Cross-configure
- Stitch & Check

What is AMBA Designer?
- Topology
- Configure
- Cross-configure
- Stitch & Check (export as individual signals)
Interface checking on:
- Signal widths
- Signal direction
- Interface properties
  - Valid response types
  - Interleave depth
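
The kind of static interface check listed on this slide can be sketched in a few lines. This is not AMBA Designer's API or data model; the `Port` type, its fields and `check_connection` are hypothetical and cover only the width and direction checks for a direct point-to-point connection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    name: str
    role: str        # "master" or "slave"
    data_width: int  # data bus width in bits
    id_width: int    # transaction ID width in bits


def check_connection(a, b):
    """Return a list of mismatches found when stitching port a directly to port b."""
    errors = []
    # Signal direction: a connection needs exactly one master and one slave.
    if {a.role, b.role} != {"master", "slave"}:
        errors.append(f"{a.name}/{b.name}: direction mismatch ({a.role} to {b.role})")
    # Signal widths: both ends must agree when no width converter sits between them.
    if a.data_width != b.data_width:
        errors.append(f"{a.name}/{b.name}: data width {a.data_width} != {b.data_width}")
    if a.id_width != b.id_width:
        errors.append(f"{a.name}/{b.name}: ID width {a.id_width} != {b.id_width}")
    return errors


cpu = Port("cpu_m0", "master", data_width=64, id_width=4)
dmc = Port("dmc_s0", "slave", data_width=64, id_width=4)
lcd = Port("lcd_m0", "master", data_width=32, id_width=4)

print(check_connection(cpu, dmc))  # no mismatches
print(check_connection(cpu, lcd))  # direction and data width mismatches
```

Running such checks at stitch time catches wiring mistakes before any simulation is started.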

AVIP Features for RTL Simulation
- Functional Verification: for Verification Engineers, AVIP is a set of SystemVerilog modules that enable faster and higher quality verification of AXI based IP.
- Performance Exploration: for SoC architects, HW and Verification Engineers. AXI based SoC performance can be explored and verified.
[Figure: IEEE 1800 SystemVerilog Testbench. Blocks shown: Directed Vectors, Prof. Data, AXI Master, User VIP AXI Master, UUT (user block or sub-system) with AXI Slave Interface and AXI Master Interface, AXI Slave, AXI Monitor, User IP]

AVIP Features for RTL Simulation
- Protocol Checkers: OVL and SVA assertion libraries provided for AXI protocol checking.
- AXI Protocol Coverage: channel level, transaction level and sequence level predefined coverage points for AXI protocol coverage.
[Figure: IEEE 1800 SystemVerilog Testbench. Blocks shown: AXI Master, User VIP AXI Master, UUT (user block or sub-system) with AXI Slave Interface and AXI Master Interface, AXI Slave, AXI Monitor, User IP]

AMBA Designer + AVIP: RTL Design Flow
To optimise interconnect and memory architecture ARM recommends the following flow:
- Configuration: set the correct parameters and check the components
- Integration: assemble the sub-system and statically check the design
- Simulation: run test scenarios to check usage modes
- Analysis: check results and loop back
[Figure: Configuration, Integration, Simulation, Analysis loop]

Fabric Design Tools: What is AVIP?
[Figure: IEEE 1800 SystemVerilog Testbench. Blocks shown: AXI Master, User VIP AXI Master, UUT (user block or sub-system) with AXI Slave Interface and AXI Master Interface, AXI Slave, AXI Monitor, User IP]

Fabric Design Tools: What is AVIP?
It enables System Exploration at RTL level
- TTT = Time to tweak = 20s
- TTS = Time to simulate = 5 mins
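
The two figures on this slide translate directly into an iteration budget. The sketch below uses the slide's TTT and TTS values; the eight-hour working day and the function name are assumptions made for illustration.

```python
TTT_S = 20      # time to tweak, from the slide
TTS_S = 5 * 60  # time to simulate, from the slide


def iterations(available_s, tweak_s=TTT_S, simulate_s=TTS_S):
    """Whole tweak-and-simulate cycles that fit in the available time."""
    return available_s // (tweak_s + simulate_s)


# Assumption: an eight-hour working day.
print(iterations(8 * 3600), "design iterations per day")
```

At 320 seconds per cycle, an architect can try dozens of interconnect configurations in a day.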

System Exploration Methods
- SoC, static: Spreadsheet Analysis
- Block-level, internal bus: RTL simulation (AVIP, User VIP, industry standards VIP)
- SoC, real stimulus, external I/F: Acceleration/Emulation (VIP, Logic Tiles, SW)
- Real-time behavior: Silicon/Applications

Iteration time vs Realism
[Figure: cycle time (mins/hrs, days/wks, mths/yrs) plotted against realistic behaviour (LOW to HIGH). Points shown:
- Spreadsheet: static analysis; mathematical formula, not dynamic
- AVIP: internal bus simulation; statistical or recorded traffic profiles
- SoC + s/w, emulation/proto: adding S/W, external I/F with realistic scenarios
- Silicon + Appl, CoreSight: observe actual behaviour]
AVIP: the iteration time of a spreadsheet with the accuracy approaching RTL simulation

Improve the Performance of Your SoC
Analyzing real silicon performance enables you to confidently improve the next design
- "If you want to find out how a car really performs, drive it"
CoreSight Design Kit & Performance Profiling
- Provide accurate, real-time telemetry from your system
- Essential tools for delivering system performance improvements
Your SoC may be optimized, but is the software?
- ARM Profiler analyzes system performance, enabling optimization via Profile Driven Compilation

CoreSight Debug & Trace
The Debug & Trace Architecture for the Digital World
- Open Standard available on www.arm.com
Optimise software productivity on your multi-core SoC
- SW Debug
- SW Performance Optimisation
- SoC Performance optimisation
Visibility and trace of the whole SoC
- ARM trace and performance sources (ETM, PTM, Interconnect)
- Leverage CoreSight architecture for YOUR IP

ARM Digital Highway
ARM Digital Highway technology delivers to YOU:
- Key Soft IP and Physical IP elements
- The de-facto communication standard
- Tools to analyze and optimize your system design before committing to silicon (AVIP)
- Solution to debug and optimise once your silicon has been manufactured
- Faster time to revenue through reducing design effort and ensuring quality of results