The Challenges of System Design. Raising Performance and Reducing Power Consumption

Size: px
Start display at page:

Download "The Challenges of System Design. Raising Performance and Reducing Power Consumption"

Transcription

1 The Challenges of System Design Raising Performance and Reducing Power Consumption 1

2 Agenda The key challenges Visibility for software optimisation Efficiency for improved PPA 2

3 Product Challenge - Software For software engineers Good visibility Power Osprey Management Application Processors CPU1 CPU 2 CPU 3 CPU 4 AXI Interconnect CoreSight Debug & Trace On-Chip Debug & Trace DMA Controller HD LCD controller CPU L2 cache Coherency, Virtualisation Coherent interconnect AXI Interconnect AXI Interconnect DDR3/LPDDR2 Memory Controller Static Memory Controller PCIe Media Processors A standard easy-to-program h/w platform Graphics processor Video engine AXI interconnect APB Peripherals SRAM ARM Profiler To optimise performance 3

4 Design Challenge - PPA Power Osprey Management Performance Performance Performance Application Processors CPU 3 CPU L2 cache Media Processors Graphics processor Video engine Coherent interconnect AXI Interconnect Power CPU1 CPU 2 CPU 4 Power AXI AXI Interconnect Interconnect Power AXI interconnect CoreSight Debug & Trace DMA Controller HD LCD controller DDR3/LPDDR2 APB Power Memory Controller Static Memory Controller PCIe Peripherals SRAM 4

5 How to optimise your software and understand what your design VISIBILITY FOR OPTIMISATION 5

6 On Chip Visibility: a key requirement Power Osprey Management Application Processors CPU1 CPU 3 CPU 2 CPU 4 AXI Interconne ct CoreSight Debug & Trace DMA Controller HD LCD controller CPU L2 cache Coherent interconn ect AXI Interconnect DDR3/LPDDR2 Memory Controller Static Memory Controller AXI Interconnect PCIe Media Processors Graphics processor Video engine AXI interconnect AP B Peripherals SRAM 6

7 Typical CoreSight System Cross triggering between cores Single debug access port Cost effective debug AMBA AXI Cross trigger matrix System Trace Example ARM SoC SWD DAP New ETM Cortex A9 PTM Interface Cross Trigger Cortex R4 ETMR4 CS Interface Cross Trigger DSP DSP ETM Interface Cross Trigger Bus trace System trace APB bridge Shared s Port Debug bus (APB) Trace bus (ATB) Funnel Debug control bus RealView ICE Trace bus for system trace RealView Trace Trace port Trace Port Interface Unit Embedded Trace Buffer Buffer Trace Collection strategies 7

8 Software profiling using CPU Trace Top-down insight into the analyzed software Starting with overview screen, containing top 5 functions by Self Time, Delay and Memory access Detailed information on the source code and its derived assembly code, annotated with performance information Code coverage Source associated instructions Cycles per instruction Interlock information 8

9 System Trace Macrocell - STM System level visibility required by application development up to final product Debug and tuning of s/w applications running on OS Tracing of system events and system performance PMU Counts OS Trace System Trace Macrocell enables High level application software view Tuning of system performance Tracing of SoC internal signals Benefits Flexible and affordable hardware based debug for applications and system level developers Complements CPU trace, MIPI STPv2 compliant 9

10 System Level Code Instrumentation System level debug information can be sent through trace to debug your System static inline void stm_emit(unsigned int port, unsigned int value) { stm_addr[port] = value; } static inline void stm_emit_blocking(unsigned int port, unsigned int value) { // Reading from an stm port returns 1 if the FIFO can // accept data, 0 if it is full. while(!stm_addr[port]); stm_addr[port] = value; } Export debug information Visualise system level data 10

11 Event Profiling using STM Cortex-A9 Cortex-A9 11 L2 cache

12 Trace Memory Controller Single solution for cost effective and flexible trace collection SoC visibility in final product with only 2 pins Storage of trace using low cost system memory Routing to Gigabit links such as HSSTP or Reduce trace overflows and trace port size by averaging out trace bandwidth 12 Bits / cycle Ethernet Existing modes with ETB (SRAM) & Trace Port (TPIU)

13 Getting the highest performance at the lowest power consumption EFFICIENT SOC DESIGN 13

14 Introduction Systems use external memory Large address space Low cost-per-bit Large interface bandwidth Challenge: Manage the flow of data to and from external memory to present the best bandwidth and latency characteristics to each processing element 14 GPU Comms control Geometry processor Renderer Apps processor Tiling Network interface DMA Controller Display Controller Audio CODEC Interconnect Image Transform Video required Access to memory depends on accesses from other processing elements CPU Motion Estimation Motion Compensate buffer Primitives Frame buffer Dynamic Memory Ctrl Texture Primitives Application Memory Static Memory Ctrl buffer Tile lists Media source NAND Flash Physical View

15 QoS Contracts Minimum Bandwidth Minimum Bandwidth CPU GPU Comms control Geometry processor Renderer Apps processor Minimize Latency Tiling Network interface DMA Controller Maximum Latency or Minimum Bandwidth Display Controller Audio CODEC Interconnect Image Transform Video Minimum Bandwidth Motion Estimation Motion Compensate Minimum Bandwidth buffer Primitives Frame buffer 15 Maximum Latency Dynamic Memory Ctrl Texture Primitives Application Memory Static Memory Ctrl buffer Tile lists Media source NAND Flash Maximum Latency

16 QoS Objectives Allocate system capacity (latency and bandwidth) to each master to meet the contract Dynamically vary the priority to react to changes in bus traffic If there is excess capacity Allocate excess to where it can offer the most improvement Usually reducing the CPU latency Allocate excess to masters that can reduce performance later If there is insufficient capacity Remove capacity from masters that have the least impact on system performance 16

17 System Latency Latency is added throughout the system in two forms: Static latency the delay through pipeline stages Constant and specific to the path from master to slave Queuing latency the delay at arbitration points in the system The delay for each transaction depends on the number of transactions ahead of it in the queue and the rate at which they are processed The queue length depends on the capacity of the slave (memory type and efficiency), and the desired throughput Efficiency of the Memory Controller is a function of: Queue length, Burst length, Read-write mix, Address distribution Population Efficiency System Latency 90% 80% 70% 60% 50% 40% 30% 20% 10% 0% Static Latency Queuing Latency AMBA DMC-341 : Average Burst Length Latency/clocks 17

18 System Interface Characteristics The performance on the CPU master is determined by the latency characteristics that it sees from the system Determined by the bandwidth Other masters can be replaced with traffic profile generators (VPE) Calibrated to generate the same traffic behaviour 18 GPU Apps processor Geometry processor Renderer Tiling Network interface DMA Controller Display Controller Audio CODEC Interconnect Image Transform Video from the other masters Also by the efficiency of the memory controller Depends on the burst characteristics of the traffic Comms control Motion Estimation Motion Compensate buffer Primitives Frame buffer Dynamic Memory Ctrl Texture Primitives Application Memory Static Memory Ctrl buffer Tile lists Media source NAND Flash

19 VPE Verification and Performance Exploration The AMBA VPE design tool is for verification of the system performance: A graphical profiling toolkit to generate & view traffic profiles 3 verification components: AXI Monitor, AXI Master, AXI Slave Runs on all of the big 3 RTL simulation tools Speeds up RTL simulation by Giving-up execution of functions (e.g. CPU, GPU) in favour of 19 emulating their traffic No need to model their cycle-accurate behaviour as a result Replacing real data with constrained random data Can test typical and worst case scenarios

20 System Interface Characteristics The performance on the master is determined by the latency characteristics that it sees from the system Determined by the bandwidth Other masters can be replaced with traffic profile generators (VPE) Calibrated to generate the same traffic behaviour 20 VPE Master Geometry processor Renderer Tiling Network interface DMA Controller Audio CODEC Display Controller Interconnect Image Transform Video from the other masters Also by the efficiency of the memory controller Depends on the burst characteristics of the traffic GPU Motion Estimation Static Memory Ctrl Motion Compensate NAND Flash VPE Slave

21 Calibrating VPE Master Behaviour Benchmarks Bus Master VPE Slave VPE monitor Run benchmark applications on the bus master VPE Monitor captures the traffic profile VPE Slave varies the latency seen by the master System Architect selects a representative set of benchmarks Benchmark results provide bandwidth and latency contracts Traffic profile and latency sensitivity results are used to generate a VPE model of the bus master 21

22 Better designs more quickly Iteration time of a spreadsheet with the accuracy approaching RTL simulation Spreadsheet Analysis minutes/hours RTL simulation, VPE, User VIP Industry standards VIP Statistical or recorded traffic profiles days/weeks months/years HIGH 22 Acceleration/ Emulation VIP, Logic Tiles, SW Silicon/ Applications Adding S/W, external I/F with realistic scenarios Observe actual behaviour LOW Realistic behaviour minutes/hours Mathematical formula, not dynamic Cycle time LOW HIGH

23 Reducing Visible System Latency Write data can be buffered The latency for write traffic seen by the system is significantly reduced Can be used to reduce read latency Prioritize reads Cache memory reduces latency seen by the master Also reduces system bandwidth which reduces latency to other masters Diminishing returns from increases in cache size Unbuffered Read Write % 13% 16% 19% 22% 25% 28% 31% 34% 37% 40% 43% 46% 49% 52% 55% 58% 61% 64% 67% 70% 73% 76% 79% 82% 85% 88% 91% 94% As long as coherency is managed 30 Burst Latency 35 System Utilization

24 Increasing Latency Tolerance Masters that generate transactions that are weakly dependent on the completion of previous transactions Can issue multiple outstanding transactions Multiple outstanding transactions can eliminate the effects of static (pipeline) latency The have no impact on the effects from dynamic latency Additional outstanding transactions will increase the queue length Static latency Static latency 24 Processing rate

25 Queue location A queue is implemented in the memory controller Allows re-ordering of transactions to maximize efficiency If the queue fills it extends through the interconnect Interconnect arbitration only operates when the queue extends through the interconnect For effective QoS, the arbitration policy should be consistent throughout the system System topology influences the performance CPU and LCD Ctrl placed close to the memory controller Lower latency Mali and DMA on a separate 25 Interconnect Mali-VE6 Memory Ctrl Mali-400 DMA Ctrl level LCD Ctrl Peripheral Cortex-A9 Peripheral Hierarchy improves performance

26 Stream Processing Masters Adding latency does not affect Priority Time-out performance While latency is less than the maximum Entry priority set third highest Reduces the latency to the Best-effort Best-effort Stream processing masters Batch processing If the transaction is still waiting after a masters when necessary 100% 80% Survival time-out period Promoted to highest priority Only higher priority than Best-effort Time-out 60% Maximum latency 40% 20% 0% Latency/clocks

27 Batch Processing Masters Average latency, bandwidth and queue Priority length related by Little s Law E(L)=λ.E(S) Time-out Hold queue length constant Best-effort Measure average latency and control priority Stream processing Priority controls latency which controls bandwidth Excess bandwidth is used Priority only exceeds other masters over Best-effort masters 2000 Bandwidth/MB/s when Insufficient bandwidth obtained Minimizes transactions prioritized Batch processing Latency/clocks

28 QoS with Existing Memory Controllers Existing memory controllers have only 3 priority Priority levels CPU given high priority but demoted if there is Time-out Best-effort insufficient minimum bandwidth available for the batch processors Increase proportional to outstanding transactions Excess bandwidth partitioned between other masters Hard regulation can set a maximum bandwidth for a batch processing master Batch processing Batch processing bandwidth increases to use any available bandwidth Batch processing Batch processing Batch processor bandwidth is partitioned by varying the number of outstanding transactions Stream processing 28

29 30% Browsing Boost with QoS % 130% Cortex performs >30% better 600 soaks up Mali spare system bandwidth 125% % Cortex performs Base22% Case better Performance Mali meets its target 115% Mali BW (No QoS) Reduced Mali allows Mali BWrequirement (QoS-301) Cortex-A9 to be higher Target Mali BW priority more often % 105% CPU Performance (No QoS) 350 CPU Performance (QoS-301) 300 Lower Mali requirements are exceeded. Cortex-A9 is highest priority most of the time % Targetted Mali Bandwidth (MB/s) Cortex A9 Performance Improvement Mali Bandwidth (MB/s) 650

30 Optimizing Efficiency The performance of a system depends on Maximizing the efficiency from the memory controller Using Cache to minimize the system bandwidth And reduce latency to the masters Using write buffering to minimize the latency from the system Performance is optimized by Implementing a consistent arbitration policy throughout the system Exploiting the different latency sensitivities of masters Roadmap to QoS 30 Consistent, system-wide, priority-based arbitration policy Priority controllers for the system masters Time-out mechanism in the system queue High efficiency memory controller with write buffering Regulation from QoS-301 NIC-301 Mali-VE6 Memory Ctrl Mali-400 DMA Ctrl LCD Ctrl Peripheral Cortex-A9 Peripheral

Negotiating the Maze Getting the most out of memory systems today and tomorrow. Robert Kaye

Negotiating the Maze Getting the most out of memory systems today and tomorrow. Robert Kaye Negotiating the Maze Getting the most out of memory systems today and tomorrow Robert Kaye 1 System on Chip Memory Systems Systems use external memory Large address space Low cost-per-bit Large interface

More information

Effective System Design with ARM System IP

Effective System Design with ARM System IP Effective System Design with ARM System IP Mentor Technical Forum 2009 Serge Poublan Product Marketing Manager ARM 1 Higher level of integration WiFi Platform OS Graphic 13 days standby Bluetooth MP3 Camera

More information

Building High Performance, Power Efficient Cortex and Mali systems with ARM CoreLink. Robert Kaye

Building High Performance, Power Efficient Cortex and Mali systems with ARM CoreLink. Robert Kaye Building High Performance, Power Efficient Cortex and Mali systems with ARM CoreLink Robert Kaye 1 Agenda Once upon a time ARM designed systems Compute trends Bringing it all together with CoreLink 400

More information

Analyzing and Debugging Performance Issues with Advanced ARM CoreLink System IP Components

Analyzing and Debugging Performance Issues with Advanced ARM CoreLink System IP Components Analyzing and Debugging Performance Issues with Advanced ARM CoreLink System IP Components By William Orme, Strategic Marketing Manager, ARM Ltd. and Nick Heaton, Senior Solutions Architect, Cadence Finding

More information

Modeling Performance Use Cases with Traffic Profiles Over ARM AMBA Interfaces

Modeling Performance Use Cases with Traffic Profiles Over ARM AMBA Interfaces Modeling Performance Use Cases with Traffic Profiles Over ARM AMBA Interfaces Li Chen, Staff AE Cadence China Agenda Performance Challenges Current Approaches Traffic Profiles Intro Traffic Profiles Implementation

More information

Performance Optimization for an ARM Cortex-A53 System Using Software Workloads and Cycle Accurate Models. Jason Andrews

Performance Optimization for an ARM Cortex-A53 System Using Software Workloads and Cycle Accurate Models. Jason Andrews Performance Optimization for an ARM Cortex-A53 System Using Software Workloads and Cycle Accurate Models Jason Andrews Agenda System Performance Analysis IP Configuration System Creation Methodology: Create,

More information

Multi-core microcontroller design with Cortex-M processors and CoreSight SoC

Multi-core microcontroller design with Cortex-M processors and CoreSight SoC Multi-core microcontroller design with Cortex-M processors and CoreSight SoC Joseph Yiu, ARM Ian Johnson, ARM January 2013 Abstract: While the majority of Cortex -M processor-based microcontrollers are

More information

Exploring System Coherency and Maximizing Performance of Mobile Memory Systems

Exploring System Coherency and Maximizing Performance of Mobile Memory Systems Exploring System Coherency and Maximizing Performance of Mobile Memory Systems Shanghai: William Orme, Strategic Marketing Manager of SSG Beijing & Shenzhen: Mayank Sharma, Product Manager of SSG ARM Tech

More information

FPGA Adaptive Software Debug and Performance Analysis

FPGA Adaptive Software Debug and Performance Analysis white paper Intel Adaptive Software Debug and Performance Analysis Authors Javier Orensanz Director of Product Management, System Design Division ARM Stefano Zammattio Product Manager Intel Corporation

More information

SYSTEMS ON CHIP (SOC) FOR EMBEDDED APPLICATIONS

SYSTEMS ON CHIP (SOC) FOR EMBEDDED APPLICATIONS SYSTEMS ON CHIP (SOC) FOR EMBEDDED APPLICATIONS Embedded System System Set of components needed to perform a function Hardware + software +. Embedded Main function not computing Usually not autonomous

More information

Building blocks for 64-bit Systems Development of System IP in ARM

Building blocks for 64-bit Systems Development of System IP in ARM Building blocks for 64-bit Systems Development of System IP in ARM Research seminar @ University of York January 2015 Stuart Kenny stuart.kenny@arm.com 1 2 64-bit Mobile Devices The Mobile Consumer Expects

More information

Optimizing ARM SoC s with Carbon Performance Analysis Kits. ARM Technical Symposia, Fall 2014 Andy Ladd

Optimizing ARM SoC s with Carbon Performance Analysis Kits. ARM Technical Symposia, Fall 2014 Andy Ladd Optimizing ARM SoC s with Carbon Performance Analysis Kits ARM Technical Symposia, Fall 2014 Andy Ladd Evolving System Requirements Processor Advances big.little Multicore Unicore DSP Cortex -R7 Block

More information

The Use Of Virtual Platforms In MP-SoC Design. Eshel Haritan, VP Engineering CoWare Inc. MPSoC 2006

The Use Of Virtual Platforms In MP-SoC Design. Eshel Haritan, VP Engineering CoWare Inc. MPSoC 2006 The Use Of Virtual Platforms In MP-SoC Design Eshel Haritan, VP Engineering CoWare Inc. MPSoC 2006 1 MPSoC Is MP SoC design happening? Why? Consumer Electronics Complexity Cost of ASIC Increased SW Content

More information

Getting the Most out of Advanced ARM IP. ARM Technology Symposia November 2013

Getting the Most out of Advanced ARM IP. ARM Technology Symposia November 2013 Getting the Most out of Advanced ARM IP ARM Technology Symposia November 2013 Evolving System Requirements Processor Advances big.little Multicore Unicore DSP Cortex -R7 Block are now Sub-Systems Cortex

More information

Optimizing Cache Coherent Subsystem Architecture for Heterogeneous Multicore SoCs

Optimizing Cache Coherent Subsystem Architecture for Heterogeneous Multicore SoCs Optimizing Cache Coherent Subsystem Architecture for Heterogeneous Multicore SoCs Niu Feng Technical Specialist, ARM Tech Symposia 2016 Agenda Introduction Challenges: Optimizing cache coherent subsystem

More information

Copyright 2016 Xilinx

Copyright 2016 Xilinx Zynq Architecture Zynq Vivado 2015.4 Version This material exempt per Department of Commerce license exception TSU Objectives After completing this module, you will be able to: Identify the basic building

More information

The ARM Cortex-A9 Processors

The ARM Cortex-A9 Processors The ARM Cortex-A9 Processors This whitepaper describes the details of the latest high performance processor design within the common ARM Cortex applications profile ARM Cortex-A9 MPCore processor: A multicore

More information

Yafit Snir Arindam Guha Cadence Design Systems, Inc. Accelerating System level Verification of SOC Designs with MIPI Interfaces

Yafit Snir Arindam Guha Cadence Design Systems, Inc. Accelerating System level Verification of SOC Designs with MIPI Interfaces Yafit Snir Arindam Guha, Inc. Accelerating System level Verification of SOC Designs with MIPI Interfaces Agenda Overview: MIPI Verification approaches and challenges Acceleration methodology overview and

More information

Combining Arm & RISC-V in Heterogeneous Designs

Combining Arm & RISC-V in Heterogeneous Designs Combining Arm & RISC-V in Heterogeneous Designs Gajinder Panesar, CTO, UltraSoC gajinder.panesar@ultrasoc.com RISC-V Summit 3 5 December 2018 Santa Clara, USA Problem statement Deterministic multi-core

More information

Designing, developing, debugging ARM Cortex-A and Cortex-M heterogeneous multi-processor systems

Designing, developing, debugging ARM Cortex-A and Cortex-M heterogeneous multi-processor systems Designing, developing, debugging ARM and heterogeneous multi-processor systems Kinjal Dave Senior Product Manager, ARM ARM Tech Symposia India December 7 th 2016 Topics Introduction System design Software

More information

Validation Strategies with pre-silicon platforms

Validation Strategies with pre-silicon platforms Validation Strategies with pre-silicon platforms Shantanu Ganguly Synopsys Inc April 10 2014 2014 Synopsys. All rights reserved. 1 Agenda Market Trends Emulation HW Considerations Emulation Scenarios Debug

More information

RM4 - Cortex-M7 implementation

RM4 - Cortex-M7 implementation Formation Cortex-M7 implementation: This course covers the Cortex-M7 V7E-M compliant CPU - Processeurs ARM: ARM Cores RM4 - Cortex-M7 implementation This course covers the Cortex-M7 V7E-M compliant CPU

More information

Bus AMBA. Advanced Microcontroller Bus Architecture (AMBA)

Bus AMBA. Advanced Microcontroller Bus Architecture (AMBA) Bus AMBA Advanced Microcontroller Bus Architecture (AMBA) Rene.beuchat@epfl.ch Rene.beuchat@hesge.ch Réf: AMBA Specification (Rev 2.0) www.arm.com ARM IHI 0011A 1 What to see AMBA system architecture Derivatives

More information

RM3 - Cortex-M4 / Cortex-M4F implementation

RM3 - Cortex-M4 / Cortex-M4F implementation Formation Cortex-M4 / Cortex-M4F implementation: This course covers both Cortex-M4 and Cortex-M4F (with FPU) ARM core - Processeurs ARM: ARM Cores RM3 - Cortex-M4 / Cortex-M4F implementation This course

More information

Software Driven Verification at SoC Level. Perspec System Verifier Overview

Software Driven Verification at SoC Level. Perspec System Verifier Overview Software Driven Verification at SoC Level Perspec System Verifier Overview June 2015 IP to SoC hardware/software integration and verification flows Cadence methodology and focus Applications (Basic to

More information

The CoreConnect Bus Architecture

The CoreConnect Bus Architecture The CoreConnect Bus Architecture Recent advances in silicon densities now allow for the integration of numerous functions onto a single silicon chip. With this increased density, peripherals formerly attached

More information

SoC Design Lecture 11: SoC Bus Architectures. Shaahin Hessabi Department of Computer Engineering Sharif University of Technology

SoC Design Lecture 11: SoC Bus Architectures. Shaahin Hessabi Department of Computer Engineering Sharif University of Technology SoC Design Lecture 11: SoC Bus Architectures Shaahin Hessabi Department of Computer Engineering Sharif University of Technology On-Chip bus topologies Shared bus: Several masters and slaves connected to

More information

Zynq-7000 All Programmable SoC Product Overview

Zynq-7000 All Programmable SoC Product Overview Zynq-7000 All Programmable SoC Product Overview The SW, HW and IO Programmable Platform August 2012 Copyright 2012 2009 Xilinx Introducing the Zynq -7000 All Programmable SoC Breakthrough Processing Platform

More information

ARM Processors for Embedded Applications

ARM Processors for Embedded Applications ARM Processors for Embedded Applications Roadmap for ARM Processors ARM Architecture Basics ARM Families AMBA Architecture 1 Current ARM Core Families ARM7: Hard cores and Soft cores Cache with MPU or

More information

Next Generation Verification Process for Automotive and Mobile Designs with MIPI CSI-2 SM Interface

Next Generation Verification Process for Automotive and Mobile Designs with MIPI CSI-2 SM Interface Thierry Berdah, Yafit Snir Next Generation Verification Process for Automotive and Mobile Designs with MIPI CSI-2 SM Interface Agenda Typical Verification Challenges of MIPI CSI-2 SM designs IP, Sub System

More information

Veloce2 the Enterprise Verification Platform. Simon Chen Emulation Business Development Director Mentor Graphics

Veloce2 the Enterprise Verification Platform. Simon Chen Emulation Business Development Director Mentor Graphics Veloce2 the Enterprise Verification Platform Simon Chen Emulation Business Development Director Mentor Graphics Agenda Emulation Use Modes Veloce Overview ARM case study Conclusion 2 Veloce Emulation Use

More information

Addressing the Memory Wall

Addressing the Memory Wall Lecture 26: Addressing the Memory Wall Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2015 Tunes Cage the Elephant Back Against the Wall (Cage the Elephant) This song is for the

More information

ARM s IP and OSCI TLM 2.0

ARM s IP and OSCI TLM 2.0 ARM s IP and OSCI TLM 2.0 Deploying Implementations of IP at the Programmer s View abstraction level via RealView System Generator ESL Marketing and Engineering System Design Division ARM Q108 1 Contents

More information

Chapter 2 The AMBA SOC Platform

Chapter 2 The AMBA SOC Platform Chapter 2 The AMBA SOC Platform SoCs contain numerous IPs that provide varying functionalities. The interconnection of IPs is non-trivial because different SoCs may contain the same set of IPs but have

More information

IMPROVES. Initial Investment is Low Compared to SoC Performance and Cost Benefits

IMPROVES. Initial Investment is Low Compared to SoC Performance and Cost Benefits NOC INTERCONNECT IMPROVES SOC ECONO CONOMICS Initial Investment is Low Compared to SoC Performance and Cost Benefits A s systems on chip (SoCs) have interconnect, along with its configuration, verification,

More information

ARM Multimedia IP: working together to drive down system power and bandwidth

ARM Multimedia IP: working together to drive down system power and bandwidth ARM Multimedia IP: working together to drive down system power and bandwidth Speaker: Robert Kong ARM China FAE Author: Sean Ellis ARM Architect 1 Agenda System power overview Bandwidth, bandwidth, bandwidth!

More information

Managing Complex Trace Filtering and Triggering Capabilities of CoreSight. Jens Braunes pls Development Tools

Managing Complex Trace Filtering and Triggering Capabilities of CoreSight. Jens Braunes pls Development Tools Managing Complex Trace Filtering and Triggering Capabilities of CoreSight Jens Braunes pls Development Tools Outline 2 Benefits and challenges of on-chip trace The evolution of embedded systems and the

More information

Designing with ALTERA SoC Hardware

Designing with ALTERA SoC Hardware Designing with ALTERA SoC Hardware Course Description This course provides all theoretical and practical know-how to design ALTERA SoC devices under Quartus II software. The course combines 60% theory

More information

SoC Platforms and CPU Cores

SoC Platforms and CPU Cores SoC Platforms and CPU Cores COE838: Systems on Chip Design http://www.ee.ryerson.ca/~courses/coe838/ Dr. Gul N. Khan http://www.ee.ryerson.ca/~gnkhan Electrical and Computer Engineering Ryerson University

More information

A framework for optimizing OpenVX Applications on Embedded Many Core Accelerators

A framework for optimizing OpenVX Applications on Embedded Many Core Accelerators A framework for optimizing OpenVX Applications on Embedded Many Core Accelerators Giuseppe Tagliavini, DEI University of Bologna Germain Haugou, IIS ETHZ Andrea Marongiu, DEI University of Bologna & IIS

More information

On-chip Networks Enable the Dark Silicon Advantage. Drew Wingard CTO & Co-founder Sonics, Inc.

On-chip Networks Enable the Dark Silicon Advantage. Drew Wingard CTO & Co-founder Sonics, Inc. On-chip Networks Enable the Dark Silicon Advantage Drew Wingard CTO & Co-founder Sonics, Inc. Agenda Sonics history and corporate summary Power challenges in advanced SoCs General power management techniques

More information

Designing with NXP i.mx8m SoC

Designing with NXP i.mx8m SoC Designing with NXP i.mx8m SoC Course Description Designing with NXP i.mx8m SoC is a 3 days deep dive training to the latest NXP application processor family. The first part of the course starts by overviewing

More information

Test and Verification Solutions. ARM Based SOC Design and Verification

Test and Verification Solutions. ARM Based SOC Design and Verification Test and Verification Solutions ARM Based SOC Design and Verification 7 July 2008 1 7 July 2008 14 March 2 Agenda System Verification Challenges ARM SoC DV Methodology ARM SoC Test bench Construction Conclusion

More information

Computer and Hardware Architecture II. Benny Thörnberg Associate Professor in Electronics

Computer and Hardware Architecture II. Benny Thörnberg Associate Professor in Electronics Computer and Hardware Architecture II Benny Thörnberg Associate Professor in Electronics Parallelism Microscopic vs Macroscopic Microscopic parallelism hardware solutions inside system components providing

More information

Big.LITTLE Processing with ARM Cortex -A15 & Cortex-A7

Big.LITTLE Processing with ARM Cortex -A15 & Cortex-A7 Big.LITTLE Processing with ARM Cortex -A15 & Cortex-A7 Improving Energy Efficiency in High-Performance Mobile Platforms Peter Greenhalgh, ARM September 2011 This paper presents the rationale and design

More information

Software Defined Modem A commercial platform for wireless handsets

Software Defined Modem A commercial platform for wireless handsets Software Defined Modem A commercial platform for wireless handsets Charles F Sturman VP Marketing June 22 nd ~ 24 th Brussels charles.stuman@cognovo.com www.cognovo.com Agenda SDM Separating hardware from

More information

CoreTile Express for Cortex-A5

CoreTile Express for Cortex-A5 CoreTile Express for Cortex-A5 For the Versatile Express Family The Versatile Express family development boards provide an excellent environment for prototyping the next generation of system-on-chip designs.

More information

New STM32 F7 Series. World s 1 st to market, ARM Cortex -M7 based 32-bit MCU

New STM32 F7 Series. World s 1 st to market, ARM Cortex -M7 based 32-bit MCU New STM32 F7 Series World s 1 st to market, ARM Cortex -M7 based 32-bit MCU 7 Keys of STM32 F7 series 2 1 2 3 4 5 6 7 First. ST is first to sample a fully functional Cortex-M7 based 32-bit MCU : STM32

More information

Zynq Architecture, PS (ARM) and PL

Zynq Architecture, PS (ARM) and PL , PS (ARM) and PL Joint ICTP-IAEA School on Hybrid Reconfigurable Devices for Scientific Instrumentation Trieste, 1-5 June 2015 Fernando Rincón Fernando.rincon@uclm.es 1 Contents Zynq All Programmable

More information

NetSpeed ORION: A New Approach to Design On-chip Interconnects. August 26 th, 2013

NetSpeed ORION: A New Approach to Design On-chip Interconnects. August 26 th, 2013 NetSpeed ORION: A New Approach to Design On-chip Interconnects August 26 th, 2013 INTERCONNECTS BECOMING INCREASINGLY IMPORTANT Growing number of IP cores Average SoCs today have 100+ IPs Mixing and matching

More information

Product Series SoC Solutions Product Series 2016

Product Series SoC Solutions Product Series 2016 Product Series Why SPI? or We will discuss why Serial Flash chips are used in many products. What are the advantages and some of the disadvantages. We will explore how SoC Solutions SPI and QSPI IP Cores

More information

It's not about the core, it s about the system

It's not about the core, it s about the system It's not about the core, it s about the system Gajinder Panesar, CTO, UltraSoC gajinder.panesar@ultrasoc.com RISC-V Workshop 18 19 July 2018 Chennai, India Overview Architecture overview Example Scenarios

More information

NoC Generic Scoreboard VIP by François Cerisier and Mathieu Maisonneuve, Test and Verification Solutions

NoC Generic Scoreboard VIP by François Cerisier and Mathieu Maisonneuve, Test and Verification Solutions NoC Generic Scoreboard VIP by François Cerisier and Mathieu Maisonneuve, Test and Verification Solutions Abstract The increase of SoC complexity with more cores, IPs and other subsystems has led SoC architects

More information

Each Milliwatt Matters

Each Milliwatt Matters Each Milliwatt Matters Ultra High Efficiency Application Processors Govind Wathan Product Manager, CPG ARM Tech Symposia China 2015 November 2015 Ultra High Efficiency Processors Used in Diverse Markets

More information

Achieving UFS Host Throughput For System Performance

Achieving UFS Host Throughput For System Performance Achieving UFS Host Throughput For System Performance Yifei-Liu CAE Manager, Synopsys Mobile Forum 2013 Copyright 2013 Synopsys Agenda UFS Throughput Considerations to Meet Performance Objectives UFS Host

More information

Hardware-Software Codesign

Hardware-Software Codesign Hardware-Software Codesign 8. Performance Estimation Lothar Thiele 8-1 System Design specification system synthesis estimation -compilation intellectual prop. code instruction set HW-synthesis intellectual

More information

Next Generation Enterprise Solutions from ARM

Next Generation Enterprise Solutions from ARM Next Generation Enterprise Solutions from ARM Ian Forsyth Director Product Marketing Enterprise and Infrastructure Applications Processor Product Line Ian.forsyth@arm.com 1 Enterprise Trends IT is the

More information

Maximizing heterogeneous system performance with ARM interconnect and CCIX

Maximizing heterogeneous system performance with ARM interconnect and CCIX Maximizing heterogeneous system performance with ARM interconnect and CCIX Neil Parris, Director of product marketing Systems and software group, ARM Teratec June 2017 Intelligent flexible cloud to enable

More information

AN4777 Application note

AN4777 Application note Application note Implications of memory interface configurations on low-power STM32 microcontrollers Introduction The low-power STM32 microcontrollers have a rich variety of configuration options regarding

More information

Fujitsu System Applications Support. Fujitsu Microelectronics America, Inc. 02/02

Fujitsu System Applications Support. Fujitsu Microelectronics America, Inc. 02/02 Fujitsu System Applications Support 1 Overview System Applications Support SOC Application Development Lab Multimedia VoIP Wireless Bluetooth Processors, DSP and Peripherals ARM Reference Platform 2 SOC

More information

AMBA Protocol for ALU

AMBA Protocol for ALU International Journal of Emerging Engineering Research and Technology Volume 2, Issue 5, August 2014, PP 51-59 ISSN 2349-4395 (Print) & ISSN 2349-4409 (Online) AMBA Protocol for ALU K Swetha Student, Dept

More information

Introduction to gem5. Nizamudheen Ahmed Texas Instruments

Introduction to gem5. Nizamudheen Ahmed Texas Instruments Introduction to gem5 Nizamudheen Ahmed Texas Instruments 1 Introduction A full-system computer architecture simulator Open source tool focused on architectural modeling BSD license Encompasses system-level

More information

The Design and Implementation of a Low-Latency On-Chip Network

The Design and Implementation of a Low-Latency On-Chip Network The Design and Implementation of a Low-Latency On-Chip Network Robert Mullins 11 th Asia and South Pacific Design Automation Conference (ASP-DAC), Jan 24-27 th, 2006, Yokohama, Japan. Introduction Current

More information

ARM CORTEX-R52. Target Audience: Engineers and technicians who develop SoCs and systems based on the ARM Cortex-R52 architecture.

ARM CORTEX-R52. Target Audience: Engineers and technicians who develop SoCs and systems based on the ARM Cortex-R52 architecture. ARM CORTEX-R52 Course Family: ARMv8-R Cortex-R CPU Target Audience: Engineers and technicians who develop SoCs and systems based on the ARM Cortex-R52 architecture. Duration: 4 days Prerequisites and related

More information

TRACE32. Product Overview

TRACE32. Product Overview TRACE32 Product Overview Preprocessor Product Portfolio Lauterbach is the world s leading manufacturer of complete, modular microprocessor development tools with 35 years experience in the field of embedded

More information

System-on-Chip Architecture for Mobile Applications. Sabyasachi Dey

System-on-Chip Architecture for Mobile Applications. Sabyasachi Dey System-on-Chip Architecture for Mobile Applications Sabyasachi Dey Email: sabyasachi.dey@gmail.com Agenda What is Mobile Application Platform Challenges Key Architecture Focus Areas Conclusion Mobile Revolution

More information

Strato and Strato OS. Justin Zhang Senior Applications Engineering Manager. Your new weapon for verification challenge. Nov 2017

Strato and Strato OS. Justin Zhang Senior Applications Engineering Manager. Your new weapon for verification challenge. Nov 2017 Strato and Strato OS Your new weapon for verification challenge Justin Zhang Senior Applications Engineering Manager Nov 2017 Emulation Market Evolution Emulation moved to Virtualization with Veloce2 Data

More information

Chapter 5. Introduction ARM Cortex series

Chapter 5. Introduction ARM Cortex series Chapter 5 Introduction ARM Cortex series 5.1 ARM Cortex series variants 5.2 ARM Cortex A series 5.3 ARM Cortex R series 5.4 ARM Cortex M series 5.5 Comparison of Cortex M series with 8/16 bit MCUs 51 5.1

More information

FPGA Entering the Era of the All Programmable SoC

FPGA Entering the Era of the All Programmable SoC FPGA Entering the Era of the All Programmable SoC Ivo Bolsens, Senior Vice President & CTO Page 1 Moore s Law: The Technology Pipeline Page 2 Industry Debates on Cost Page 3 Design Cost Estimated Chip

More information

HotChips An innovative HD video and digital image processor for low-cost digital entertainment products. Deepu Talla.

HotChips An innovative HD video and digital image processor for low-cost digital entertainment products. Deepu Talla. HotChips 2007 An innovative HD video and digital image processor for low-cost digital entertainment products Deepu Talla Texas Instruments 1 Salient features of the SoC HD video encode and decode using

More information

A Next Generation Home Access Point and Router

A Next Generation Home Access Point and Router A Next Generation Home Access Point and Router Product Marketing Manager Network Communication Technology and Application of the New Generation Points of Discussion Why Do We Need a Next Gen Home Router?

More information

3D Graphics in Future Mobile Devices. Steve Steele, ARM

3D Graphics in Future Mobile Devices. Steve Steele, ARM 3D Graphics in Future Mobile Devices Steve Steele, ARM Market Trends Mobile Computing Market Growth Volume in millions Mobile Computing Market Trends 1600 Smart Mobile Device Shipments (Smartphones and

More information

KeyStone C665x Multicore SoC

KeyStone C665x Multicore SoC KeyStone Multicore SoC Architecture KeyStone C6655/57: Device Features C66x C6655: One C66x DSP Core at 1.0 or 1.25 GHz C6657: Two C66x DSP Cores at 0.85, 1.0, or 1.25 GHz Fixed and Floating Point Operations

More information

Verification Futures Nick Heaton, Distinguished Engineer, Cadence Design Systems

Verification Futures Nick Heaton, Distinguished Engineer, Cadence Design Systems Verification Futures 2016 Nick Heaton, Distinguished Engineer, Cadence Systems Agenda Update on Challenges presented in 2015, namely Scalability of the verification engines The rise of Use-Case Driven

More information

Analyze system performance using IWB. Interconnect Workbench Dave Huang

Analyze system performance using IWB. Interconnect Workbench Dave Huang Analyze system performance using IWB Interconnect Workbench Dave Huang Perf_analysis@126.com 1 Information Personal peech of personal experience I am on behalf on myself Interconnects Are at the Heart

More information

Simplify System Complexity

Simplify System Complexity 1 2 Simplify System Complexity With the new high-performance CompactRIO controller Arun Veeramani Senior Program Manager National Instruments NI CompactRIO The Worlds Only Software Designed Controller

More information

Implementing Flexible Interconnect Topologies for Machine Learning Acceleration

Implementing Flexible Interconnect Topologies for Machine Learning Acceleration Implementing Flexible Interconnect for Machine Learning Acceleration A R M T E C H S Y M P O S I A O C T 2 0 1 8 WILLIAM TSENG Mem Controller 20 mm Mem Controller Machine Learning / AI SoC New Challenges

More information

ARM Debug and Trace. Configuration and Usage Models. Document number: ARM DEN 0034A Copyright ARM Limited

ARM Debug and Trace. Configuration and Usage Models. Document number: ARM DEN 0034A Copyright ARM Limited ARM Debug and Trace Configuration and Usage Models Document number: ARM DEN 0034A Copyright ARM Limited 2012-2013 ARM Debug and Trace Configuration and Usage Models Release information The following table

More information

The Evolution of the ARM Architecture Towards Big Data and the Data-Centre

The Evolution of the ARM Architecture Towards Big Data and the Data-Centre The Evolution of the ARM Architecture Towards Big Data and the Data-Centre 8th Workshop on Virtualization in High-Performance Cloud Computing (VHPC'13) held in conjunction with SC 13, Denver, Colorado

More information

Chapter 15 ARM Architecture, Programming and Development Tools

Chapter 15 ARM Architecture, Programming and Development Tools Chapter 15 ARM Architecture, Programming and Development Tools Lesson 07 ARM Cortex CPU and Microcontrollers 2 Microcontroller CORTEX M3 Core 32-bit RALU, single cycle MUL, 2-12 divide, ETM interface,

More information

VLSI Design of Multichannel AMBA AHB

VLSI Design of Multichannel AMBA AHB RESEARCH ARTICLE OPEN ACCESS VLSI Design of Multichannel AMBA AHB Shraddha Divekar,Archana Tiwari M-Tech, Department Of Electronics, Assistant professor, Department Of Electronics RKNEC Nagpur,RKNEC Nagpur

More information

Growth outside Cell Phone Applications

Growth outside Cell Phone Applications ARM Introduction Growth outside Cell Phone Applications ~1B units shipped into non-mobile applications Embedded segment now accounts for 13% of ARM shipments Automotive, microcontroller and smartcards

More information

Multimedia in Mobile Phones. Architectures and Trends Lund

Multimedia in Mobile Phones. Architectures and Trends Lund Multimedia in Mobile Phones Architectures and Trends Lund 091124 Presentation Henrik Ohlsson Contact: henrik.h.ohlsson@stericsson.com Working with multimedia hardware (graphics and displays) at ST- Ericsson

More information

AHB monitor. Monitor. AHB bridge. Expansion AHB ports M1, M2, and S. AHB bridge. AHB bridge. Configuration. Smart card reader SSP (PL022)

AHB monitor. Monitor. AHB bridge. Expansion AHB ports M1, M2, and S. AHB bridge. AHB bridge. Configuration. Smart card reader SSP (PL022) The ARM RealView Versatile family of development boards provide a feature rich prototyping system for system-on-chip designs. This family includes the first development board to support both the ARM926EJ-S

More information

SEMICON Solutions. Bus Structure. Created by: Duong Dang Date: 20 th Oct,2010

SEMICON Solutions. Bus Structure. Created by: Duong Dang Date: 20 th Oct,2010 SEMICON Solutions Bus Structure Created by: Duong Dang Date: 20 th Oct,2010 Introduction Buses are the simplest and most widely used interconnection networks A number of modules is connected via a single

More information

SQLoC: Using SQL database for performance analysis of an ARM v8 SoC

SQLoC: Using SQL database for performance analysis of an ARM v8 SoC SQLoC: Using SQL database for performance analysis of an ARM v8 SoC Gordon Allan and Avidan Efody Mentor Graphics Agenda The performance analysis problem Run time & design time configuration Use cases

More information

Place Your Logo Here. K. Charles Janac

Place Your Logo Here. K. Charles Janac Place Your Logo Here K. Charles Janac President and CEO Arteris is the Leading Network on Chip IP Provider Multiple Traffic Classes Low Low cost cost Control Control CPU DSP DMA Multiple Interconnect Types

More information

ARM Connected Community Technical Symposium Reaching High Performance System Design Using AMBA Fabric IP

ARM Connected Community Technical Symposium Reaching High Performance System Design Using AMBA Fabric IP ARM Connected Community Technical Symposium Reaching High Performance System Design Using AMBA Fabric IP Tim Mace Senior Technical Marketing Manager Fabric IP BU, ARM 1 What is Fabric IP? Fabric IP is:

More information

Cortex-A75 and Cortex-A55 DynamIQ processors Powering applications from mobile to autonomous driving

Cortex-A75 and Cortex-A55 DynamIQ processors Powering applications from mobile to autonomous driving Cortex-A75 and Cortex- DynamIQ processors Powering applications from mobile to autonomous driving Lionel Belnet Sr. Product Manager Arm Arm Tech Symposia 2017 Agenda Market growth and trends DynamIQ technology

More information

ECE 551 System on Chip Design

ECE 551 System on Chip Design ECE 551 System on Chip Design Introducing Bus Communications Garrett S. Rose Fall 2018 Emerging Applications Requirements Data Flow vs. Processing µp µp Mem Bus DRAMC Core 2 Core N Main Bus µp Core 1 SoCs

More information

Power Aware Architecture Design for Multicore SoCs

Power Aware Architecture Design for Multicore SoCs Power Aware Architecture Design for Multicore SoCs EDPS Monterey Patrick Sheridan Synopsys Virtual Prototyping April 2015 Low Power SoC Design Multi-disciplinary system problem Must manage energy consumption

More information

Mobile & IoT Market Trends and Memory Requirements

Mobile & IoT Market Trends and Memory Requirements Mobile & IoT Market Trends and Memory Requirements JEDEC Mobile & IOT Forum Daniel Heo ARM Segment Marketing Copyright ARM 2016 Outline Wearable & IoT Market Opportunities Challenges in Wearables & IoT

More information

Buses. Maurizio Palesi. Maurizio Palesi 1

Buses. Maurizio Palesi. Maurizio Palesi 1 Buses Maurizio Palesi Maurizio Palesi 1 Introduction Buses are the simplest and most widely used interconnection networks A number of modules is connected via a single shared channel Microcontroller Microcontroller

More information

Mapping applications into MPSoC

Mapping applications into MPSoC Mapping applications into MPSoC concurrency & communication Jos van Eijndhoven jos@vectorfabrics.com March 12, 2011 MPSoC mapping: exploiting concurrency 2 March 12, 2012 Computation on general purpose

More information

Modeling and Simulation of System-on. Platorms. Politecnico di Milano. Donatella Sciuto. Piazza Leonardo da Vinci 32, 20131, Milano

Modeling and Simulation of System-on. Platorms. Politecnico di Milano. Donatella Sciuto. Piazza Leonardo da Vinci 32, 20131, Milano Modeling and Simulation of System-on on-chip Platorms Donatella Sciuto 10/01/2007 Politecnico di Milano Dipartimento di Elettronica e Informazione Piazza Leonardo da Vinci 32, 20131, Milano Key SoC Market

More information

The Nios II Family of Configurable Soft-core Processors

The Nios II Family of Configurable Soft-core Processors The Nios II Family of Configurable Soft-core Processors James Ball August 16, 2005 2005 Altera Corporation Agenda Nios II Introduction Configuring your CPU FPGA vs. ASIC CPU Design Instruction Set Architecture

More information

Evolving IP configurability and the need for intelligent IP configuration

Evolving IP configurability and the need for intelligent IP configuration Evolving IP configurability and the need for intelligent IP configuration Mayank Sharma Product Manager ARM Tech Symposia India December 7 th 2016 Increasing IP integration costs per node $140 $120 $M

More information

Applying the Benefits of Network on a Chip Architecture to FPGA System Design

Applying the Benefits of Network on a Chip Architecture to FPGA System Design white paper Intel FPGA Applying the Benefits of on a Chip Architecture to FPGA System Design Authors Kent Orthner Senior Manager, Software and IP Intel Corporation Table of Contents Abstract...1 Introduction...1

More information

An Efficient AXI Read and Write Channel for Memory Interface in System-on-Chip

An Efficient AXI Read and Write Channel for Memory Interface in System-on-Chip An Efficient AXI Read and Write Channel for Memory Interface in System-on-Chip Abhinav Tiwari M. Tech. Scholar, Embedded System and VLSI Design Acropolis Institute of Technology and Research, Indore (India)

More information

Asynchronous on-chip Communication: Explorations on the Intel PXA27x Peripheral Bus

Asynchronous on-chip Communication: Explorations on the Intel PXA27x Peripheral Bus Asynchronous on-chip Communication: Explorations on the Intel PXA27x Peripheral Bus Andrew M. Scott, Mark E. Schuelein, Marly Roncken, Jin-Jer Hwan John Bainbridge, John R. Mawer, David L. Jackson, Andrew

More information