Unleash the DSP performance of Arm Cortex processors

Transcription:

Unleash the DSP performance of Arm Cortex processors
Arm Tech Symposia 2017
Lionel Belnet, Senior Product Manager

Agenda: Unleash the DSP performance of Cortex processors
1. Introducing Arm Cortex technology for DSP applications
2. Selecting the right Cortex processor for your algorithms
3. Understanding NEON acceleration for actual and emerging use cases
4. Benefiting from a wide ecosystem for all Cortex processors

The most widely deployed processing platform
Gaining traction in DSP applications

Increasing DSP performance
Addressing a wide range of performance points
Cortex-M: designed for discrete processing and microcontrollers. Optimized DSP extensions (8-bit, 16-bit SIMD capability).
Cortex-R: designed for high performance, hard real-time applications. Optimized DSP extensions (8-bit, 16-bit SIMD capability).
Cortex-A: designed for high-level operating systems. Optimized DSP extensions (8-bit, 16-bit SIMD capability).
Labels along the increasing-performance axis: DSP extensions, NEON, SVE.
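To make "16-bit SIMD capability" concrete, here is a minimal portable C sketch of a dual 16-bit multiply-accumulate, the kind of operation that DSP-extension instructions such as SMLAD perform on two packed samples at once. It is a scalar model only, not intrinsic code: the function names pack_s16x2, dual_mac_s16 and dot_s16 are hypothetical, and overflow or saturation behavior is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack two signed 16-bit samples into one 32-bit word (lo in bits 15:0). */
static inline uint32_t pack_s16x2(int16_t lo, int16_t hi)
{
    return (uint32_t)(uint16_t)lo | ((uint32_t)(uint16_t)hi << 16);
}

/* Scalar model of one dual 16-bit multiply-accumulate step:
 * acc + lo(a)*lo(b) + hi(a)*hi(b). Overflow handling is omitted. */
static inline int32_t dual_mac_s16(uint32_t a, uint32_t b, int32_t acc)
{
    int16_t a_lo = (int16_t)(a & 0xFFFFu), a_hi = (int16_t)(a >> 16);
    int16_t b_lo = (int16_t)(b & 0xFFFFu), b_hi = (int16_t)(b >> 16);
    return acc + (int32_t)a_lo * b_lo + (int32_t)a_hi * b_hi;
}

/* Dot product of two int16 vectors, consuming two samples per step. */
static inline int32_t dot_s16(const int16_t *x, const int16_t *y, size_t n)
{
    assert(n % 2 == 0); /* this sketch only handles an even sample count */
    int32_t acc = 0;
    for (size_t i = 0; i < n; i += 2) {
        acc = dual_mac_s16(pack_s16x2(x[i], x[i + 1]),
                           pack_s16x2(y[i], y[i + 1]), acc);
    }
    return acc;
}
```

The point of the packed form is that the inner loop runs n/2 times instead of n, which is where a 16-bit SIMD instruction gains throughput over a plain 32-bit multiply-accumulate.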

Pick the right CPU for your DSP algorithm
Example use case: Dolby Digital+, Dolby Audio Processing, AC4. Often run on dedicated DSPs. What if you could run it on a CPU?
Benefits of CPU with DSP capabilities: reduced software development costs, simplified toolchain, reduced system-level complexity, development and BoM cost savings.

Select the most efficient CPU for your DSP use case
[Chart: Dolby Digital Plus, required MHz (lower is better). Series: Cortex-A57, Cortex-A53, Cortex-M7. Cases: 5.1-ch single decode; 5.1-ch single decode to 2-ch downmix; 7.1-ch single decode; 5.1-ch dual decode; 7.1-ch main + 5.1-ch assoc dual decode. Cortex-A57 @1.1 GHz and Cortex-A53 @850 MHz measured on Arm Juno board; Cortex-M7 @25 MHz measured on MPS2.]

Run the latest advanced audio codec on your CPU
Dolby performance requirements
[Chart: Dolby AC-4 performance requirements on Armv8-A processors, MHz (lower is better). Series: Cortex-A53, Cortex-A57. Cases: main decoding w/o rand memory; associated audio decoding w/o rand memory. Measured on Arm Juno board: Cortex-A57 @1.1 GHz, Cortex-A53 @850 MHz.]

Arm CPUs can handle demanding DSP workloads
Increasing performance requirements, from Cortex-M + DSP extension to Cortex-A + NEON: Dolby Digital Plus, Dolby Audio Processing, Dolby AC4
Address new markets and applications
Simpler systems and faster time to market
Innovation through collaboration

Extending NEON computing to new use cases
Armv7-A/R NEON including: 32 x 64-bit registers; 8-bit to 64-bit integer support; FP32 support
Armv8.0-A NEON including: AArch32 and AArch64; optional cryptography; 32 x 128-bit registers in AArch64; FP64 support
Armv8.2-A NEON including: FP16 support; 8-bit dot product instructions
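As an illustration of what an 8-bit dot product instruction computes, the following portable C sketch models the arithmetic on one 128-bit vector: sixteen signed 8-bit pairs are multiplied and each group of four products is accumulated into one of four 32-bit lanes. This is a scalar reference under stated assumptions, not NEON intrinsic code, and the names dot4_s8, dot_s8x16 and dot_s8 are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One 32-bit lane: acc plus the sum of four signed 8-bit products. */
static inline int32_t dot4_s8(const int8_t a[4], const int8_t b[4], int32_t acc)
{
    for (int k = 0; k < 4; ++k)
        acc += (int32_t)a[k] * (int32_t)b[k];
    return acc;
}

/* Model of one 128-bit step: 16 int8 pairs feed four int32 accumulators. */
static inline void dot_s8x16(const int8_t a[16], const int8_t b[16],
                             int32_t acc[4])
{
    for (int lane = 0; lane < 4; ++lane)
        acc[lane] = dot4_s8(&a[4 * lane], &b[4 * lane], acc[lane]);
}

/* Full dot product: process 16 elements per step, then reduce the lanes. */
static inline int32_t dot_s8(const int8_t *a, const int8_t *b, size_t n)
{
    assert(n % 16 == 0); /* this sketch only handles whole 128-bit vectors */
    int32_t acc[4] = {0, 0, 0, 0};
    for (size_t i = 0; i < n; i += 16)
        dot_s8x16(&a[i], &b[i], acc);
    return acc[0] + acc[1] + acc[2] + acc[3];
}
```

Sixteen multiply-accumulates per step with 32-bit accumulators is what makes this form attractive for quantized matrix multiplication, since the wide accumulator absorbs many 8-bit products before overflow becomes a concern.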

The right processors for your DSP application: multimedia
[Chart: FFMPEG, required MHz relative to Cortex-A53 (lower is better). Series: Cortex-A7, Cortex-A35, Cortex-A53, Cortex-A55, Cortex-A73, Cortex-A75. Data labels: 1.5, 1.3, 1, 0.9, 0.6, 0.5.]

Enhanced architecture for emerging use cases
[Chart, Computer Vision (FP16): Harris Corners, time relative to Cortex-A53 (lower is better). Series: Cortex-A53 (FP32), Cortex-A55 (FP32), Cortex-A55 (FP16), Cortex-A73 (FP32), Cortex-A75 (FP32), Cortex-A75 (FP16). Data labels: 1.00, 0.90, 0.70, 0.59, 0.44, 0.35.]
[Chart, Machine Learning (FP16, 8-bit dot product): General Matrix Multiply, MAC/cycle. Series: Cortex-A53 (FP32), Cortex-A55 (FP32), Cortex-A55 (FP16), Cortex-A55 (8-bit). Data labels: 1, 1.2, 2.5, 5.5.]

A versatile DSP ecosystem for NEON
Open-source ecosystem
Commercial solutions

Pick the right Cortex-M for your DSP algorithm
[Chart: DSP/MHz performance relative to Cortex-M4, all based on the CMSIS-DSP library (Cortex-M33 values are estimates). Benchmarks: CFFT Q31, RFFT Q31, CFFT F32, RFFT F32, FIR Q31, FIR F32.]
Cortex-M7 total DSP performance is ~2x that of Cortex-M4 (due to higher max frequency).
Cortex-M7: high DSP performance, SP + DP FPU
Cortex-M33: security in DSP applications (TrustZone), co-processor IF
Cortex-M4: mainstream applications
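The Q31 benchmarks above use 32-bit fixed-point samples, where an int32_t value v represents v / 2^31. As a rough illustration of the arithmetic behind a Q31 FIR filter, here is a portable C reference sketch. It is not the CMSIS-DSP implementation or its API: fir_q31_ref and sat_q31 are hypothetical names, samples before x[0] are assumed to be zero, the coefficients are assumed to be scaled so the 64-bit accumulator cannot overflow, and the right shift of a negative value is assumed to be arithmetic, as it is on common compilers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Saturate a 64-bit value to the Q31 range. */
static inline int32_t sat_q31(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* Direct-form FIR: y[i] = sum over k of h[k] * x[i - k]. */
static inline void fir_q31_ref(const int32_t *x, int32_t *y, size_t n,
                               const int32_t *h, size_t taps)
{
    assert(taps > 0);
    for (size_t i = 0; i < n; ++i) {
        int64_t acc = 0;                      /* Q62 accumulator */
        for (size_t k = 0; k < taps && k <= i; ++k)
            acc += (int64_t)h[k] * x[i - k];  /* Q31 * Q31 gives Q62 */
        y[i] = sat_q31(acc >> 31);            /* scale back to Q31 */
    }
}
```

For example, with two taps of 0.5 (0x40000000) and input 0.5, 0.25, 0, the outputs are 0.25, 0.375 and 0.125 in Q31. An optimized library would typically unroll this loop and keep state between blocks; the sketch only shows the number format.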

A versatile DSP ecosystem for Cortex-M
Fundamental DSP functions on Cortex-M available for free!
CMSIS-DSP library: filters, controller functions, basic math functions, statistical functions, interpolator functions, matrix functions, support functions, complex math functions, fast math functions, transforms
Examples of ecosystem solutions and partners: audio codecs, voice codecs, image processing, keyword spotting, audio enhancement, sensor fusion, motor control, connectivity, simulation tools

Use your CPU to unleash DSP to new markets
Foster innovation with partnerships in the world's #1 ecosystem
Standardized architecture, proven in many markets and DSP applications
Simplifies software portability across different device solutions
Largest ecosystem of silicon vendors, compilers, tools, libraries and software
Save development and BOM cost by using a homogeneous system
Ecosystem diagram labels: silicon vendors; compiler & tools; developer; software; operating system

Summary
Cortex processors address a wide range of DSP performance points
From high-end Cortex-A to efficient Cortex-M
Comprehensive ecosystem and library support for DSP applications
Simplifies and accelerates new use cases and applications
Continued investment in SIMD capabilities
Strong roadmap for demanding future applications

Thank You

The Arm trademarks featured in this presentation are registered trademarks or trademarks of Arm Limited (or its subsidiaries) in the US and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners. www.arm.com/company/policies/trademarks

Thank You! Danke! Merci! 谢谢! ありがとう! Gracias! Kiitos! 감사합니다 धन्यवाद