Unleash the DSP performance of Arm Cortex processors
Arm Tech Symposia 2017
Lionel Belnet, Senior Product Manager

Agenda
Unleash the DSP performance of Cortex processors
  1 Introducing Arm Cortex technology for DSP applications
  2 Selecting the right Cortex processor for your algorithms
  3 Understanding NEON acceleration for current and emerging use cases
  4 Benefiting from a wide ecosystem for all Cortex processors

The most widely deployed processing platform
Gaining traction in DSP applications

Increasing DSP performance
Addressing a wide range of performance points
  Cortex-M: designed for discrete processing and microcontrollers
    Optimized DSP extensions (8-bit, 16-bit SIMD capability)
  Cortex-R: designed for high-performance, hard real-time applications
    Optimized DSP extensions (8-bit, 16-bit SIMD capability), NEON
  Cortex-A: designed for high-level operating systems
    Optimized DSP extensions (8-bit, 16-bit SIMD capability), NEON, SVE

Pick the right CPU for your DSP algorithm
Example use case
  Dolby Digital Plus, Dolby Audio Processing, Dolby AC-4
    Often run on dedicated DSPs
    What if you could run it on a CPU?
  Benefits of a CPU with DSP capabilities
    Reduced software development costs
    Simplified toolchain
    Reduced system-level complexity
    Development and BoM cost savings

Select the most efficient CPU for your DSP use case
Dolby Digital Plus
  [Figure: bar chart, "Dolby Digital Plus (required MHz, lower is better)". Series: Cortex-A57, Cortex-A53, Cortex-M7. Bar values are not recoverable from the extracted text.]
  Decode configurations shown:
    5.1-ch single decode
    5.1-ch single decode to 2-ch downmix
    7.1-ch single decode
    5.1-ch dual decode
    7.1-ch main + 5.1-ch assoc dual decode
  Measured on Arm Juno board: Cortex-A57 @ 1.1 GHz, Cortex-A53 @ 850 MHz
  Measured on MPS2: Cortex-M7 @ 25 MHz

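The "required MHz" metric on the slide above can be read as the share of a core's clock budget that a workload consumes. The sketch below illustrates that arithmetic only. The function names and the example figures are hypothetical and are not taken from the chart, whose bar values did not survive extraction.

```python
def cpu_load(required_mhz: float, clock_mhz: float) -> float:
    """Fraction of one core's cycles consumed by a workload needing required_mhz."""
    if clock_mhz <= 0:
        raise ValueError("clock_mhz must be positive")
    return required_mhz / clock_mhz


def headroom_mhz(required_mhz: float, clock_mhz: float) -> float:
    """Clock budget (in MHz) left for other tasks; negative means the workload does not fit."""
    return clock_mhz - required_mhz


# Hypothetical figures for illustration only, not read from the chart.
example_required = 100.0  # MHz needed by some decode configuration (assumed)
example_clock = 850.0     # MHz core clock (assumed)

print(f"load: {cpu_load(example_required, example_clock):.1%}")
print(f"headroom: {headroom_mhz(example_required, example_clock):.0f} MHz")
```

A lower required-MHz figure therefore means either more headroom at a fixed clock or the option to run the same workload at a lower clock.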
Run the latest advanced audio codec on your CPU
Dolby performance requirements
  [Figure: bar chart, "Dolby AC-4 performance requirements on Armv8-A processors" (MHz, lower is better). Series: Cortex-A53, Cortex-A57. Bar values are not recoverable from the extracted text.]
  Configurations shown:
    Main decoding w/o rand memory
    Associated audio decoding w/o rand memory
  Measured on Arm Juno board: Cortex-A57 @ 1.1 GHz, Cortex-A53 @ 850 MHz

Arm CPUs can handle demanding DSP workloads
  Increasing performance requirements: Dolby Digital Plus, Dolby Audio Processing, Dolby AC-4
  Processors shown: Cortex-M + DSP extension, Cortex-A + NEON
  Address new markets and applications
  Simpler systems and faster time to market
  Innovation through collaboration

Extending NEON computing to new use cases
  Armv7-A/R NEON including:
    32 x 64-bit registers
    8-bit to 64-bit integer support
    FP32 support
  Armv8.0-A NEON including:
    AArch32 and AArch64
    Optional cryptography
    32 x 128-bit registers in AArch64
    FP64 support
  Armv8.2-A NEON including:
    FP16 support
    8-bit dot product instructions

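To make the 8-bit dot product entry above concrete, here is a plain-Python reference model of what a signed 8-bit dot product instruction computes: each 32-bit accumulator lane adds the sum of four signed 8-bit products. This is a sketch for illustration, not Arm's definition text or a NEON intrinsic. The name `sdot_model` is hypothetical, and the wrap-around accumulation is modelled explicitly because Python integers do not overflow.

```python
def _wrap_int32(value: int) -> int:
    """Wrap an arbitrary Python int to signed 32-bit two's complement."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def sdot_model(acc, a, b):
    """Model a signed 8-bit dot product with 32-bit accumulation.

    acc: list of int32 accumulator lanes (4 lanes for a 128-bit register)
    a, b: lists of int8 values, four per accumulator lane
    Returns the new accumulator lanes.
    """
    if len(a) != len(b) or len(a) != 4 * len(acc):
        raise ValueError("a and b need exactly four 8-bit elements per lane")
    for v in list(a) + list(b):
        if not -128 <= v <= 127:
            raise ValueError("elements must fit in a signed 8-bit integer")

    result = []
    for lane, start in enumerate(range(0, len(a), 4)):
        total = acc[lane]
        for j in range(start, start + 4):
            total += a[j] * b[j]  # four multiply-accumulates per lane
        result.append(_wrap_int32(total))
    return result


# Sixteen 8-bit elements map onto four 32-bit lanes.
lanes = sdot_model([0, 0, 0, 0], list(range(16)), [1] * 16)
print(lanes)       # per-lane partial sums
print(sum(lanes))  # full 16-element dot product after a final reduction
```

Each modelled operation performs 16 multiply-accumulates on 128 bits of input, which is why the instruction is attractive for 8-bit matrix multiplication.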
The right processors for your DSP application
Multimedia
  [Figure: bar chart, "FFmpeg (relative to Cortex-A53, lower is better)", required MHz]
    Cortex-A7: 1.5
    Cortex-A35: 1.3
    Cortex-A53: 1
    Cortex-A55: 0.9
    Cortex-A73: 0.6
    Cortex-A75: 0.5

Enhanced architecture for emerging use cases
  Computer Vision
    [Figure: bar chart, "Harris Corners (relative to Cortex-A53)", time, lower is better]
      Cortex-A53 (FP32): 1.00
      Cortex-A55 (FP32): 0.90
      Cortex-A55 (FP16): 0.70
      Cortex-A73 (FP32): 0.59
      Cortex-A75 (FP32): 0.44
      Cortex-A75 (FP16): 0.35
  Machine Learning
    [Figure: bar chart, "General Matrix Multiply", MAC/cycle, with an "8-bit dot product" callout]
      Series as extracted: Cortex-A53 (FP32), Cortex-A55 (FP16), Cortex-A55 (FP32), Cortex-A55 (8-bit)
      Bar values as extracted: 1, 1.2, 2.5, 5.5

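One way to read the FP16 and 8-bit results above is that a narrower element type packs more lanes into a 128-bit NEON register, at the cost of precision. The sketch below shows only that trade-off. It says nothing about the actual MAC/cycle of any core, which depends on the microarchitecture. The helper names are hypothetical. The FP16 rounding uses the IEEE half-precision format provided by Python's `struct` module.

```python
import struct

NEON_REGISTER_BITS = 128  # AArch64 NEON register width


def lanes_per_register(element_bits: int, register_bits: int = NEON_REGISTER_BITS) -> int:
    """Number of elements of a given width that fit in one vector register."""
    if element_bits <= 0 or register_bits % element_bits != 0:
        raise ValueError("element width must divide the register width")
    return register_bits // element_bits


def round_to_fp16(x: float) -> float:
    """Round a Python float to IEEE half precision and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


for name, bits in (("FP32", 32), ("FP16", 16), ("8-bit", 8)):
    print(f"{name}: {lanes_per_register(bits)} lanes per 128-bit register")

value = 0.1
print(f"FP16 rounding error for {value}: {abs(round_to_fp16(value) - value):.2e}")
```

Whether the reduced precision is acceptable depends on the algorithm, so the choice between FP32, FP16 and 8-bit data is usually validated per workload.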
A versatile DSP ecosystem for NEON
  Open-source ecosystem
  Commercial solutions

Pick the right Cortex-M for your DSP algorithm
  [Figure: bar chart, DSP/MHz performance relative to Cortex-M4, for CFFT Q31, RFFT Q31, CFFT F32, RFFT F32, FIR Q31 and FIR F32. Bar values are not recoverable from the extracted text.]
  *Estimate for Cortex-M33; all based on the CMSIS-DSP library
  Cortex-M7 total DSP performance is ~2x that of Cortex-M4 (due to higher max frequency)
  Cortex-M7: high DSP performance, SP + DP FPU
  Cortex-M33: TrustZone, security in DSP applications, co-processor interface
  Cortex-M4: mainstream applications

A versatile DSP ecosystem for Cortex-M
Fundamental DSP functions on Cortex-M available for free!
  CMSIS-DSP library
    Filters
    Controller functions
    Basic math functions
    Statistical functions
    Interpolator functions
    Matrix functions
    Support functions
    Complex math functions
    Fast math functions
    Transforms
  Examples of ecosystem solutions and partners
    Audio codecs
    Voice codecs
    Image processing
    Keyword spotting
    Audio enhancement
    Sensor fusion
    Motor control
    Connectivity
    Simulation tools

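As an illustration of the fixed-point filter functions listed above, the following is a plain-Python reference model of a Q31 FIR filter. It is not the CMSIS-DSP implementation and does not call that library. The function names are hypothetical, and the choice to saturate only the final result is an assumption of this sketch. A Q31 value is a signed 32-bit integer representing a number in [-1, 1) scaled by 2^31.

```python
Q31_ONE = 1 << 31
Q31_MAX = Q31_ONE - 1
Q31_MIN = -Q31_ONE


def saturate_q31(value: int) -> int:
    """Clamp an integer to the signed 32-bit Q31 range."""
    return max(Q31_MIN, min(Q31_MAX, value))


def float_to_q31(x: float) -> int:
    """Convert a float in [-1, 1) to Q31, saturating out-of-range input."""
    return saturate_q31(int(round(x * Q31_ONE)))


def q31_to_float(q: int) -> float:
    """Convert a Q31 integer back to a float."""
    return q / Q31_ONE


def fir_q31(samples, coeffs):
    """Direct-form FIR: y[n] = sum_k coeffs[k] * x[n - k], all values in Q31."""
    if not coeffs:
        raise ValueError("at least one coefficient is required")
    history = [0] * len(coeffs)  # x[n], x[n-1], ... (zero initial state)
    output = []
    for sample in samples:
        history = [sample] + history[:-1]
        # Q31 * Q31 products are Q62; Python ints hold the sum exactly.
        acc = sum(h * x for h, x in zip(coeffs, history))
        output.append(saturate_q31(acc >> 31))  # back to Q31
    return output


# Two-tap moving average applied to a constant 0.5 input.
taps = [float_to_q31(0.5), float_to_q31(0.5)]
signal = [float_to_q31(0.5)] * 3
print([q31_to_float(y) for y in fir_q31(signal, taps)])
```

On a Cortex-M device the equivalent work would normally be delegated to the library's optimized routines. A model like this is mainly useful for generating test vectors on a host machine.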
Use your CPU to unleash DSP to new markets
Foster innovation with partnerships in the world's #1 ecosystem
  Standardized architecture, proven in many markets and DSP applications
  Simplifies software portability across different device solutions
  Largest ecosystem of silicon vendors, compilers, tools, libraries and software
  Save development and BOM cost by using a homogeneous system
  Ecosystem categories shown: Silicon vendors, Compiler & tools, Developer, Software, Operating system

Summary
  Cortex processors address a wide range of DSP performance points
    From high-end Cortex-A to efficient Cortex-M
  Comprehensive ecosystem and library support for DSP applications
    Simplifies and accelerates new use cases and applications
  Continued investment in SIMD capabilities
    Strong roadmap for demanding future applications

Thank You

The Arm trademarks featured in this presentation are registered trademarks or trademarks of Arm Limited (or its subsidiaries) in the US and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners. www.arm.com/company/policies/trademarks
Thank You! Danke! Merci! 谢谢! ありがとう! Gracias! Kiitos! 감사합니다 धन्यवाद