ARM Cortex -M7: Bringing High Performance to the Cortex-M Processor Series. Ian Johnson Senior Product Manager, ARM

Similar documents
Supercharging the Embedded Device: ARM Cortex -M7. Ian Johnson Senior Product Manager, ARM

ARM Cortex -M and Java in the Internet of Things. Asim Chaudhry Field Applications Engineer, ARM

New STM32 F7 Series. World s 1 st to market, ARM Cortex -M7 based 32-bit MCU

Growth outside Cell Phone Applications

New ARMv8-R technology for real-time control in safetyrelated

The Next Steps in the Evolution of ARM Cortex-M

ARM Cortex core microcontrollers 3. Cortex-M0, M4, M7

Building Ultra-Low Power Wearable SoCs

Designing, developing, debugging ARM Cortex-A and Cortex-M heterogeneous multi-processor systems

Bringing the benefits of Cortex-M processors to FPGA

Chapter 15 ARM Architecture, Programming and Development Tools

The Next Steps in the Evolution of Embedded Processors

Renesas Synergy MCUs Build a Foundation for Groundbreaking Integrated Embedded Platform Development

STM32 F0 Value Line. Entry-level MCUs

Each Milliwatt Matters

Chapter 5. Introduction ARM Cortex series

ARM processors driving automotive innovation

ARM instruction sets and CPUs for wide-ranging applications

Copyright 2016 Xilinx

ARM Processors for Embedded Applications

STM32F7 series ARM Cortex -M7 powered Releasing your creativity

STM32F3. Cuauhtémoc Carbajal ITESM CEM 12/08/2013

So you think developing an SoC needs to be complex or expensive? Think again

ARM CORTEX-R52. Target Audience: Engineers and technicians who develop SoCs and systems based on the ARM Cortex-R52 architecture.

Kinetis KE1xF512 MCUs

RM4 - Cortex-M7 implementation

Interconnects, Memory, GPIO

NXP Unveils Its First ARM Cortex -M4 Based Controller Family

DesignWare IP for IoT SoC Designs

A Developer's Guide to Security on Cortex-M based MCUs

STM32F429 Overview. Steve Miller STMicroelectronics, MMS Applications Team October 26 th 2015

ARM TrustZone for ARMv8-M for software engineers

Hello and welcome to this Renesas Interactive module that provides an architectural overview of the RX Core.

Zynq-7000 All Programmable SoC Product Overview

Introduction CHAPTER IN THIS CHAPTER

Mobile & IoT Market Trends and Memory Requirements

Exploring System Coherency and Maximizing Performance of Mobile Memory Systems

T he key to building a presence in a new market

ARM architecture road map. NuMicro Overview of Cortex M. Cortex M Processor Family (2/3) All binary upwards compatible

Kinetis Software Optimization

Implementing Secure Software Systems on ARMv8-M Microcontrollers

ELC4438: Embedded System Design ARM Embedded Processor

Low-Power Processor Solutions for Always-on Devices

SOMNIUM DRT Benchmarks Whitepaper DRT v3.4 release : August 2016

STM32F7 series ARM Cortex -M7 powered Releasing your creativity

Key Benefits. SAM S70 and E70 Devices

Universität Dortmund. ARM Architecture

Mobile & IoT Market Trends and Memory Requirements

RM3 - Cortex-M4 / Cortex-M4F implementation

Designing Security & Trust into Connected Devices

ARM Cortex -M for Beginners

Hercules ARM Cortex -R4 System Architecture. Processor Overview

Designing Security & Trust into Connected Devices

IoT and Security: ARM v8-m Architecture. Robert Boys Product Marketing DSG, ARM. Spring 2017: V 3.1

RISC-V Core IP Products

突破 8-/16-/32- 位和 DSP 界限的 ARM MCU 解决方案

Cortex-A75 and Cortex-A55 DynamIQ processors Powering applications from mobile to autonomous driving

Kinetis KV5x Real-Time Control MCUs with Ethernet Up to 1 MB Flash and 256 KB SRAM

SYSTEMS ON CHIP (SOC) FOR EMBEDDED APPLICATIONS

Multi-core microcontroller design with Cortex-M processors and CoreSight SoC

How to Select Hardware forvolume IoT Deployment?

恩智浦 LPC 系列 MCU 全方位支持嵌入式和物联网的应用开发

Mobile & IoT Market Trends and Memory Requirements

STM32 Cortex-M3 STM32F STM32L STM32W

The ARM Cortex-A9 Processors

AN4777 Application note

Embedded segment market update

Cortex-M3/M4 Software Development

3 2-bit ARM Cortex TM -M3 based

The Nios II Family of Configurable Soft-core Processors

Five Ways to Build Flexibility into Industrial Applications with FPGAs

Hello, and welcome to this presentation of the STM32 Flash memory interface. It covers all the new features of the STM32F7 Flash memory.

EEM870 Embedded System and Experiment Lecture 3: ARM Processor Architecture

Hello, and welcome to this presentation of the STM32L4 System Configuration Controller.

S32K Microcontroller Press Pack

STM32 Journal. In this Issue:

mbed OS Update Sam Grove Technical Lead, mbed OS June 2017 ARM 2017

ARM mbed mbed OS mbed Cloud

Using the i.mx RT FlexRAM

Unleash the DSP performance of Arm Cortex processors

Cortex-M Processors and the Internet of Things (IoT)

Multicore and MIPS: Creating the next generation of SoCs. Jim Whittaker EVP MIPS Business Unit

Modular ARM System Design

FOR IOT PRODUCT DEVELOPMENT

The Challenges of System Design. Raising Performance and Reducing Power Consumption

Building High Performance, Power Efficient Cortex and Mali systems with ARM CoreLink. Robert Kaye

SoC FPGAs. Your User-Customizable System on Chip Altera Corporation Public

Designing Embedded Processors in FPGAs

ECE 571 Advanced Microprocessor-Based Design Lecture 22

Cypress PSoC 6 Microcontrollers

Put on the Future: Wearable Technology

EEMBC s Automotive/Industrial Microprocessor Benchmarks. June 4, 2004

Measuring Interrupt Latency

L2 - C language for Embedded MCUs

AN4749 Application note

Advanced IP solutions enabling the autonomous driving revolution

A backward glance and a forward view

2-bit ARM Cortex TM -M3 based Microcontroller FM3 Family MB9A130 Series

Effective System Design with ARM System IP

The Changing Face of Edge Compute

Transcription:

ARM Cortex -M7: Bringing High Performance to the Cortex-M Processor Series Ian Johnson Senior Product Manager, ARM 1

ARM Cortex Processors across the Embedded Market Cortex -M processors Cortex -R processors Cortex -A processors MCU + DSP RTOS Rich OS Smallest footprint / lowest power Highest performance / real-time Highest performance 2

Taking the Cortex-M Series to the Next Level Lowest Area Highest Energy Efficiency Energy- Performance Balance Blended MCU and Digital Signal Processing Highest Performance 3

Cortex-M7 Overview Performance Achieving 5 CoreMark/MHz 2000 CoreMark * in 40LP Typical 2x DSP performance of Cortex-M4 Versatility Highly flexible system and memory interfaces Designed for functional safety implementations Scalability and compatibility Enables simple migration from any Cortex-M processor Widest third-party tools, RTOS, middleware support * CoreMark 1.0 : IAR Embedded Workbench v7.30.1 --endian=little --cpu=cortex-m7 -e -Ohs --use_c++_inline --no_size_constraints / Code in TCM - Data in TCM 4

Cortex-M7 Key Features (1) High-performance processor with DSP capabilities Six-stage superscalar pipeline Powerful DSP instructions and SP/DP Floating Point Best-in-class core for high-end MCU or replace MCU+DSP with Cortex-M7 Flexible, memory system Tightly-coupled memories for real-time determinism 64-bit AXI AMBA4 memory interface with I-cache and D-cache for efficient access to external resources Build powerful MCU with more memories and powerful peripherals 5

Cortex-M7 Key Features (2) ARMv7-M architecture 100% binary forwards compatibility from Cortex-M4 Key Cortex-M family processor characteristics of ease-of-use and excellent interrupt latency Reuse code and system design from existing products Safety features Memory ECC (SEC-DED), MPU, MBIST, lock-step operation, full data trace, safety manual Enables entry into safety-critical markets. 6

Cortex-M7 Block Diagram Nested Vectored Interrupt Controller (NVIC) 1 to 240 interrupts + NMI Floating Point Unit (optional) Single and double precision 32-bit AHB slave Debug access to complete memory map 64-bit Instruction TCM (optional) SRAM/ Accelerated Flash 2x32-bit Data TCM Fast on-chip SRAM 32-bit AHB slave interface DMA Engine access to TCM Interrupts NVIC FPU Debug I/F AHB Trace I/F ETM I TCM D TCM TCM arbiter and interface DMA AHB Slave AXI Master Processor Core EPPB I/F APB MPU AHBP AHB Ctrl I Cache Ctrl D Cache ETM (optional) Full instruction and data trace (ETMv4) 32-bit APB master CoreSight Debug Peripherals 32-bit AHB master Low latency on-chip peripherals Memory Protection Unit (optional) 8 or 16 regions Data cache (optional) Up to 64kB, WT/WB cache 64-bit AMBA4 AXI master interface Slow Flash / off-chip instruction memory / off-chip memory i.e. DDR / Slow peripherals External Memory System Instruction cache (optional) Up to 64kB, WT/WB cache 7

Tightly Coupled Memory (TCM) All TCMs: Support wait-states Can be used at boot-up time Support up to 16MB of memory Provide deterministic performance Dedicated store buffering Instruction TCM (ITCM) 64-bit interface Data TCM (DTCM) I TCM D TCM AHB Slave 2 X 32-bit interface: D0TCM and D1TCM, SSRAM protocol to enable direct integration with memories Supports dual-issue of loads when bit [2] of address is different DMA TCM arbiter and interface 8

Caches - Overview Harvard arrangement for optimum performance I-cache 2-way associative, D-cache 4-way associative, pseudo-random replacement policy I and D both optional, configurable sizes (4kB 64kB each) Extensions defined for the ARMv7-M system architecture Addition of cache maintenance operations Full support for the following attributes Write Through, no write allocate (WT) Write-back, no write allocate (WBRA) Write-back, write allocate (WBWA) Ctrl I Cache Ctrl D Cache AXI Master 9

Powerful & Scalable Instruction Set DSP (SIMD, fast MAC) Floating Point Cortex-M7 has the same powerful instruction set as Cortex-M4: General data processing I/O control tasks Advanced data processing bit field manipulations Integer MAC instructions are all single-cycle SIMD instructions can work on 8-/16-bit quantities packed into a 32-bit word Arithmetic can be signed/unsigned, saturating/non-saturating A few new FP instructions for FPv5 10

ARM Cortex-M7: Built for Performance Fast compute for demanding embedded applications Six-stage superscalar pipeline with branch prediction Single and double precision floating point unit Flexible memory system 64-bit AXI AMBA4 interconnect I-cache and D-cache for efficient memory operation Ultra-fast responsiveness for control 12 cycles interrupt latency Tightly coupled memories for real-time determinism Highest core performance combined with the efficiency of Cortex-M Higher = better Cortex-M7 Cortex-M4 Core C Core D Cortex-M7 MCU Cortex-M4 MCU MCU Core C MCU Core D Source: CoreMark.org, ARM for Cortex-M7 2 3 4 5 90nm 90nm Processor CoreMark/MHz Today s MCU total CoreMark 200 400 600 800 1000 11

EEMBC IPC Comparison Results are geo-mean of EEMBC IPC relative to baseline (quantified as 1 ) Measured on comparable memory systems (in this case, WB caches on Cortex-M7) Networking 1 1.1 1.3 Telecom 1 1.2 1.5 Consumer 1 1.2 1.6 AutoIndy (DP FP) 1 1.6 AutoIndy (Int) Cortex-M7 Cortex-R5 Cortex-M4 1 1 1.4 12

FP Benchmarking Status Cortex-M7 floating point performance relative to Cortex-R5 and Cortex-M4 processors Double Precision Data Cortex-M7 Cortex-R5 Cortex-M4 Single Precision Data Assumes all processors running at the same clock frequency Based on EEMBC FPMark benchmarks using small data-sets Performance relative to Cortex-R5 in the same system Benchmarks compiled with ARM tool-chain (v5.04)` 13 0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6

Cortex-M7: Competitive with Popular DSPs Essential DSP features Parallel execution of loads, stores and MAC SIMD support, single-cycle MAC Single and double precision floating point unit Minimal loop overhead (branch predictor/btac) Optimised DSP libraries Consistently good performance across key DSP functions Normalized cycles, lower = better Biquad Cascade Cortex-M7 32-bit DSP E 32-bit DSP F FIR Filter Real FFT Complex FFT 0 1 2 3 14

2x Performance Improvement over the Cortex-M4 FIR (Q7) FIR (Q31) FIR (Float) Biquad (Float stereo) Biquad (Float mono) Real FFT (Q15) Real FFT (Q31) Real FFT (Float) Complex FFT (Q15) Complex FFT (Q31) Complex FFT (Float) Measurements using the CMSIS DSP Library Available free of charge from ARM Now optimized for the Cortex-M7 0 0.5 1 1.5 2 2.5 15 Note: combines architectural improvements with expected core clock increase. The code was compiled using the ARM C Compiler (armcc) 5.04 Comparison was made on an FPGA on a Versatile Express motherboard

Cortex-M7 Replacement for MCU+DSP MCU DSP + DSP Cortex-M7 16 Trends: Convergence of MCU+DSP to DSC for cost reduction Increased processing demands Increasing consumer expectation of quality in portable devices Example applications: Multi-channel audio / Dolby Audio Advanced Motor Control Factory Automation Automotive Image processing Power conversions Cortex-M7 Advantages: High performance core with fast DSP Compatibility with existing Cortex-M4 designs Flexible memory system

Cortex-M7 Safety Features Cortex-M7 specific additions Cache ECC Dual core lock-step with delay External TCM ECC interface On-line MBIST interface ARMv7-M architecture based Memory protection unit (MPU) Exception logic These features will be included in the Cortex-M7 Safety Documentation Package: Safety Manual FMEA Report Development Interface Report 17

Cortex-M7 Target Applications High-end MCU Automotive Powerful processor for advanced audio/visual sensor hub processing Sensor Hub IoT Power-efficient local processor for IoT devices such as an edge router Industrial Control Flexible and reliable processor for industrial and motor control 18

Enabling Smarter Systems Without the Complexity 2x More performance delivering enhanced functionality More displays More motors Advanced touch sensing Multiple connectivity options Enhanced voice controls 19

Enabling More Capabilities for Feature-Rich Devices 2x More performance delivering improved flight management Cortex-M7 400 MHz Finer degree of control Accurate speed measurement Cortex-M4 168 MHz Finer GPS accuracy Secure telemetry radio Source: 3DRobotics, PX4 autopilot ETH Zurich 20

Helping Drive Richer Audio Experiences 2x More performance delivering advanced sound processing Cortex-M7 160 MHz 7.1 Multi-channel audio support More speaker EQ processing Cortex-M4 130 MHz Capacity for decoders More connectivity options 21

Cortex-M7 in Automotive Trends and challenges: Safety certification mandated in more regions Convergence of functionality into fewer MCUs/ASSPs Increasing user requirements and expectations Typical Applications Dashboard in medium-range cars Voice recognition (for Multimedia control functions) Character recognition (eg Kanji) Convenience features Chassis, electric power steering, steer-by-wire Automotive audio Cortex-M7 Advantages: High performance core with fast DSP Safety features built in and safety manual Determinism with high performance Full trace via ETM 22

Cortex-M7 in Industrial Control Trends and challenges High performance control functions Safety, reliability and conformance will become mandatory 80-90% of cost is software, Cortex-M offers scalability and protects software investment Typical applications: Factory Automation Inverters and servos Programmable Logic Controllers High-speed comms Intelligent motor control Cortex-M7 Advantages: Increased DSP performance for control functions Safety features built-in In-order pipeline gives performance with predictability TCMs and low interrupt latency: Interrupt response within 100ns required Scalability from Cortex-M3 through Cortex-M7 up to Cortex-A53 23

Cortex-M7: Harnessing the Cortex-M Ecosystem With support for the new Cortex-M7 processor, we are further strengthening our leading market position by delivering development tools for ARM with an outstanding benchmark score of 5.04 CoreMark/MHz - Stefan Skarin, IAR Systems Our robust embedded software components are designed to be used in high performance applications targeted by Cortex-M7, including industrial control, safety and IoT - Jean Labrosse, Micrium ARM Cortex-M7 will bring substantially more computing power to embedded applications, and SEGGER will continue to innovate new products and features for each new generation of ARM processors - Rolf Segger, SEGGER 24

Cortex-M7 Lead Partners The Cortex-M7 is well positioned between Atmel s Cortex-M based MCUs and Cortex-A based MPUs enabling Atmel to offer an even greater range of processing solutions. Customers using the Cortex-M based MCU will be able to scale up performance and system functionality, while keeping the Cortex-M class ease-of-use and maximizing software reuse. We see the ARM Cortex-M7 addressing high-growth markets like IoT and wearables, as well as automotive and industrial applications that can leverage its performance and power efficiency Reza Kazerounian, Atmel Freescale Cortex-M7-based solutions dramatically extend MCU performance, opening new opportunities for our business. Our solutions will enable significant innovation and system-level efficiency in areas such as motor control, industrial automation and power conversion. These are rapidly growing markets where the high performance of the Cortex-M7 core eliminates the need for additional DSPs and microcontrollers - Geoff Lees, Freescale Offering customers more intelligence and processing power on our STM32 microcontrollers is a major objective for ST, and the Cortex-M7 delivers that impressively. The Cortex-M7 core supports upwardly-scalable compatibility with our existing wide range of 500 Cortex-M STM32 microcontrollers, associated tools and software ecosystem, allowing developers to rapidly adopt our next-generation STM32 Cortex-M7-based MCUs - Daniel Colonna, STMicroelectronics 25

Supercharge Cortex-M based solutions Develop versatile, scalable solutions Address safety critical applications Harness the broadest ecosystem 26

Thank You The trademarks featured in this presentation are registered and/or unregistered trademarks of ARM Limited (or its subsidiaries) in the EU and/or elsewhere. All rights reserved. Any other marks featured may be trademarks of their respective owners 27