NXP Unveils Its First ARM Cortex-M4 Based Controller Family

NXP's LPC4300 MCU with Coprocessor

By Frank Riemenschneider, Editor, Electronik Magazine

At the Electronica trade show last fall in Munich, Dutch chip maker NXP introduced its first Cortex-M4 based microcontrollers. The LPC4300 offers a big surprise: it contains a Cortex-M0 coprocessor, allowing the Cortex-M4 to concentrate on what it does best: crunching numbers for digital signal control applications.


As expected, the LPC4300 is an advanced version of the LPC1800. The two families are pin-compatible and also share exactly the same peripherals. Like the LPC1800, the LPC4300 (Figure 1) is implemented using 90 nm process technology, allowing for a maximum clock speed of 150 MHz.

Figure 1: Block diagram of the LPC4300 MCU family. These devices are implemented using 90 nm process technology.

Especially interesting is the fact that the LPC4300 features a dual-core architecture. The Cortex-M4 processor (see The Box) is supplemented by a Cortex-M0 coprocessor, ARM's smallest core, which requires a mere 30 µA/MHz. The Cortex-M0's task is to offload many of the data transfer and I/O handling duties that can drain the bandwidth of the Cortex-M4 processor, especially when running math-intensive DSP algorithms.

Thanks to multiple clock sources, the on-chip peripherals can be operated at individual speeds and may even be turned off completely if not in use. In addition, there are four power modes:

In Sleep mode, the clock to the core is stopped while peripheral functions continue to operate.

In Deep Sleep mode, the main oscillator is powered down and nearly all clocks are stopped; only the interrupt controller remains running. The flash memory is put in standby mode, allowing for a quick wake-up.

In Power-down mode, the interrupt controller and the flash memory are also turned off. The system status is preserved in special registers.

In Deep Power-down mode, power is turned off to the entire chip. The system status is lost except for the contents of the real-time clock backup registers, making it possible to wake up from Deep Power-down mode via a real-time clock interrupt (or via the reset pin).
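The four power modes can be summarized as a small lookup table. The following is only a sketch in C: all type and function names are hypothetical, the table restates what the text says about each mode, and the assumption that Sleep and Deep Sleep retain the full system status is an inference rather than something the article states.

```c
#include <stdbool.h>

/* Hypothetical names; not taken from any NXP header. */
typedef enum {
    PM_SLEEP,
    PM_DEEP_SLEEP,
    PM_POWER_DOWN,
    PM_DEEP_POWER_DOWN,
    PM_COUNT
} power_mode_t;

typedef struct {
    bool peripherals_run;      /* peripheral functions keep operating     */
    bool irq_controller_runs;  /* interrupt controller stays active       */
    bool flash_powered;        /* flash is on or at least in standby      */
    bool full_state_retained;  /* system status survives (partly assumed) */
} power_mode_info_t;

/* One row per mode, ordered from shallowest to deepest. */
static const power_mode_info_t power_modes[PM_COUNT] = {
    [PM_SLEEP]           = { true,  true,  true,  true  },
    [PM_DEEP_SLEEP]      = { false, true,  true,  true  },
    [PM_POWER_DOWN]      = { false, false, false, true  },
    [PM_DEEP_POWER_DOWN] = { false, false, false, false },
};

/* Return the deepest mode that still meets the caller's requirements.
 * Sleep meets every requirement, so it is the fallback. */
static power_mode_t deepest_mode(bool need_peripherals,
                                 bool need_irq_controller,
                                 bool need_state)
{
    power_mode_t best = PM_SLEEP;
    for (int m = PM_SLEEP; m < PM_COUNT; ++m) {
        const power_mode_info_t *info = &power_modes[m];
        if ((!need_peripherals    || info->peripherals_run) &&
            (!need_irq_controller || info->irq_controller_runs) &&
            (!need_state          || info->full_state_retained)) {
            best = (power_mode_t)m;
        }
    }
    return best;
}
```

An application that only needs to keep its state, for example, would get `PM_POWER_DOWN` from `deepest_mode(false, false, true)`.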
THE BOX: Cortex-M4: An Extension of the Cortex-M3

The Cortex-M4 processor brings to the low-cost microcontroller segment what has long been familiar from application processors: dedicated hardware for digital signal processing, supplemented by an optional floating point unit (FPU) adding 5 instructions for single precision math operations. The architecture is based on the well-known ARMv7 architecture and is now called ARMv7E-M, indicating the DSP extensions. Instead of a simple 32x32-bit multiplier, the Cortex-M4 processor is able to perform MAC operations of up to 32x32 + 64 bit in a single clock cycle. Just like in the Cortex-M3 processor, a 32-bit hardware divider is integrated as well.

Compared to the Cortex-M3 processor, the Cortex-M4 processor has more watchpoints (6) and more breakpoints (4). The interrupt structure as well as the internal microarchitecture (3-stage pipeline, Harvard architecture) is identical. In order to achieve higher code density, the Cortex-M4 processor also implements the Thumb-2 instruction set, which mixes 16-bit and 32-bit encodings.

The image shows the Cortex-M4 processor block diagram. As with the FPU, the implementation of the WIC (wake-up interrupt controller) and the debug and trace units is optional. Not implementing these units reduces the number of gates and therefore the power consumption.

Despite the new DSP hardware, the rise in power consumption compared to the Cortex-M3 processor is rather insignificant. When implemented in a TSMC 65 nm low-power process, the Cortex-M3 processor draws 0.05 mW/MHz compared to 0.06 mW/MHz for the Cortex-M4 processor. At a clock rate of 150 MHz, this only results in an increase from 7 to 9 mW. The FPU has its own pipeline (separate from the integer pipeline) and, aside from six math operations, also includes a fused MAC operation which delivers higher precision results.
The maximum clock frequency could potentially be beyond 200 MHz; so far, however, there are no such implementations.

It is widely agreed that a standard MCU is only of limited use for digital signal processing. The relatively simple task of decoding an MP3 file already requires a clock frequency of 20 to 25 MHz. Even a dedicated DSP needs about 10 to 15 MHz, while the Cortex-M4 processor can do this at around 9 MHz. The Cortex-M4 processor is only surpassed by DSPs specifically designed for audio processing, which can manage this task at 6 to 7 MHz. At 9 MHz, the Cortex-M4 processor only draws around 0.5 mW.

Obviously the hardware extensions are only one aspect. The deciding factor is how well the software utilizes them. ARM extended the compiler in the Keil tools in such a way that standard C code already makes use of most DSP instructions. However, in order to take full advantage of the hardware, certain instructions have to be explicitly integrated into the C code. The µVision debugger (including the instruction set simulator) was modified to accommodate the new features of the Cortex-M4 processor. Further software support is available in the form of a CMSIS (ARM Cortex Microcontroller Software Interface Standard) extension, which allows every CMSIS-compatible compiler to utilize the DSP extensions, as well as library extensions for math functions (especially trigonometric functions), control algorithms, and digital filter algorithms, with support from MATLAB and LabVIEW.

Box figure: The Cortex-M4 block diagram (WIC, DAP, data watchpoints, code interface, Cortex-M4 core with DSP, NVIC, MPU, bus matrix, flash patch, FPU, ETM, serial wire viewer, SRAM and peripheral interface) would be identical to that of the Cortex-M3 if not for the additional blocks for the FPU and DSP functionality (within the CPU itself). The blocks with the dotted lines are optional.
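To give an idea of what an explicitly used DSP instruction does, the sketch below models a dual 16-bit multiply-accumulate in portable C. The helper names `pack16` and `dual_mac16` are hypothetical and chosen for illustration; on a Cortex-M4, CMSIS provides intrinsics such as `__SMLAD` for this class of operation, and this host-side model only approximates the arithmetic (it wraps modulo 2^32 and ignores status flags).

```c
#include <stdint.h>

/* Pack two signed 16-bit samples into one 32-bit word (low half first). */
static uint32_t pack16(int16_t lo, int16_t hi)
{
    return ((uint32_t)(uint16_t)hi << 16) | (uint32_t)(uint16_t)lo;
}

/* Portable model of a dual 16-bit multiply-accumulate:
 *   acc + x.lo * y.lo + x.hi * y.hi   (wrapping modulo 2^32)
 * The casts back to int16_t rely on the usual two's-complement
 * behavior of mainstream compilers (implementation-defined in C). */
static uint32_t dual_mac16(uint32_t x, uint32_t y, uint32_t acc)
{
    int32_t p_lo = (int32_t)(int16_t)(x & 0xFFFFu) * (int16_t)(y & 0xFFFFu);
    int32_t p_hi = (int32_t)(int16_t)(x >> 16)     * (int16_t)(y >> 16);
    return acc + (uint32_t)p_lo + (uint32_t)p_hi;
}
```

With two samples and two coefficients packed per word, one such operation replaces two separate multiply-accumulate steps, which is where much of the cycle saving in filter loops comes from.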

The Cortex-M4 and the Cortex-M0 processors communicate via shared memory. The Cortex-M0 processor has a separate clock as well as a power management unit. In the example of a motor control application, the Cortex-M4 processor would calculate the field-oriented control parameters using its specialized DSP instructions and capabilities, while the Cortex-M0 processes control commands received via the CAN bus. When decoding an audio file, which may be received via USB and output via an audio codec connected to the I2S port, the Cortex-M4 processor would take care of the audio processing while the Cortex-M0 handles the control of the USB and I2S peripherals.

An important fact for software developers is that the Cortex-M4 processor and the Cortex-M0 processor share a common debug interface. This allows both cores to be debugged using a single JTAG debugger. In addition, the Cortex-M4 processor features a serial wire debug (SWD) unit.

Like the LPC1800, the LPC4300 can read data from the internal flash memory at maximum clock speed. In order to make this possible, and to enable the bus masters to access the flash memory and the peripherals simultaneously, NXP implemented 256-bit wide internal buses. An algorithm built into the intelligent pre-fetch unit decides whether to pre-fetch instructions if the core is running fast, or to wait until the flash buffer is nearly empty if the core is running slower. At maximum clock frequency the algorithm can pre-fetch more than eight words.

For high reliability applications the on-chip flash memory can be operated in a front/back mode. In this mode the flash is split into two banks, with all code copied identically onto each half. This allows for the so-called 'golden copy' method of in-system firmware updates, where the controller executes entirely from one half of memory while the other half is being rewritten, a feature insisted upon by automotive users.
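The article does not describe how the two cores exchange data through shared memory. One common pattern, shown here purely as an illustrative sketch, is a single-producer, single-consumer mailbox: one core only writes `head`, the other only writes `tail`. All names are hypothetical; on real hardware the structure would have to be placed in an SRAM block both cores can reach, and memory barriers or an inter-core interrupt may be needed in addition.

```c
#include <stdbool.h>
#include <stdint.h>

#define MBOX_SLOTS 8u  /* power of two, so indices wrap with a mask */

typedef struct {
    volatile uint32_t head;              /* written only by the producer */
    volatile uint32_t tail;              /* written only by the consumer */
    volatile uint32_t slot[MBOX_SLOTS];  /* message payloads             */
} mailbox_t;

static void mbox_init(mailbox_t *m)
{
    m->head = 0u;
    m->tail = 0u;
}

/* Producer side (e.g. the core receiving commands). Returns false if full. */
static bool mbox_put(mailbox_t *m, uint32_t msg)
{
    uint32_t head = m->head;
    if (head - m->tail == MBOX_SLOTS) {
        return false;
    }
    m->slot[head & (MBOX_SLOTS - 1u)] = msg;
    m->head = head + 1u;  /* publish only after the payload is written */
    return true;
}

/* Consumer side (e.g. the core running the control loop). Returns false if empty. */
static bool mbox_get(mailbox_t *m, uint32_t *msg)
{
    uint32_t tail = m->tail;
    if (tail == m->head) {
        return false;
    }
    *msg = m->slot[tail & (MBOX_SLOTS - 1u)];
    m->tail = tail + 1u;  /* release the slot after reading it */
    return true;
}
```

Because each index has exactly one writer, neither side needs a lock, which suits two cores that run from separate clocks.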
A big advantage of the LPC4300 is the multi-layer AMBA AHB bus, which is implemented as a bus matrix and allows RAM to be treated as multiple blocks for simultaneous access by the core and any peripheral that can also act as bus master, eight potential masters in all. For DSP applications, two SRAM blocks can be defined: one 128 Kbyte block for code and one 72 Kbyte block for data. If all data is kept in one block and all instructions in the other, a performance improvement can be achieved. The performance increase is even higher if the DSP capabilities are used to process things like waveforms stored in memory. In fact, one additional SRAM block of 32 Kbyte and two more SRAM blocks of 16 Kbyte can be defined. All this connectivity means a lot of metal tracks across the die, which is made possible by having seven metallization layers on the LPC4300, including two over memory areas.

Three remarkable peripherals

The LPC1800 introduced three new peripheral blocks which are also present on the LPC4300. The first one is an SPI interface that can handle quad-mode operation for off-chip 8-pin flash ICs (Figure 2). Until now, this interface was used primarily on PCs for loading the BIOS. After the initial handshake, the pins of the quad-SPI interface are reconfigured to four data lines, each of which allows data transfers of up to 80 Mbit/s, for a combined data rate of up to 40 Mbyte/s. Consequently, after a power-on reset, the contents of the entire flash memory can be transferred into on-chip SRAM or external RAM in about 1/100 to 1/3 of a second. Obviously these values can only be achieved if 256 bytes are read per fetch; they drop if fewer bytes are read in parallel. If only one byte is read per fetch, the data rate declines to 5 Mbyte/s.
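The quad-mode figures follow from simple arithmetic: four data lines at 80 Mbit/s each give 320 Mbit/s, or 40 Mbyte/s. The short C sketch below reproduces that calculation and estimates the copy time for a given image size. The function names are hypothetical, and the estimate assumes the peak rate is sustained for the whole transfer, which is an idealization.

```c
#include <stdint.h>

/* Combined throughput in Mbyte/s for `lines` data lines,
 * each carrying `mbit_per_line` Mbit/s (8 bits per byte). */
static uint32_t combined_mbyte_per_s(uint32_t lines, uint32_t mbit_per_line)
{
    return lines * mbit_per_line / 8u;
}

/* Idealized time in milliseconds to copy `size_kbyte` Kbyte at
 * `mbyte_per_s` Mbyte/s, with 1 Mbyte taken as 1024 Kbyte.
 * The result is rounded down. */
static uint32_t load_time_ms(uint32_t size_kbyte, uint32_t mbyte_per_s)
{
    return (size_kbyte * 1000u) / (mbyte_per_s * 1024u);
}
```

For example, `combined_mbyte_per_s(4, 80)` yields 40, and at that rate an assumed 16 Mbyte image (16384 Kbyte) would take about 400 ms to copy.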
NXP has tested the quad-mode operation with 80 different flash chips between 512 Kbyte and 16 Mbyte from suppliers Atmel, Gigadevice, Macronix, Micron, Microchip, and Winbond.

Figure 2: The SPI interface allows quad-mode operation for off-chip 8-pin flash ICs, which allows for data transfers of up to 40 Mbyte/s.

A typical problem in MCU applications with an LCD is the storage of the images to be displayed on the screen. They are stored in flash memory and usually have to be moved to SRAM before they can be transferred to the LCD controller. Consequently, a relatively large SRAM buffer is required for this task. However, the quad-SPI interface can transfer the images via DMA directly to the LCD controller, eliminating the need for the SRAM buffer. The quad-SPI interface also comes in handy when running DSP algorithms which have to be moved from flash memory to SRAM in order to be executed. The LPC4300 can use extremely cheap external SPI flash memory to store the code.

Another significant peripheral is the so-called State Configurable Timer (SCT) subsystem (see Figure 3). This is essentially a programmable general purpose state machine combined with a timer. Seven of the state machine's eight inputs as well as 16 timer outputs are brought out to external pins, allowing it to interact with external hardware. Once the state machine has been initialized by the CPU, and assuming the required number of states fits into the 32 provided, the SCT becomes autonomous and requires no further CPU intervention. From the outputs, inputs, and states, another block decodes up to 16 events to drive the state machine from state to state.

A traffic light controller is a simple example of an application that can be implemented completely in hardware using the SCT. There are five outputs (three lights for the cars and two for the pedestrians), one input (the button the pedestrians press to request a green light), and four states (cars = green and pedestrians = red, cars = yellow and pedestrians = red, cars = red and pedestrians = red, as well as cars = red and pedestrians = green). According to NXP, the SCT offers enough resources to build quite sophisticated functions, including a brushless DC, stepper, or AC induction motor controller. Serial interfaces, e.g. for DALI or LIN, are also an option. The LPC4300 currently does not include these types of serial interfaces as dedicated on-chip peripherals.

The third new peripheral block is called SGPIO (Serial General Purpose I/O). Its task is to offload the CPU while performing serial data transfers. Embedded developers often face the problem of dealing with peripherals that use non-standard serial interfaces (e.g. LCD drivers, audio codecs, etc.). In order to create or capture the necessary real-time serial data streams, firmware engineers need to implement code loops that manipulate GPIO in real time. This method, known as bit-banging, is not only CPU-intensive, but also prevents the CPU from entering any of the low-power modes. To solve this problem, the LPC4300 provides 16 general-purpose I/Os, each of which has its own timer register and two 32-bit shift registers. In addition, there are two counters: one for controlling the rate at which data is clocked in or out, and one for controlling the number of bits being clocked in or out. Configuring the SGPIO registers generates the desired waveform(s) and triggers an interrupt when the data has been clocked out. NXP will provide a library of configuration examples for various interfaces (e.g. Quad-SPI, PCM, I2S, I2C, or SPI).

How can the LPC4300 with its Cortex-M4 processor show off its strengths?
As an example, consider a 2nd order IIR filter according to the following formula:

y[n] = b_0 x[n] + b_1 x[n-1] + b_2 x[n-2] - a_1 y[n-1] - a_2 y[n-2]

Figure 3: The State Configurable Timer (SCT), a programmable state machine in combination with a timer, allows for the implementation of sophisticated functions like motor control.

Table 1 shows the corresponding code, including the number of clock cycles for the Cortex-M3 processor (LPC1800) as well as the Cortex-M4 processor (LPC4300), looking only at the inner loop. This assumes that the coefficients b_0, b_1, b_2, a_1, and a_2, as well as the previous states x[n-1], x[n-2], y[n-1], and y[n-2], are all located in registers. Thanks to the Cortex-M4 processor's DSP instructions, the number of required clock cycles can be reduced from between 27 and 47 (depending on the data) to 16. Further optimization even allows a reduction down to ten clock cycles. NXP will provide a free C library of optimized DSP algorithms, containing FFT (supporting both 16- and 32-bit data lengths and block sizes of 64, 256, and 1024), FIR and IIR filters (supporting both 16- and 32-bit single-stage biquads), PID controller, random number generator, cross product of vectors, and more.

Instruction                  Cortex-M3 clock cycles   Cortex-M4 clock cycles
xn = *x++;
yn = xn * b0;
yn += xnm1 * b1;
yn += xnm2 * b2;
yn -= ynm1 * a1;
yn -= ynm2 * a2;
*y++ = yn;
xnm2 = xnm1;
xnm1 = xn;
ynm2 = ynm1;
ynm1 = yn;
Decrement loop counter
Branch
Total clock cycles           27 to 47                 16

Table 1: Thanks to the additional DSP instructions, an IIR filter can be implemented on the Cortex-M4 using fewer clock cycles than on the Cortex-M3.

The first available members of the LPC4300 family (4310, 4320, 4330, and 4350) will be flash-less devices and will therefore require external flash memory. Samples will be available in Q2/Q3 2011. The devices with on-chip flash memory will follow in Q4/2011. These MCUs will contain 512 Kbyte, 768 Kbyte, or 1 Mbyte of on-chip flash memory.
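The second-order IIR filter discussed above can be written as a complete C routine. This is a minimal sketch rather than the code from NXP's library: the struct and function names are hypothetical, single-precision floats are an assumption (the article does not state the data type), and the loop body follows the same sequence of statements as the inner loop in the table.

```c
#include <stddef.h>

/* Hypothetical filter state: coefficients plus the two previous
 * inputs and outputs (direct form I). */
typedef struct {
    float b0, b1, b2, a1, a2;
    float xnm1, xnm2, ynm1, ynm2;
} biquad_t;

/* Filter n samples from x into y:
 *   y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2] */
static void biquad_run(biquad_t *f, const float *x, float *y, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        float xn = x[i];
        float yn = xn * f->b0;
        yn += f->xnm1 * f->b1;
        yn += f->xnm2 * f->b2;
        yn -= f->ynm1 * f->a1;
        yn -= f->ynm2 * f->a2;
        y[i] = yn;

        /* Shift the delay line for the next sample. */
        f->xnm2 = f->xnm1;
        f->xnm1 = xn;
        f->ynm2 = f->ynm1;
        f->ynm1 = yn;
    }
}
```

Each of the five multiply-accumulate lines corresponds to one term of the formula, which is why an architecture with a single-cycle MAC shortens this loop so noticeably.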