ARM Processors for Embedded Applications

Similar documents
Contents of this presentation: Some words about the ARM company

ARM ARCHITECTURE. Contents at a glance:

Chapter 4. Enhancing ARM7 architecture by embedding RTOS

CISC RISC. Compiler. Compiler. Processor. Processor

Fatima Michael College of Engineering & Technology

ARM Processor. Dr. P. T. Karule. Professor. Department of Electronics Engineering, Yeshwantrao Chavan College of Engineering, Nagpur

15CS44: MICROPROCESSORS AND MICROCONTROLLERS. QUESTION BANK with SOLUTIONS MODULE-4

ARM Processors ARM ISA. ARM 1 in 1985 By 2001, more than 1 billion ARM processors shipped Widely used in many successful 32-bit embedded systems

The ARM10 Family of Advanced Microprocessor Cores

Hi Hsiao-Lung Chan, Ph.D. Dept Electrical Engineering Chang Gung University, Taiwan

Camellia Getting Started with ARM922T

Modular ARM System Design

18-349: Embedded Real-Time Systems Lecture 2: ARM Architecture

ECE 471 Embedded Systems Lecture 2

Samsung S3C4510B. Hsung-Pin Chang Department of Computer Science National Chung Hsing University

ARM CORTEX-R52. Target Audience: Engineers and technicians who develop SoCs and systems based on the ARM Cortex-R52 architecture.

Chapter 15 ARM Architecture, Programming and Development Tools

Hercules ARM Cortex -R4 System Architecture. Processor Overview

ARM Architecture. Computer Organization and Assembly Languages Yung-Yu Chuang. with slides by Peng-Sheng Chen, Ville Pietikainen

Bus AMBA. Advanced Microcontroller Bus Architecture (AMBA)

Universität Dortmund. ARM Architecture

ARM System Design. Aim: to introduce. ARM-based embedded system design the ARM and Thumb instruction sets. the ARM software development toolkit

RM3 - Cortex-M4 / Cortex-M4F implementation

ELCT 912: Advanced Embedded Systems

Agenda. ARM Core Data Flow Model Registers Program Status Register Pipeline Exceptions Core Extensions ARM Architecture Revision

Copyright 2016 Xilinx

The Nios II Family of Configurable Soft-core Processors

ECE 471 Embedded Systems Lecture 2

Chapter 5. Introduction ARM Cortex series

The ARM Cortex-M0 Processor Architecture Part-1

ARM Processor. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

MICROPROCESSORS AND MICROCONTROLLERS 15CS44 MODULE 4 ARM EMBEDDED SYSTEMS & ARM PROCESSOR FUNDAMENTALS ARM EMBEDDED SYSTEMS

The ARM Architecture. Outline. History. Introduction. Seng Lin Shee 20 th May 2004

ARM Processor Fundamentals

EEM870 Embedded System and Experiment Lecture 3: ARM Processor Architecture

NXP Unveils Its First ARM Cortex -M4 Based Controller Family

Introduction CHAPTER IN THIS CHAPTER

The Next Steps in the Evolution of Embedded Processors

Embedded Systems: Architecture


Systemy RT i embedded Wykład 5 Mikrokontrolery 32-bitowe AVR32, ARM. Wrocław 2013

1. Microprocessor Architectures. 1.1 Intel 1.2 Motorola

Multi-core microcontroller design with Cortex-M processors and CoreSight SoC

Spring 2012 Prof. Hyesoon Kim

ARM processor organization

The Next Steps in the Evolution of ARM Cortex-M

ARMv8-A Software Development

SoC Design Lecture 11: SoC Bus Architectures. Shaahin Hessabi Department of Computer Engineering Sharif University of Technology

4. Hardware Platform: Real-Time Requirements

ECE 471 Embedded Systems Lecture 3

ARM Cortex core microcontrollers 3. Cortex-M0, M4, M7

Cortex-R5 Software Development

RISC-V Core IP Products

ARM968E-S. Technical Reference Manual. Revision: r0p1. Copyright 2004, 2006 ARM Limited. All rights reserved. ARM DDI 0311D

Computer and Digital System Architecture

systems such as Linux (real time application interface Linux included). The unified 32-

Designing Embedded Processors in FPGAs

PRODUCT BACKGROUNDER

MIPS Technologies MIPS32 M4K Synthesizable Processor Core By the staff of

Cortex-A9 MPCore Software Development

Fujitsu System Applications Support. Fujitsu Microelectronics America, Inc. 02/02

CS 310 Embedded Computer Systems CPUS. Seungryoul Maeng

ARM920T. Technical Reference Manual. (Rev 1) Copyright 2000, 2001 ARM Limited. All rights reserved. ARM DDI 0151C

VLSI Design of Multichannel AMBA AHB

Interrupt/Timer/DMA 1

Lecture 10 Exceptions and Interrupts. How are exceptions generated?

Intelop. *As new IP blocks become available, please contact the factory for the latest updated info.

In this tutorial, we will discuss the architecture, pin diagram and other key concepts of microprocessors.

Rapidly Developing Embedded Systems Using Configurable Processors

Product Technical Brief S3C2412 Rev 2.2, Apr. 2006

RA3 - Cortex-A15 implementation

ARM Cortex core microcontrollers

Module Introduction! PURPOSE: The intent of this module, 68K to ColdFire Transition, is to explain the changes to the programming model and architectu

picojava I Java Processor Core DATA SHEET DESCRIPTION

ARM architecture road map. NuMicro Overview of Cortex M. Cortex M Processor Family (2/3) All binary upwards compatible

ARM-Based Embedded Processor Device Overview

AMBA AHB Bus Protocol Checker

Product Technical Brief S3C2440X Series Rev 2.0, Oct. 2003

ELC4438: Embedded System Design ARM Embedded Processor

Growth outside Cell Phone Applications

The ARM Cortex-A9 Processors

AMULET3i an Asynchronous System-on-Chip

Jazelle ARM. By: Adrian Cretzu & Sabine Loebner

ECE 471 Embedded Systems Lecture 2

SYSTEMS ON CHIP (SOC) FOR EMBEDDED APPLICATIONS

RM4 - Cortex-M7 implementation

LEON4: Fourth Generation of the LEON Processor

LX4180. LMI: Local Memory Interface CI: Coprocessor Interface CEI: Custom Engine Interface LBC: Lexra Bus Controller

Introducing the Superscalar Version 5 ColdFire Core

Excalibur ARM-Based. Embedded Processors PLDs. Hardware Reference Manual January 2001 Version 1.0

Cortex-M3/M4 Software Development

William Stallings Computer Organization and Architecture 8 th Edition. Chapter 12 Processor Structure and Function

Product Technical Brief S3C2413 Rev 2.2, Apr. 2006

Running ARM7TDMI Processor Software on the Cortex -M3 Processor

DQ8051. Revolutionary Quad-Pipelined Ultra High performance 8051 Microcontroller Core

Job Posting (Aug. 19) ECE 425. ARM7 Block Diagram. ARM Programming. Assembly Language Programming. ARM Architecture 9/7/2017. Microprocessor Systems

ARM Cortex-A9 ARM v7-a. A programmer s perspective Part1

Interconnects, Memory, GPIO

Course Introduction. Purpose: Objectives: Content: Learning Time:

Excalibur Device Overview

Transcription:

ARM Processors for Embedded Applications Roadmap for ARM Processors ARM Architecture Basics ARM Families AMBA Architecture 1 Current ARM Core Families ARM7: Hard cores and Soft cores Cache with MPU or MMU Real-time debug (RTD) and real-time trace (RTT) ARM9/ARM9E-S: (E: DSP-Enhanced; S: Synthesizable) Hard cores and Soft cores Cache with MPU or MMU DSP RTD and RTT ARM10E: Vector floating point (VFP) coprocessor RTD and RTT All cores incorporate I- and D-cache StrongARM Cache with MMU XScale 2

Evolution of the ARM Architecture 3 Development Benefit for ARM Core designed for embedded applications: High-performance, low power consumption Small die size, tight code density Software debug capabilities Compatible instruction set architecture Standard bus architecture (AMBA) for design reuse ARM and third-party tool chains and ASIC tools Multiple RTOS solutions Multiple silicon sources 4

Example for using ARM Processor 5 ARM Product Roadmap 6 families: 100 MIPS to 1000 MIPS 6

ARM Product Roadmap (Cont.) Roadmap for Open Platform Applications 7 ARM Product Roadmap (Cont.) Roadmap for Embedded Applications 8

ARM Processors for Embedded Applications Roadmap for ARM Processors ARM Architecture Basics ARM Families AMBA Architecture 9 ARM Architecture Basics ARM is a 32-bit load/store and RISC architecture 37 total registers for 7 processor modes: 20 visible 32-bit registers in privileged modes (17 in user mode) r0-r13 = general purpose registers (r13 = stack pointer (SP)) r14 = link register (LR) r15 = program counter (PC) SPSR = saved program status register (only accessible in privileged modes) CPSR = current program status register The first four bits of the Current Program Status Register (CPSR) is set automatically during every arithmetic, logic, or shifting operation 10

Register Organization 11 Register Organization (Cont.) User mode: the only non-privileged mode It has the least number of total registers visible It has no SPSR and limited access to the CPSR FIQ and IRQ modes: Two interrupt modes of the CPU FIQ mode has an additional five banked registers to provide more flexibility and higher performance when handling critical interrupts Supervisor mode: The default mode of the processor on start up or reset Undefined mode: Traps unknown or illegal instructions when they are passed though the pipeline Abort mode: Traps illegal memory accesses as a result of fetching instructions or accessing data System mode: Uses the user mode bank of registers Provide an additional privileged mode when dealing with nested interrupts Thumb state: Register banks are split into low and high register domains The majority of instructions in Thumb state have a 3-bit register specifier These instructions can only access the low registers in Thumb (R0 through R7) The high registers (R8 through R15) have more restricted use 12

TDMI Thumb: 16-bit subset of the 32-bit ARM instruction set Debug: Additional core signals for debug use On-chip debugging facilities Multiplier: Supports for an enhanced multiplier for 64-bit results EmbeddedICE Logic: Logic/registers to control debug facilities On-chip break point and watchpoint supports 13 Thumb 16-bit Instructions Reasons of using Thumb instructions: Reduce memory cost leads to smaller code size and the use of narrower memory performance loss due to using a narrow memory path E.g. a 16-bit memory path with a 32-bit processor: the processor must take two memory access cycles to fetch an instruction or read and write data Processor need to provide a better code density for embedded processor Compressed subset of the 32-bit ARM instruction set: Suitable for lower bus bandwidth from narrow external memory Improves code density A Thumb enabled ARM: Executes both 32-bit ARM and 16-bit Thumb instructions Allows runtime inter-working between ARM and Thumb code State change performed via branch with exchange (BX) instruction Application must boot up with ARM and switch to Thumb if required Only the instructions are 16-bit 14

Thumb Benefit: An Example BGE: Branch if Greater or Equal to RSBLT: Reverse Subtract if Less Than tcc: thumb C compiler (for two-byte instructions) armc: ARM C compiler (for four-byte instructions) 15 Thumb Features The Thumb instruction set was created based upon three different criteria: Instructions that only require 16-bit space to be defined Instructions that were needed by the compiler Instructions that were most commonly used in real world applications Thumb programs typically are: ~30% smaller than ARM programs ~30% faster when accessing 16-bit memory Thumb reduces 32-bit system to 16-bit cost: Consumes less power Requires less external memory 16

Debug Extensions Goal: To provide a means to debug a final product without adding extra dependent pins to the design The Debug extension adds: Scan chains around the core to monitor what is occurring on the data path of the CPU Signals were added to the core so that processor control can be handed to the debugger when a breakpoint or watchpoint has been reached Enabling user to view characteristics such as register contents, memory regions, and processor status A Test Access Port (TAP) controller to allow access to the scan chains The EmbeddedICE Logic adds: A set of registers providing the ability to set hardware breakpoints or watchpoints on code or data Monitors the ARM core signals every cycle to check if a breakpoint or watchpoint has been hit An additional scan chain is used to establish contact between the user and the EmbeddedICE logic 17 EmbeddedICE Logic The advantage of on-chip debug can rapidly debug software (especially in ROM) Parallel Port Yellow: Debug Extension Pink: EmbeddedICE Logic Multi-ICE 18

ARM Processors for Embedded Applications Roadmap for ARM Processors ARM Architecture Basics ARM Families AMBA Architecture 19 The ARM7TDMI Core Architecture version 4T: 3-stage pipeline Unified cache 32-bit ARM plus 16-bit Thumb extension Forward compatible code (compatible with ARM9 and ARM10) EmbeddedICE on-chip debug Hard Macrocell IP Smallest Die Size: 0.53 mm 2 on 0.18 µm process (around 70,000 transistors) Up to 110 MHz on TSMC standard 0.18 µm Low power consumption: 0.25 mw/mhz 20

ARM720T Cached Macrocell for Platform OS Applications ARM7TDMI core: ARM v4t Thumb 16-bit instruction set 8 KB cache (data and instruction caches) AMBA Advanced System Bus (ASB) interface Can be converted directly from an ASB to the higher performance AHB interface Memory Management Unit: Full support for WindowsCE and Symbian OS, Linux, or PalmOS Offers virtual-to-physical address translation 64-entry Translation Lookaside Buffer (TLB) Two-level page tables stored in memory Hard Macrocell IP 2.9 mm 2 on 0.18 mm process 75 MHz on TSMC standard 0.18 Power: 0.55 mw/mhz 21 Strength of ARM7 vs ARM9 22

Pipeline Comparison 23 Pipeline Comparison (Cont.) ARM9 TDMI: Decode stage: Allows direct decoding of Thumb instructions to remove the need to translate and decode as with the ARM7 Operations previously performed in the execute stage of ARM7 are spread across four stages in the ARM9 pipeline: decode, execute, memory, and write The reorganization and removal of these critical paths resulted in a much higher clock frequency Memory: an additional cycle for data memory access An interruption of the pipeline is required for ARM7 since it only has one single memory port (for both data and instruction) ARM10 TDMI (6 Stages pipeline): Issue: between the fetch and decode stages to allow more time for instruction decode Helpful when an unpredicted branch is executed 24

ARM9TDMI Processor Core ARM 32-bit and Thumb 16-bit instructions Harvard 5-stage pipeline implementation: Enable simultaneous access to instructions and data memories Higher performance from reduced cycle per instruction Coprocessor interface for on-chip coprocessors: Allows floating point, DSP, graphics accelerators EmbeddedICE debug capability with extensions: Hardware single step Breakpoint on exception High code compatibility with ARM7TDMI Portable to 0.25, 0.18 µm CMOS and below 25 ARM940T Macrocell Processor for real-time embedded applications: ARM9TDMI Core 4 KB instruction and 4 KB data caches Memory protection unit (MPU) for RTOS Allows eight separate regions of memory Each with cache, write buffer enable, and access permission Configured using on-chip registers Eliminates the need for page-mapping tables stored in memory Code compatible from ARM7TDMI Hard Macro IP: 4.2mm 2 on 0.18 Up to 200 MHz on TSMC standard 0.18 µm Power: 0.75 mw/mhz Applications: wireless networking devices, printers, or automotive control devices 26

ARM9E Architecture Hard macrocells always have been the ultimate answer for optimized performance and die size But newer synthesized design flows are pushing the envelope for SoC applications ARM released a whole family with high performance configurable CPUs The ARM9E family was built upon the standard set by the ARM9TDMI family But it also provides freedom for defining the cache and SRAM configurations used by the core Fully code compatible with ARMv4T architecture cores 32-bit load/store RISC architecture Efficient 5-stage pipeline ARM and Thumb instruction sets 27 ARM9E Architecture (Cont.) ETM9 (Embedded Trace Macrocell) interface Enhance the debug capabilities The first family of CPUs designed to use the AHB bus of the AMBA 2.0 specification Coprocessor interface Synthesizable or soft IP Includes DSP extensions for true real-time systems Improvement to introduce additional multiply and saturated math instructions for use by complex DSP algorithms 28

ARM9E Family 29 ARM10E Architecture Enhancements ARM10E implements: Harvard 6-stage pipeline Supports v5te instruction set Fully compatible with v4t architecture Enhanced EmbeddedICE debug logic Advanced microprocessor cores with 390-700 MIPS integer performance Performance enhancements include: Branch prediction: Eliminates 70% of branches on typical code sequences VFP10: high performance vector floating-point support Single and double precision IEEE 754 floating-point arithmetic Vector load/stores can run concurrently New energy saving power down modes To achieve low-power operation on high performance processes These features improve code performance by lowering the average number of cycles per instruction of the processor 30

Family Summary 31 ARM Processors for Embedded Applications Roadmap for ARM Processors ARM Architecture Basics ARM Families AMBA Architecture 32

AMBA: An Open Bus Standard Easy interconnection of macrocells Optimizes system power Simplifies reuse Eases testing Proven and open standard Reduces time to market 33 Advanced High Performance Bus (AHB) The AHB bus is designed with all timing placed on the rising clock edge This timing maximizes the efficiency of the system and aligns with modern synthesis design flows Multiple bus masters Shares resources between different bus masters (CPU, DMA Controller, etc) Optimizes system performance Pipelined and burst transfers Allows high speed memory and peripheral access Split transactions supported Enables high latency slaves to release the system bus during long transactions Wide data bus configuration 32/64/128 up to 1024-bits wide 34

Advanced Peripheral Bus (APB) Simplicity and power conservation are the driving features behind the design of the APB side of the bridge Simple bus Non-pipelined architecture All peripherals act as slaves on the APB, easing the integration of this bus The bridge is the only APB bus master Simpler interface means low gate count Low power Isolated peripherals behind the bridge reduces load on the main system bus Peripheral bus signals active only during low bandwidth peripheral transfers Ideal for ancillary or general-purpose peripherals E.g. Timers, interrupt controllers, and I/O ports 35 AMBA Design Kit Components for EASY (Example AMBA Systems) development: The example bus master provides a starting point for bus master development A file reader bus master generates AMBA transactions to exercise the bus and peripherals An arbiter provides a simple priority algorithm to control master access to the AHB bus 36

ARM7TDMI and 922T EASYs 37 AMBA Design Kit Components AHB File Reader Bus Master Example Bus Master Multi-Layer Bus Switch AHB TIC (Test Interface Controller) AHB to APB and AHB to AHB Bridges AHB Arbiter Internal Memory Example AHB Slave with Retry Simple External Memory Interface APB Timers Interrupt Controller Remap/Pause Controller Example AMBA Bus Systems Macrocell Wrappers ARM720T ARM740T ARM940T ARM920T ARM922T ARM7TDMI ARM7TDMI-S Example Software C and ARM Assembler Documentation AMBA Specification User Guide and Release Notes ADK Technical Reference Manual 38