EEM870 Embedded System and Experiment Lecture 3: ARM Processor Architecture

Wen-Yen Lin, Ph.D., Department of Electrical Engineering, Chang Gung University. Email: wylin@mail.cgu.edu.tw. March 2014

Agenda
  Introduction
  ARM Processor Overview
  ARM Architecture Version
  ARM Processor Pipeline Design
    ARM7TDMI & ARM9TDMI
    ARM10 vs. ARM11
    Cortex-A8
  ARM Programmer's Model
  ARM Instruction Set (to be continued)
Embedded System and Experiment, 102/2, EE/CGU, W.Y. Lin

Introduction - ARM
  Advanced RISC Machines (ARM): often described as the world's first commercial RISC processor, developed by the Acorn Computer Group in 1985; the design team was later spun out as a separate company.
  The ARM Instruction Set
    Used as the example in chapters 2 and 3
    Most popular 32-bit instruction set in the world (www.arm.com)
    4 billion shipped in 2008
    Large share of the embedded core market
    Applications include mobile phones, consumer electronics, network/storage equipment, cameras, and printers
    Typical of many modern RISC ISAs
    See ARM assembler instructions, their encoding and instruction cycle timings in appendixes B1, B2 and B3 (CD-ROM)
  [Figure: chart by architecture (ARM, IA-32, MIPS, Motorola 68K, PowerPC, Hitachi SH, SPARC, Other), 1998 to 2002; vertical axis 0 to 1400]

ARM Ltd
  Founded in November 1990
    Spun out of Acorn Computers
    Initial funding from Apple, Acorn and VLSI
  Designs the ARM range of RISC processor cores
  Licenses ARM core designs to semiconductor partners who fabricate and sell to their customers
    ARM does not fabricate silicon itself
  Also develops technologies to assist with the design-in of the ARM architecture
    Software tools, boards, debug hardware
    Application software
    Bus architectures
    Peripherals, etc.

ARM's Activities

Huge Range of Applications

Intellectual Property (IP)
  ARM provides hard and soft views to licensees
    RTL and synthesis flows (soft view)
    GDSII layout (hard view)
  Licensees have the right to use hard or soft views of the IP
    Soft views include gate-level netlists
    Hard views are DSMs (Design Simulation Models)
  OEMs must use hard views
    To protect ARM IP

ARM Core Family
  ARMv8 is a 64-bit architecture, but it does not yet have any commercial products.

ARM Architecture Versions

Development of the ARM Architecture
  Processor Architecture = Instruction Set + Programmer's Model

ARM Architecture v7 Profiles
  Application profile (ARMv7-A)
    Memory management support (MMU)
    Highest performance at low power
    Influenced by multi-tasking OS requirements
    TrustZone and Jazelle-RCT for a safe, extensible system
    e.g. Cortex-A5, Cortex-A9
  Real-time profile (ARMv7-R)
    Protected memory (MPU)
    Low latency and predictability for real-time needs
    Evolutionary path for traditional embedded business
    e.g. Cortex-R4
  Microcontroller profile (ARMv7-M, ARMv7E-M, ARMv6-M)
    Lowest gate count entry point
    Deterministic and predictable behavior a key priority
    Deeply embedded use
    e.g. Cortex-M3

ARM Processor Overview (figure: Apple A5)

Product Code Demystified

ARM Processor Cores

ARM Architecture Versions (information from Wikipedia)

Relative Performance

Application Processors
  Application processors are defined by their ability to run complex operating systems such as Linux, Android, Microsoft Windows (CE/Mobile), and Symbian.
  Applications: smartphones, feature phones, netbooks, e-readers, digital TV, set-top boxes, etc.

Embedded Processors
  Embedded processors focus primarily on delivering highly deterministic real-time behavior in a wide range of power-sensitive applications, and often run an RTOS alongside user applications.
  Applications: merchant microcontrollers, automotive control systems, motor control systems, wireless and wired sensor networks, mass storage controllers, printers, etc.

Real-time Processors
  ARM Cortex-R real-time processors offer high-performance computing solutions for deeply embedded systems with demanding real-time response constraints.
  Target applications:
    Mobile handset processing in smartphones and baseband modems
    Enterprise systems such as hard disk drives, networking and printing
    Home consumer electronics, set-top boxes, digital TV, media players, cameras
    Embedded microcontrollers for dependable systems in medical, industrial and automotive

Cortex Family

5-Stage Pipeline Organization

Pipeline Changes for ARM9TDMI
  ARM7TDMI (3 stages)
    FETCH: Instruction Fetch
    DECODE: Thumb-to-ARM decompress, ARM decode, Reg Select
    EXECUTE: Reg Read, Shift, ALU, Reg Write
  ARM9TDMI (5 stages)
    FETCH: Instruction Fetch
    DECODE: ARM or Thumb Inst Decode, Reg Decode, Reg Read
    EXECUTE: Shift + ALU
    MEMORY: Memory Access
    WRITE: Reg Write
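To make the stage lists above concrete, the following is a minimal sketch of an ideal, stall-free pipeline. It assumes one instruction enters per cycle and that nothing stalls, which real cores do not guarantee; the function names are hypothetical and only the stage names come from the slide. With k stages and n instructions, the last instruction leaves the pipeline after k + n - 1 cycles.

```python
# Stage names as listed on the slide.
ARM7TDMI_STAGES = ["FETCH", "DECODE", "EXECUTE"]
ARM9TDMI_STAGES = ["FETCH", "DECODE", "EXECUTE", "MEMORY", "WRITE"]


def ideal_cycles(stages, n_instructions):
    """Cycles to finish n instructions in an ideal pipeline (assumption: no stalls)."""
    if n_instructions <= 0:
        return 0
    return len(stages) + n_instructions - 1


def occupancy(stages, n_instructions):
    """Return table[cycle][stage] = index of the instruction in that stage, or None."""
    table = []
    for cycle in range(ideal_cycles(stages, n_instructions)):
        row = []
        for stage_index in range(len(stages)):
            # Instruction i enters stage s at cycle i + s.
            instr = cycle - stage_index
            row.append(instr if 0 <= instr < n_instructions else None)
        table.append(row)
    return table


if __name__ == "__main__":
    for name, stages in (("ARM7TDMI", ARM7TDMI_STAGES), ("ARM9TDMI", ARM9TDMI_STAGES)):
        print(name, "cycles for 10 instructions:", ideal_cycles(stages, 10))
```

Under these assumptions the five-stage pipeline needs slightly more cycles to fill. The usual motivation for the deeper design is that each stage does less work, which allows a higher clock rate; that effect is not modelled here.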

ARM10 vs. ARM11 Pipelines
  ARM10
    FETCH: Branch Prediction, Instruction Fetch
    ISSUE: ARM or Thumb Instruction Decode
    DECODE: Reg Read
    EXECUTE: Shift + ALU; Multiply
    MEMORY: Memory Access; Multiply Add
    WRITE: Reg Write
  ARM11
    Fetch 1, Fetch 2, Decode, Issue
    then one of three parallel paths:
      Shift, ALU, Saturate
      MAC 1, MAC 2, MAC 3
      Address, Data Cache 1, Data Cache 2
    Write back

8-Stage Pipeline (v6 Architecture)

Cortex-A8 Block Diagram

ARM Cortex-A Architecture

Full Cortex-A8 Pipeline Diagram

What is NEON?
  NEON is a wide SIMD data processing architecture
    Extension of the ARM instruction set (ARMv7-A)
    32 x 64-bit wide registers (can also be used as 16 x 128-bit wide registers)
  NEON instructions perform packed SIMD processing
    Registers are considered as vectors of elements of the same data type
    Data types available: signed/unsigned 8-bit, 16-bit, 32-bit, 64-bit, single-precision float
    Instructions usually perform the same operation in all lanes
  [Figure: source registers Dn and Dm are divided into elements; the operation is applied lane by lane and the result is written to destination register Dd]
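The lane-by-lane behaviour described above can be sketched in plain Python. This is only a model of packed integer addition on a 64-bit D register, assuming wrap-around (modular) arithmetic within each lane; the helper names are hypothetical and the code does not use or emulate any real NEON API.

```python
def split_lanes(value, lane_bits, reg_bits=64):
    """Split a register value into lanes, lane 0 being the least significant."""
    mask = (1 << lane_bits) - 1
    return [(value >> (i * lane_bits)) & mask for i in range(reg_bits // lane_bits)]


def join_lanes(lanes, lane_bits):
    """Pack lanes back into one register value, truncating each lane to lane_bits."""
    mask = (1 << lane_bits) - 1
    value = 0
    for i, lane in enumerate(lanes):
        value |= (lane & mask) << (i * lane_bits)
    return value


def packed_add(dn, dm, lane_bits, reg_bits=64):
    """Add two registers lane by lane; carries never cross a lane boundary."""
    a = split_lanes(dn, lane_bits, reg_bits)
    b = split_lanes(dm, lane_bits, reg_bits)
    return join_lanes([x + y for x, y in zip(a, b)], lane_bits)


if __name__ == "__main__":
    dn = 0x0001_0002_0003_FFFF
    dm = 0x0001_0001_0001_0001
    # Four 16-bit lanes; the lowest lane wraps from 0xFFFF to 0x0000.
    print(hex(packed_add(dn, dm, 16)))
```

The same two register values give a different result when treated as a single 64-bit lane, because the carry out of the lowest 16 bits then propagates upward. That difference is the point of declaring an element size.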

Agenda
  Introduction
  ARM Processor Overview
  ARM Architecture Version
  ARM Processor Pipeline Design
    ARM7TDMI & ARM9TDMI
    ARM10 vs. ARM11
    Cortex-A8
  ARMv7-A Programmer's Model
  ARM Instruction Set (to be continued)

Data Size and Instruction Sets
  The ARM is a 32-bit architecture
  When used in relation to the ARM:
    Byte means 8 bits
    Halfword means 16 bits (two bytes)
    Word means 32 bits (four bytes)
  Most ARM cores implement two instruction sets
    32-bit ARM instruction set
    16-bit Thumb instruction set
  Jazelle cores can also execute Java bytecode
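The byte, halfword and word sizes above can be checked with Python's standard `struct` module. This is a small sketch that assumes a little-endian memory layout, which is a common ARM configuration but not the only one; the dictionary and function names are hypothetical.

```python
import struct

# Little-endian struct format codes for the three ARM data sizes (assumed layout).
FORMATS = {"byte": "<B", "halfword": "<H", "word": "<I"}


def size_in_bits(name):
    """Size of an ARM data type in bits, derived from its struct format."""
    return struct.calcsize(FORMATS[name]) * 8


def split_word(word):
    """View one 32-bit word as two halfwords and as four bytes, lowest address first."""
    raw = struct.pack("<I", word)
    halfwords = struct.unpack("<2H", raw)
    single_bytes = struct.unpack("<4B", raw)
    return halfwords, single_bytes


if __name__ == "__main__":
    for name in FORMATS:
        print(name, size_in_bits(name), "bits")
    print(split_word(0x11223344))
```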

ARM and Thumb Performance

The Thumb-2 Instruction Set
  Variable-length instructions
    ARM instructions are a fixed length of 32 bits
    Thumb instructions are a fixed length of 16 bits
    Thumb-2 instructions can be either 16-bit or 32-bit
  Thumb-2 gives approximately 26% improvement in code density over ARM
  Thumb-2 gives approximately 25% improvement in performance over Thumb
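A short sketch may help with the variable-length point. In the Thumb-2 encoding, a halfword whose top five bits are 0b11101, 0b11110 or 0b11111 is the first half of a 32-bit instruction; any other halfword is a complete 16-bit instruction. The function names below are hypothetical, the sample halfwords are arbitrary illustrative values, and the size estimate reads the slide's "26% improvement in code density" as "about 26% fewer bytes", which is one interpretation rather than a definition.

```python
def thumb2_length_bytes(first_halfword):
    """Return 4 if the halfword starts a 32-bit Thumb-2 instruction, else 2."""
    top5 = (first_halfword >> 11) & 0x1F
    return 4 if top5 in (0b11101, 0b11110, 0b11111) else 2


def count_instructions(halfwords):
    """Count instructions in a stream of halfwords by walking the length prefixes."""
    index = 0
    count = 0
    while index < len(halfwords):
        index += thumb2_length_bytes(halfwords[index]) // 2
        count += 1
    return count


def estimated_thumb2_bytes(arm_bytes, density_gain_percent=26):
    """Estimate Thumb-2 code size from ARM code size (assumed reading of the 26% figure)."""
    return arm_bytes * (100 - density_gain_percent) // 100


if __name__ == "__main__":
    stream = [0x4608, 0xF000, 0xF800, 0xE7FE]  # illustrative halfwords only
    print(count_instructions(stream), "instructions in", 2 * len(stream), "bytes")
    print(estimated_thumb2_bytes(100_000), "bytes estimated for 100000 bytes of ARM code")
```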

Cortex-A8 Processor Modes

Cortex-A8 Register File

Cortex-A8 Exception Handling

Cortex-A8 Program Status Register

Conditional Execution and Flags

Memory Types
  Each defined memory region will specify a memory type
  The memory type controls the following:
    Memory access ordering rules
    Caching and buffering behaviour
  There are 3 mutually exclusive memory types:
    Normal
    Device
    Strongly Ordered
  Normal and Device memory allow additional attributes for specifying
    The cache policy
    Whether the region is Shared
  Normal memory allows you to separately configure Inner and Outer cache policies (discussed in the Caches and TCMs module)

L1 and L2 Caches
  [Figure: memory system block diagram with ARM Core, I-Cache RAM, D-Cache RAM, MMU/MPU, BIU, L2 Cache, On-chip SRAM and Off-chip Memory, grouped into levels L1, L2 and L3]
  A typical memory system can have multiple levels of cache
    Level 1 memory system typically consists of L1 caches, MMU/MPU and TCMs
    Level 2 memory system (and beyond) depends on the system design
  Memory attributes determine cache behavior at different levels
    Controlled by the MMU/MPU (discussed later)
    Inner Cacheable attributes define memory access behavior in the L1 memory system
    Outer Cacheable attributes define memory access behavior in the L2 memory system (if external) and beyond (as signals on the bus)
  Before caches can be used, software setup must be performed

ARM Cache Features
  Harvard implementation for L1 caches
    Separate instruction and data caches
  Cache lockdown
    Prevents line eviction from a specified cache way (discussed later)
  Pseudo-random and round-robin replacement strategies
    Unused lines can be allocated before considering replacement
  Non-blocking data cache
    Cache lookup can hit before a linefill is complete (also checks the linefill buffer)
  Streaming, critical-word-first
    Cache data is forwarded to the core as soon as the requested word is received in the linefill buffer
    Any word in the cache line can be requested first using a WRAP burst on the bus
  ECC or parity checking
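The critical-word-first idea can be illustrated with a short sketch of the word order in a wrapping linefill. It assumes an 8-word (32-byte) cache line and one word transferred per bus beat; both are illustrative assumptions rather than figures from the slide, and the function names are hypothetical.

```python
def wrap_burst_order(critical_word, words_per_line=8):
    """Word indices in the order a wrapping burst returns them, starting at the requested word."""
    return [(critical_word + i) % words_per_line for i in range(words_per_line)]


def beats_until_requested_word(critical_word, critical_word_first, words_per_line=8):
    """Number of bus beats before the requested word arrives (assumption: one word per beat)."""
    if critical_word_first:
        order = wrap_burst_order(critical_word, words_per_line)
    else:
        # Without critical-word-first, the line is fetched from word 0 upward.
        order = list(range(words_per_line))
    return order.index(critical_word) + 1


if __name__ == "__main__":
    print(wrap_burst_order(5))
    print(beats_until_requested_word(5, critical_word_first=True))
    print(beats_until_requested_word(5, critical_word_first=False))
```

In this model a miss on word 5 is satisfied on the first beat with a wrapping burst, instead of the sixth beat with a burst that starts at word 0. With streaming, the core can then continue while the rest of the line fills.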

Example 32KB ARM Cache

Cortex-A8 Memory Management: Memory Protection

Memory Allocation

Memory Management
