Chapter 2 Instructions Sets. Hsung-Pin Chang Department of Computer Science National ChungHsing University

Similar documents
The ARM processor. Morgan Kaufman ed Overheads for Computers as Components

ARM Cortex-M4 Programming Model Flow Control Instructions

Writing ARM Assembly. Steven R. Bagley

STEVEN R. BAGLEY ARM: PROCESSING DATA

ARM Instruction Set Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

ARM Architecture and Instruction Set

MNEMONIC OPERATION ADDRESS / OPERAND MODES FLAGS SET WITH S suffix ADC

Comparison InstruCtions

ARM Shift Operations. Shift Type 00 - logical left 01 - logical right 10 - arithmetic right 11 - rotate right. Shift Amount 0-31 bits

ARM Assembly Language. Programming

Arm Processor. Arm = Advanced RISC Machines, Ltd.

ARM Processor. Based on Lecture Notes by Marilyn Wolf

Processor Status Register(PSR)

CS 310 Embedded Computer Systems CPUS. Seungryoul Maeng

Chapter 15. ARM Architecture, Programming and Development Tools

ARM Assembly Programming

ARM Instruction Set. Computer Organization and Assembly Languages Yung-Yu Chuang. with slides by Peng-Sheng Chen

ECE 471 Embedded Systems Lecture 5

ARM Cortex-M4 Architecture and Instruction Set 2: General Data Processing Instructions

ARM Instruction Set. Introduction. Memory system. ARM programmer model. The ARM processor is easy to program at the

ECE 571 Advanced Microprocessor-Based Design Lecture 3

Basic ARM InstructionS

Cortex M3 Programming

Instruction Set Architecture (ISA)

ARM Cortex M3 Instruction Set Architecture. Gary J. Minden March 29, 2016

ARM Cortex-A9 ARM v7-a. A programmer s perspective Part 2

ARM Assembly Language

The ARM Instruction Set Architecture

Control Flow Instructions

ENGN1640: Design of Computing Systems Topic 03: Instruction Set Architecture Design

18-349: Introduction to Embedded Real- Time Systems Lecture 3: ARM ASM

The PAW Architecture Reference Manual

The ARM Cortex-M0 Processor Architecture Part-2

VE7104/INTRODUCTION TO EMBEDDED CONTROLLERS UNIT III ARM BASED MICROCONTROLLERS

Outline. ARM Introduction & Instruction Set Architecture. ARM History. ARM s visible registers

ECE 498 Linux Assembly Language Lecture 5

ARM Cortex-M4 Programming Model Memory Addressing Instructions

ARM Cortex-M4 Programming Model Logical and Shift Instructions

Chapter 15. ARM Architecture, Programming and Development Tools

Lecture 15 ARM Processor A RISC Architecture

ARM Cortex-M4 Programming Model Arithmetic Instructions. References: Textbook Chapter 4, Chapter ARM Cortex-M Users Manual, Chapter 3

ECE 471 Embedded Systems Lecture 6

Branch Instructions. R type: Cond

Lecture 4 (part 2): Data Transfer Instructions

Architecture. Digital Computer Design

Flow Control In Assembly

3. The Instruction Set

LAB 1 Using Visual Emulator. Download the emulator Answer to questions (Q)

University of California, San Diego CSE 30 Computer Organization and Systems Programming Winter 2014 Midterm Dr. Diba Mirza

EE251: Tuesday September 5

A Bit of History. Program Mem Data Memory. CPU (Central Processing Unit) I/O (Input/Output) Von Neumann Architecture. CPU (Central Processing Unit)

Control Flow. September 2, Indiana University. Geoffrey Brown, Bryce Himebaugh 2015 September 2, / 21

ARM Processor. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Unsigned and signed integer numbers

The ARM Instruction Set

Introduction to C. Write a main() function that swaps the contents of two integer variables x and y.

3 Assembly Programming (Part 2) Date: 07/09/2016 Name: ID:

EE319K Exam 1 Summer 2014 Page 1. Exam 1. Date: July 9, Printed Name:

ARM Processors ARM ISA. ARM 1 in 1985 By 2001, more than 1 billion ARM processors shipped Widely used in many successful 32-bit embedded systems

ARM Cortex-M4 Architecture and Instruction Set 3: Branching; Data definition and memory access instructions

Exam 1 Fun Times. EE319K Fall 2012 Exam 1A Modified Page 1. Date: October 5, Printed Name:

Raspberry Pi / ARM Assembly. OrgArch / Fall 2018

Assembly Language. CS2253 Owen Kaser, UNBSJ

Exam 1. Date: Oct 4, 2018

EE319K Fall 2013 Exam 1B Modified Page 1. Exam 1. Date: October 3, 2013

Bitwise Instructions

Instruction Set. ARM810 Data Sheet. Open Access - Preliminary

(2) Part a) Registers (e.g., R0, R1, themselves). other Registers do not exists at any address in the memory map

CprE 288 Introduction to Embedded Systems Course Review for Exam 3. Instructors: Dr. Phillip Jones

Chapter 3: ARM Instruction Set [Architecture Version: ARMv4T]

Topic 10: Instruction Representation

Introduction to the ARM Processor Using Intel FPGA Toolchain. 1 Introduction. For Quartus Prime 16.1

Introduction to the ARM Processor Using Altera Toolchain. 1 Introduction. For Quartus II 14.0

Typical Processor Execution Cycle

ARM-7 ADDRESSING MODES INSTRUCTION SET

UNIT 2 (ECS-10CS72) VTU Question paper solutions

ARM. Assembly Language and Machine Code. Goal: Blink an LED

F28HS2 Hardware-Software Interfaces. Lecture 6: ARM Assembly Language 1

EE319K Spring 2015 Exam 1 Page 1. Exam 1. Date: Feb 26, 2015

EEM870 Embedded System and Experiment Lecture 4: ARM Instruction Sets

ARM. Assembly Language and Machine Code. Goal: Blink an LED

Overview COMP Microprocessors and Embedded Systems. Lectures 14: Making Decisions in C/Assembly Language II

CMPSCI 201 Fall 2004 Midterm #2 Answers

Introduction to Embedded Systems EE319K (Gerstlauer), Spring 2013

Architectures & instruction sets R_B_T_C_. von Neumann architecture. Computer architecture taxonomy. Assembly language.

ARM7TDMI Instruction Set Reference

CMPSCI 201 Fall 2005 Midterm #2 Solution

Chapters 3. ARM Assembly. Embedded Systems with ARM Cortext-M. Updated: Wednesday, February 7, 2018

Assembler: Basics. Alberto Bosio October 20, Univeristé de Montpellier

Chapter 2. Instructions: Language of the Computer

Computer Architecture Prof. Smruthi Ranjan Sarangi Department of Computer Science and Engineering Indian Institute of Technology, Delhi

EE319K Spring 2016 Exam 1 Solution Page 1. Exam 1. Date: Feb 25, UT EID: Solution Professor (circle): Janapa Reddi, Tiwari, Valvano, Yerraballi

Elements of CPU performance

Hi Hsiao-Lung Chan, Ph.D. Dept Electrical Engineering Chang Gung University, Taiwan

Chapters 5. Load & Store. Embedded Systems with ARM Cortex-M. Updated: Thursday, March 1, 2018

ECE 471 Embedded Systems Lecture 9

Systems Architecture The ARM Processor

INSTRUCTION SET ARCHITECTURE

The Assembly Language of the Boz 5

Sneha Rajguru & Prajwal Panchmahalkar

Transcription:

Chapter 2 Instructions Sets Hsung-Pin Chang Department of Computer Science National ChungHsing University

Outline Instruction Preliminaries ARM Processor SHARC Processor

2.1 Instructions Instructions sets The programmer s interface to the hardware Two CPUs as example ARM processor: ARM version 7 SHARK Digital signal processor (DSP)

2.2 Preliminaries 2.2.1 Computer architecture taxonomy 2.2.2 Assembly language.

2.2.2 Computer Architecture Taxonomy Von Neumann architecture Harvard architectures RISC vs. CISC

Von Neumann architecture A computer whose memory holds both data and instructions The CPU has several internal registers Program counter, general-purpose register CPU fetches instructions by program counter from memory The separation of the instruction memory from the CPU Distinguish a stored-program computer from a general finite-state machine

A von Neumann Architecture Computer address 200 memory ADD r5,r1,r3 data 200 PC CPU IR ADD r5,r1,r3

Harvard Architecture Separate memories for data and program The program counter points to the program memory Hard to write self-modifying programs Used for one very simple reason Provide higher performance for digital signal processing Most of DSPs are Harvard architectures Most of the phone calls go through at least 2 DSPs, one at each end of the phone call

Harvard Architecture address data memory program memory data address data PC CPU

von Neumann vs. Harvard Harvard cannot use self-modifying code. Harvard allows two simultaneous memory fetches. Most DSPs use Harvard architecture for streaming data: greater memory bandwidth more predictable bandwidth Streaming data Data set the arrive continuously and periodically

RISC vs. CISC Another axis relates to instructions and how they are executed Complex instruction set computer (CISC) A variety of instructions Reduced instruction set computer (RISC) Fewer and simpler instructions Load/store Pipelined processors

Instruction Set Characteristics Fixed vs. variable length. Addressing modes. Number of operands. Types of operands.

Programming Model Programming model: the set of registers available for use by programs Some registers are unavailable to programmer (IR).

2.2.2 Assembly language The textual description of instructions Basic features: One instruction per line Labels Provide names for memory location Start in the first column Instructions often start in later columns To distinguish them from labels Comments run from some designated comment character to the end of the line

An Example of ARM Assembly Language label1 ADR r4,c LDR r0,[r4] ; a comment ADR r4,d LDR r1,[r4] SUB r0,r0,r1 ; comment

Pseudo-ops Some assembler directives don t correspond directly to instructions Help programmer create complete language programs For example Define current address Reserve storage Constants

Pseudo-ops (Cont.) Examples In ARM: BIGBLOCK %10 ; allocate a block of 10-byete ; memory and initialize to 0

2.3 ARM Processor ARM is a family of RISC architecture and has been extended over several versions ARM 610, ARM7, ARM9, ARM10, ARM11 ARM 7 Von Neumann architecture machine ARM9 Harvard architecture machine We will concentrate on ARM7

ARM Assembly Language Fairly standard assembly language label LDR r0,[r8] ; a comment ADD r4,r0,r1

ARM Data Type The standard ARM word is 32 bits long Word may be divided into four 8-bit bytes ARM allow addresses to be 32 bits long An address refer to a byte, not a word Word 0 is at location 0 Word 1 is at location 4 PC is incremented by 4 in sequential access Can be configured at power-up to address the bytes in a word in either little-endian or bit-endian mode

Byte Organization Within an ARM Word Little-endian mode The lowest-order byte residing in the low-order bits of the word Big-endian mode The lowest-order byte stored in the highest bits of the word bit 31 bit 0 bit 0 bit 31 byte 3 byte 2 byte 1 byte 0 byte 0 byte 1 byte 2 byte 3 little-endian big-endian

ARM Data Operation ARM is a load-store architecture Arithmetic and logical operations cannot be performed directly on memory locations Data operands must first be loaded into the CPU and then stored back to main memory

ARM Programming Model ARM has 16 general-purpose registers r0~r15 r15: also used as the program counter Allow the program counter value to be used as an operand in computations

ARM Programming Model r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 r14 r15 (PC) 31 N Z C V CPSR 0

ARM Programming Model (Cont.) CPSR: Current Program Status Register Set automatically during every arithmetic, logical, or shifting operation Top four bits of the CPSR N: set when the result is negative in two s-complement arithmetic Z: set when every bit of the result is zero C: set when there is a carry out of the operation V: set when an arithmetic operation results in an overflow

ARM Status Bits Examples: -1 + 1 = 0: NZCV = 0110 0xffffffff + 0x1 = 0x0 0-1 = -1: NZCV = 1000 0x0-0x1=0xffffffff

ARM Data Instructions Basic format: ADD r0,r1,r2 Computes r1+r2, stores in r0. Immediate operand: ADD r0,r1,#2 Computes r1+2, stores in r0 Data instructions Arithmetic Logical Shift/store

ARM Arithmetic Instructions ADD : add ADC : add with carry ADC r0, r1, r2; r0 = r1 + r2 + C SUB : subtract SBC : subtract with carry SUB r0, r1, r2; r0 = r1 - r2 + C -1 RSB : reverse subtract RSB r0, r1, r2; r0 = r2 r1 RSC : reverse subtract w. carry RSB r0, r1, r2; r0 = r2 r1 + C -1 MUL : multiply MLA : multiply and accumulate MLA r0, r1, r2, r3; r0 = r1 x r2 + r3

ARM Logical Instructions AND : Bit-wise and ORR : Bit-wise or EOR : Bit-wise exclusive-or BIC : bit clear BIC r0, r1, r2; r0 = r1 & Not(r2)

ARM Shift/Rotate Instructions LSL : logical shift left Fills with zeroes LSR : logical shift right ASL : arithmetic shift left = LSL ASR : arithmetic shift left Copy the sign bit to the most significant bit ROR : rotate right RRX : rotate right extended with C Performs 33-bit rotate, with the CPSR s C bit being inserted above sign bit of the word

ARM Comparison Instructions Do not modify registers but only set the values of the NZCV bits of the CPSR register CMP : compare CMP r0, r1; compute (r0 - r1)and set NZCV CMN : negated compare CMP r0, r1; compute (r0 + r1)and set NZCV TST : bit-wise AND test TST r0, r1; compute (r0 AND r1)and set NZCV TEQ : bit-wise exclusive-or test TEQ r0, r1; compute (r0 EOR r1)and set NZCV

ARM Move Instructions MOV : move MOV r0, r1; r0 = r1 MVN : move (negated) MOV r0, r1; r0 = NOT(r1)

ARM Load/Store instructions LDR : load word LDRH : load half-word LDRB : load byte STR : store word STRH : store half-word STRB : store byte

Addressing Modes Register Immediate Indirect or Register Indirect Base-Plus-Offset Base register r0 r15 Offset, and or subtract an unsigned number Immediate Register (not PC) Scaled register: only available for word and unsigned byte instructions

Register Indirect Addressing LDR r0,[r1] If r1 = 0x100 Set r0 = mem32[0x100] STR r0,[r1] If r1 = 0x100 Set mem32[0x100] = r0

Base-Plus-Offset Addressing Pre-indexing LDR r0,[r1,#4] ; r0:=mem32[r1+4] Offset up to 4K, added or subtracted, (# -4) Post-indexing LDR r0,[r1],#4 ; r0:=mem32[r1], then r1:=r1+4 Auto-indexing LDR r0, [r1,#4]! ; r0:=mem32[r1+4], then r1:=r1+4

ARM ADR pseudo-op Cannot refer to an address directly in an instruction. Generate value by performing arithmetic on PC. ADR provide a pseudo-operation (or pseudoinstruction) to simply the steps ADR r1, F00 ; give location 0x100 the name FOO

Example: C assignments C: x = (a + b) - c; Assembler: ADR r4,a LDR r0,[r4] ADR r4,b LDR r1,[r4] ADD r3,r0,r1 ADR r4,c LDR r2[r4] SUB r3,r3,r2 ADR r4,x STR r3[r4] ; get address for a ; get value of a ; get address for b, reusing r4 ; get value of b ; compute a+b ; get address for c ; get value of c ; complete computation of x ; get address for x ; store value of x

Example: C assignment C: y = a*(b+c); Assembler: ADR r4, b LDR r0, [r4] ADR r4, c LDR r1, [r4] ADD r2, r0,r1 ADR r4, a LDR r0, [r4] MUL r2, r2, r0 ADR r4, y STR r2, [r4] ; get address for b ; get value of b ; get address for c ; get value of c ; compute partial result ; get address for a ; get value of a ; compute final value for y ; get address for y ; store y

Example: C assignment C: z = (a << 2) (b & 15); Assembler: ADR r4,a ; get address for a LDR r0,[r4] ; get value of a MOV r0,r0,lsl 2 ; perform shift ADR r4,b ; get address for b LDR r1,[r4] ; get value of b AND r1,r1,#15 ; perform AND ORR r1,r0,r1 ; perform OR ADR r4,z ; get address for z STR r1,[r4] ; store value for z

ARM Flow of Control Branch instruction: PC-relative B #100 ; add 400 to the current PC value B LABEL ; branch a LABEL Conditional branch by testing CPSR value EQ : equal (Z = 1) NE : not equal (Z = 0) CS : carry set ( C = 1) CC : carry clear (C = 0)

ARM Flow of Control (Cont.) MI : minus ( N = 1) PL : nonnegative ( N = 0) VS : overflow ( V = 1) VC : no overflow ( V = 0) HI : unsigned higher ( C = 1 and Z = 0) LS : unsigned lower or same ( C = 0 or Z = 1) GE : greater than or equal ( N = V) LT : signed less than ( N!= V) GT : signed greater than ( Z = 0 and N = V) LE : signed less than or equal ( Z = 1 and N!= V)

Example: if statement C: if (a > b) { x = 5; y = c + d; } else x = c - d; Assembler: ; compute and test condition ADR r4,a ; get address for a LDR r0,[r4] ; get value of a ADR r4,b ; get address for b LDR r1,[r4] ; get value for b CMP r0,r1 ; compare a < b BGE fblock ; if a >= b, branch to false block

If statement, cont d. ; true block MOV r0,#5 ADR r4,x STR r0,[r4] ADR r4,c LDR r0,[r4] ADR r4,d LDR r1,[r4] ADD r0,r0,r1 ADR r4,y STR r0,[r4] B after ; generate value for x ; get address for x ; store x ; get address for c ; get value of c ; get address for d ; get value of d ; compute y ; get address for y ; store y ; branch around false block

If statement, cont d. ; false block fblock ADR r4,c LDR r0,[r4] ADR r4,d LDR r1,[r4] SUB r0,r0,r1 ADR r4,x STR r0,[r4] after... ; get address for c ; get value of c ; get address for d ; get value for d ; compute a-b ; get address for x ; store value of x

Example: Conditional instruction implementation ; true block MOVLT r0,#5 ADRLT r4,x STRLT r0,[r4] ADRLT r4,c LDRLT r0,[r4] ADRLT r4,d LDRLT r1,[r4] ADDLT r0,r0,r1 ADRLT r4,y STRLT r0,[r4] ; generate value for x ; get address for x ; store x ; get address for c ; get value of c ; get address for d ; get value of d ; compute y ; get address for y ; store y

Conditional instruction implementation, cont d. ; false block ADRGE r4,c LDRGE r0,[r4] ADRGE r4,d LDRGE r1,[r4] SUBGE r0,r0,r1 ADRGE r4,x STRGE r0,[r4] ; get address for c ; get value of c ; get address for d ; get value for d ; compute a-b ; get address for x ; store value of x

Example: switch statement C: switch (test) { case 0: break; case 1: } Assembler: ADR r2,test LDR r0,[r2] ADR r1,switchtab LDR r1,[r1,r0,lsl #2] switchtab DCD case0 DCD case1... ; get address for test ; load value for test ; load address for switch table ; index switch table

Example: FIR filter C: for (i=0, f=0; i<n; i++) f = f + c[i]*x[i]; Assembler ; loop initiation code MOV r0,#0 ; use r0 for I MOV r8,#0 ; use separate index for arrays ADR r2,n ; get address for N LDR r1,[r2] ; get value of N MOV r2,#0 ; use r2 for f

FIR filter, cont.d ADR r3,c ; load r3 with base of c ADR r5,x ; load r5 with base of x ; loop body loop LDR r4,[r3,r8] ; get c[i] LDR r6,[r5,r8] ; get x[i] MUL r4,r4,r6 ; compute c[i]*x[i] ADD r2,r2,r4 ; add into running sum ADD r8,r8,#4 ; add one word offset to array index ADD r0,r0,#1 ; add 1 to i CMP r0,r1 ; exit? BLT loop ; if i < N, continue

ARM Subroutine Linkage Branch and link instruction: BL foo Save the current PC value to r14 And then jump to foo To return from subroutine: MOV r15,r14

Summary Load/store architecture Most instructions are RISCy, operate in single cycle. Some multi-register operations take longer. All instructions can be executed conditionally.