ARM Assembly Programming

Similar documents
ARM Assembly Language

ARM Instruction Set. Computer Organization and Assembly Languages Yung-Yu Chuang. with slides by Peng-Sheng Chen

ARM Instruction Set. Introduction. Memory system. ARM programmer model. The ARM processor is easy to program at the

ARM Assembly Language. Programming

Outline. ARM Introduction & Instruction Set Architecture. ARM History. ARM s visible registers

ARM Instruction Set Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

ARM-7 ADDRESSING MODES INSTRUCTION SET

MNEMONIC OPERATION ADDRESS / OPERAND MODES FLAGS SET WITH S suffix ADC

ECE 471 Embedded Systems Lecture 5

VE7104/INTRODUCTION TO EMBEDDED CONTROLLERS UNIT III ARM BASED MICROCONTROLLERS

ARM Cortex-A9 ARM v7-a. A programmer s perspective Part 2

ARM Instruction Set Dr. N. Mathivanan,

ARM Architecture and Instruction Set

Multiple data transfer instructions

The ARM Instruction Set

Processor Status Register(PSR)

Chapter 3: ARM Instruction Set [Architecture Version: ARMv4T]

Chapter 15. ARM Architecture, Programming and Development Tools

ARM Assembly Programming II

CS 310 Embedded Computer Systems CPUS. Seungryoul Maeng

Chapter 2 Instructions Sets. Hsung-Pin Chang Department of Computer Science National ChungHsing University

Chapters 5. Load & Store. Embedded Systems with ARM Cortex-M. Updated: Thursday, March 1, 2018

The ARM Instruction Set Architecture

The Original Instruction Pipeline

ENGN1640: Design of Computing Systems Topic 03: Instruction Set Architecture Design

ARM Assembly Programming

ECE 498 Linux Assembly Language Lecture 5

Cortex M3 Programming

ARM7TDMI Instruction Set Reference

Architecture. Digital Computer Design

ARM Assembly Programming

ARM Instruction Set 1

18-349: Introduction to Embedded Real- Time Systems Lecture 3: ARM ASM

Writing ARM Assembly. Steven R. Bagley

ARM Evaluation System

Chapter 15. ARM Architecture, Programming and Development Tools

STEVEN R. BAGLEY ARM: PROCESSING DATA

ECE 571 Advanced Microprocessor-Based Design Lecture 3

Introduction to the ARM Processor Using Intel FPGA Toolchain. 1 Introduction. For Quartus Prime 16.1

ARM Processors ARM ISA. ARM 1 in 1985 By 2001, more than 1 billion ARM processors shipped Widely used in many successful 32-bit embedded systems

Introduction to the ARM Processor Using Altera Toolchain. 1 Introduction. For Quartus II 14.0

ARM Cortex-M4 Architecture and Instruction Set 2: General Data Processing Instructions

The ARM Cortex-M0 Processor Architecture Part-2

ARM Cortex M3 Instruction Set Architecture. Gary J. Minden March 29, 2016

LAB 1 Using Visual Emulator. Download the emulator Answer to questions (Q)

University of California, San Diego CSE 30 Computer Organization and Systems Programming Winter 2014 Midterm Dr. Diba Mirza

The ARM processor. Morgan Kaufman ed Overheads for Computers as Components

Developing StrongARM/Linux shellcode

EE251: Tuesday September 5

ECE 471 Embedded Systems Lecture 6

Optimizing ARM Assembly

Bitwise Instructions

ECE 471 Embedded Systems Lecture 9

Spring 2012 Prof. Hyesoon Kim

Lecture 15 ARM Processor A RISC Architecture

CprE 288 Introduction to Embedded Systems Course Review for Exam 3. Instructors: Dr. Phillip Jones

(2) Part a) Registers (e.g., R0, R1, themselves). other Registers do not exists at any address in the memory map

ARM Cortex-M4 Programming Model Memory Addressing Instructions

Assembly ARM. Execution flow MIPS. Calcolatori Elettronici e Sistemi Operativi. Examples MAX2 MAX3. return maximum between 2 integers

Systems Architecture The Stack and Subroutines

Lecture 4 (part 2): Data Transfer Instructions

EE319K Spring 2016 Exam 1 Solution Page 1. Exam 1. Date: Feb 25, UT EID: Solution Professor (circle): Janapa Reddi, Tiwari, Valvano, Yerraballi

EE319K Fall 2013 Exam 1B Modified Page 1. Exam 1. Date: October 3, 2013

Hi Hsiao-Lung Chan, Ph.D. Dept Electrical Engineering Chang Gung University, Taiwan

Comparison InstruCtions

Exam 1 Fun Times. EE319K Fall 2012 Exam 1A Modified Page 1. Date: October 5, Printed Name:

EE319K Exam 1 Summer 2014 Page 1. Exam 1. Date: July 9, Printed Name:

Exam 1. Date: Oct 4, 2018

Instruction Set. ARM810 Data Sheet. Open Access - Preliminary

ARM Cortex-M4 Programming Model Flow Control Instructions

ARM Processor. Based on Lecture Notes by Marilyn Wolf

3. The Instruction Set

Arm Processor. Arm = Advanced RISC Machines, Ltd.

The ARM Architecture T H E A R C H I T E C T U R E F O R TM T H E D I G I T A L W O R L D

EE319K Spring 2015 Exam 1 Page 1. Exam 1. Date: Feb 26, 2015

ARM Cortex-M4 Programming Model Logical and Shift Instructions

Cortex-M3 Instruction Set

Introduction to C. Write a main() function that swaps the contents of two integer variables x and y.

Instruction Set Architecture (ISA)

Introduction to Embedded Systems EE319K (Gerstlauer), Spring 2013

NET3001. Advanced Assembly

EEM870 Embedded System and Experiment Lecture 4: ARM Instruction Sets

A Bit of History. Program Mem Data Memory. CPU (Central Processing Unit) I/O (Input/Output) Von Neumann Architecture. CPU (Central Processing Unit)

ARM Processor Core and Instruction Sets Prof. Tian-Sheuan Chang

Problem Sheet 1: ANSWERS (NOT ) (NOT ) 3. 37, ,

3 Assembly Programming (Part 2) Date: 07/09/2016 Name: ID:

Topic 10: Instruction Representation

Basic ARM InstructionS

The PAW Architecture Reference Manual

Embedded Systems Ch 12B ARM Assembly Language

The ARM Architecture

CprE 288 Introduction to Embedded Systems ARM Assembly Programming: Translating C Control Statements and Function Calls

Assembly Language Programming

Chapter 2. ARM Processor Core and Instruction Sets. C. W. Jen 任建葳.

Chapter 2. Instructions: Language of the Computer

Systems Architecture The ARM Processor

ECE 471 Embedded Systems Lecture 7

ARM Shift Operations. Shift Type 00 - logical left 01 - logical right 10 - arithmetic right 11 - rotate right. Shift Amount 0-31 bits

Programming the ARM. Computer Design 2002, Lecture 4. Robert Mullins

Assembly Language. CS2253 Owen Kaser, UNBSJ

Transcription:

Introduction ARM Assembly Programming The ARM processor is very easy to program at the assembly level. (It is a RISC) We will learn ARM assembly programming at the user level and run it on a GBA emulator. Computer Organization and Assembly Languages Yung-Yu Chuang 27/11/19 with slides by Peng-Sheng Chen ARM programmer model Memory system The state of an ARM system is determined by the content of visible registers and memory. A user-mode program can see 15 32-bit generalpurpose registers (-4), program counter (PC) and CPSR. Instruction set defines the operations that can change the state. Memory is a linear array of bytes addressed from 0 to 2 32-1 Word, half-word, byte Little-endian 0x 0x01 0x02 0x03 0x04 0x05 0x06 10 20 30 0xFD 0xFE 0x

Byte ordering ARM programmer model Big Endian Least significant byte has highest address Word address 0x Value: 102030 Little Endian Least significant byte has lowest address Word address 0x Value: 302010 0x 0x01 0x02 0x03 0x04 0x05 0x06 10 20 30 R4 R8 2 R5 R9 3 R2 R6 0 4 R3 R7 1 PC 0x 0x01 0x02 0x03 0x04 0x05 0x06 10 20 30 0xFD 0xFE 0xFD 0xFE 0x 0x Instruction set ARM instructions are all 32-bit long (except for Thumb mode). There are 2 32 possible machine instructions. Fortunately, they are structured. Features of ARM instruction set Load-store architecture 3-address instructions Conditional execution of every instruction Possible to load/store multiple register at once Possible to combine shift and ALU operations in a single instruction

Instruction set MOV<cc><S> Rd, <operands> MOVCS, @ if carry is set @ then := Instruction set Data processing (Arithmetic and Logical) Data movement Flow control MOVS, #0 @ :=0 @ Z=1, N=0 @ C, V unaffected Data processing Arithmetic and logic operations General rules: All operands are 32-bit, coming from registers or literals. The result, if any, is 32-bit and placed in a register (with the exception for long multiply which produces a 64-bit result) 3-address format Arithmetic ADD,, R2 ADC,, R2 SUB,, R2 SBC,, R2 RSB,, R2 RSC,, R2 @ = +R2 @ = +R2+C @ = -R2 @ = -R2+C-1 @ = R2- @ = R2-+C-1

Bitwise logic AND,, R2 ORR,, R2 EOR,, R2 BIC,, R2 @ = and R2 @ = or R2 @ = xor R2 @ = and (~R2) Register movement MOV, R2 @ = R2 MVN, R2 @ = ~R2 move negated bit clear: R2 is a mask identifying which bits of will be cleared to zero =0x11111111 R2=0x011101 BIC,, R2 =0x111010 Comparison These instructions do not generate a result, but set condition code bits (N, Z, C, V) in CPSR. Often, a branch operation follows to change the program flow. compare CMP, R2 compare negated CMN, R2 bit test TST, R2 test equal TEQ, R2 @ set cc on -R2 @ set cc on +R2 @ set cc on and R2 @ set cc on xor R2 Addressing modes Register operands ADD,, R2 Immediate operands a literal; most can be represented by (0..255)x2 2n 0<n<12 ADD R3, R3, #1 @ R3:=R3+1 AND R8, R7, #0xff @ R8=R7[7:0] a hexadecimal literal This is assembler dependent syntax.

Shifted register operands One operand to ALU is routed through the Barrel shifter. Thus, the operand can be modified before it is used. Useful for dealing with lists, table and other complex data structure. (similar to the displacement addressing mode in CISC.) Logical shift left C register 0 MOV, R2, LSL #2 @ :=R2<<2 @ R2 unchanged Example: 00 11 Before R2=0x30 After =0xC0 R2=0x30 Logical shift right Arithmetic shift right 0 register C MSB register C MOV, R2, LSR #2 @ :=R2>>2 @ R2 unchanged Example: 00 11 Before R2=0x30 After =0x0C R2=0x30 MOV, R2, ASR #2 @ :=R2>>2 @ R2 unchanged Example: 1010 00 11 Before R2=0xA030 After =0xE80C R2=0xA030

Rotate right Rotate right extended register C register C MOV, R2, ROR #2 @ :=R2 rotate @ R2 unchanged Example: 00 11 01 Before R2=0x31 After =0x4C R2=0x31 MOV, R2, RRX @ :=R2 rotate @ R2 unchanged Example: 00 11 01 Before R2=0x31, C=1 After =0x8018, C=1 R2=0x31 Shifted register operands Shifted register operands

Shifted register operands It is possible to use a register to specify the number of bits to be shifted; only the bottom 8 bits of the register are significant. Setting the condition codes Any data processing instruction can set the condition codes if the programmers wish it to ADD,, R2, LSL R3 @ :=+R2*2 R3 64-bit addition ADDS R2, R2, ADC R3, R3, + R3 R2 R3 R2 Multiplication MUL,, R2 @ = (xr2) [31:0] Features: Second operand can t be immediate The result register must be different from the first operand If S bit is set, C flag is meaningless See the reference manual (4.1.33) Multiplication Multiply-accumulate MLA R4, R3, R2, @ R4 = R3xR2+ Multiply with a constant can often be more efficiently implemented using shifted register operand MOV, #35 MUL R2,, or ADD,,, LSL #2 @ =5x RSB R2,,, LSL #3 @ R2 =7x

Data transfer instructions Move data between registers and memory Three basic forms Single register load/store Multiple register load/store Single register swap: SWP(B), atomic instruction for semaphore Single register load/store The data items can be a 8-bitbyte, 16-bit halfword or 32-bit word. LDR, [] STR, [] @ := mem 32 [] @ mem 32 [] := LDR, LDRH, LDRB for 32, 16, 8 bits STR, STRH, STRB for 32, 16, 8 bits Load an address into a register The pseudo instruction ADR loads a register with an address table:.word 10 ADR, table Assembler transfer pseudo instruction into a sequence of appropriate instructions sub r0, pc, #12 Addressing modes Memory is addressed by a register and an offset. LDR, [] @ mem[] Three ways to specify offsets: Constant LDR, [, #4] @ mem[+4] Register LDR, [, R2] @ mem[+r2] Scaled @ mem[+4*r2] LDR, [, R2, LSL #2]

Addressing modes Pre-indexed addressing (LDR, [, #4]) without a writeback Auto-indexing addressing (LDR, [, #4]!) calculation before accessing with a writeback Post-indexed addressing (LDR, [], #4) calculation after accessing with a writeback Pre-indexed addressing LDR, [, #4] @ =mem[+4] @ unchanged LDR, [, ] + Auto-indexing addressing Post-indexed addressing LDR, [, #4]! @ =mem[+4] @ =+4 No extra time; Fast; LDR,, #4 @ =mem[] @ =+4 LDR, [, ]! LDR,[], + +

Comparisons Pre-indexed addressing LDR, [, R2] @ =mem[+r2] @ unchanged Auto-indexing addressing LDR, [, R2]! @ =mem[+r2] @ =+R2 Post-indexed addressing LDR, [], R2 @ =mem[] @ =+R2 Application loop: ADR, table table LDR, [] ADD,, #4 @ operations on ADR, table loop: LDR, [], #4 @ operations on Multiple register load/store Transfer large quantities of data more efficiently. Used for procedure entry and exit for saving and restoring workspace registers and the return address registers are arranged an in increasing order; see manual LDMIA, {, R2, R5} @ = mem[] @ R2 = mem[r1+4] @ R5 = mem[r1+8] Multiple load/store register LDM load multiple registers STM store multiple registers suffix meaning IA increase after IB increase before DA decrease after DB decrease before

Multiple load/store register Multiple load/store register LDM<mode> Rn, {<registers>} IA: addr:=rn IB: addr:=rn+4 DA: addr:=rn-#<registers>*4+4 DB: addr:=rn-#<registers>*4 For each Ri in <registers> IB: addr:=addr+4 DB: addr:=addr-4 Ri:=M[addr] IA: addr:=addr+4 DA: addr:=addr-4 <!>: Rn:=addr Rn R2 LDM<mode> Rn, {<registers>} IA: addr:=rn IB: addr:=rn+4 DA: addr:=rn-#<registers>*4+4 DB: addr:=rn-#<registers>*4 For each Ri in <registers> IB: addr:=addr+4 DB: addr:=addr-4 Ri:=M[addr] IA: addr:=addr+4 DA: addr:=addr-4 <!>: Rn:=addr Rn R3 R2 R3 Multiple load/store register Multiple load/store register LDM<mode> Rn, {<registers>} IA: addr:=rn IB: addr:=rn+4 DA: addr:=rn-#<registers>*4+4 DB: addr:=rn-#<registers>*4 For each Ri in <registers> IB: addr:=addr+4 DB: addr:=addr-4 Ri:=M[addr] IA: addr:=addr+4 DA: addr:=addr-4 <!>: Rn:=addr Rn R2 R3 LDM<mode> Rn, {<registers>} IA: addr:=rn IB: addr:=rn+4 DA: addr:=rn-#<registers>*4+4 DB: addr:=rn-#<registers>*4 For each Ri in <registers> IB: addr:=addr+4 DB: addr:=addr-4 Ri:=M[addr] IA: addr:=addr+4 DA: addr:=addr-4 <!>: Rn:=addr Rn R2 R3

Multiple load/store register LDMIA, {,R2,R3} or LDMIA, {-R3} Multiple load/store register LDMIA!, {,R2,R3} addr data addr data : 10 R2: 20 R3: 30 : 0x10 0x010 0x014 0x018 0x01C 10 20 30 40 : 10 R2: 20 R3: 30 : 0x01C 0x010 0x014 0x018 0x01C 10 20 30 40 0x020 50 0x020 50 0x024 60 0x024 60 Multiple load/store register LDMIB!, {,R2,R3} Multiple load/store register LDMDA!, {,R2,R3} addr data addr data : 20 R2: 30 R3: 40 : 0x01C 0x010 0x014 0x018 0x01C 10 20 30 40 : 40 R2: 50 R3: 60 : 0x018 0x010 0x014 0x018 0x01C 10 20 30 40 0x020 50 0x020 50 0x024 60 0x024 60

Multiple load/store register Application LDMDB!, {,R2,R3} Copy a block of memory addr data R9: address of the source 0: address of the destination 1: end address of the source : 30 R2: 40 R3: 50 : 0x018 0x010 0x014 0x018 0x01C 10 20 30 40 loop: LDMIA R9!, {-R7} STMIA 0!, {-R7} CMP R9, 1 BNE loop 0x020 50 0x024 60 Application Stack (full: pointing to the last used; ascending: grow towards increasing memory addresses) mode Full ascending (FA) Full descending (FD) Empty ascending (EA) Empty descending (ED) LDMFD 3!, {R2-R9} @ modify R2-R9 STMFD 3!, {R2-R9} POP LDMFA LDMFD LDMEA LDMED =LDM LDMDA LDMIA LDMDB LDMIB PUSH STMFA STMFD STMEA STMED =STM STMIB STMDB STMIA STMDA Control flow instructions Determine the instruction to be executed next Branch instruction B label label: Conditional branches MOV, #0 loop: ADD,, #1 CMP, #10 BNE loop

Branch conditions Branch and link BL instruction save the return address to 4 (lr) BL sub @ call sub CMP, #5 @ return to here MOVEQ, #0 sub: @ sub entry point MOV PC, LR @ return Branch and link BL sub1 @ call sub1 use stack to save/restore the return address and registers sub1: STMFD 3!, {-R2,4} BL sub2 LDMFD 3!, {-R2,PC} Conditional execution Almost all ARM instructions have a condition field which allows it to be executed conditionally. movcs, sub2: MOV PC, LR

Conditional execution bypass: CMP, #5 BEQ bypass @ if (!=5) { ADD,, @ =+-R2 SUB,, R2 @ } CMP, #5 smaller and faster ADDNE,, SUBNE,, R2 Rule of thumb: if the conditional sequence is three instructions or less, it is better to use conditional execution than a branch. Conditional execution skip: if ((==) && (R2==R3)) R4++ CMP, BNE skip CMP R2, R3 BNE skip ADD R4, R4, #1 CMP, CMPEQ R2, R3 ADDEQ R4, R4, #1 Instruction set ARM assembly program label operation operand comments main: LDR, value @ load value STR, result SWI #11 value:.word 0xC123 result:.word 0

Shift left one bit ADR, value MOV,, LSL #0x1 STR, result SWI #11 value:.word 4242 result:.word 0 Add two numbers main: ADR, value1 ADR R2, value2 ADD,, R2 STR, result SWI #11 value1:.word 0x01 value2:.word 0x02 result:.word 0 64-bit addition Loops ADR, value1 LDR, [] LDR R2, [, #4] ADR, value2 LDR R3, [] LDR R4, [, #4] ADDS R6, R2, R4 ADC R5,, R3 STR R5, [] STR R6, [, #4] SWI #11 value1:.word 0x01, 0xF0 value2:.word 0x, 0x10 result:.word 0 01F0 + 10 02 C R2 + R3 R4 R5 R6 For loops for (i-0; i<10; i++) {a[i]=0;} MOV, #0 ADR R2, A MOV, #0 LOOP: CMP, #10 BGE EXIT STR, [R2,, LSL #2] ADD,, #1 B LOOP EXIT:..

Loops While loops LOOP: ; evaluate expression BEQ EXIT ; loop body B LOOP EXIT: Find larger of two numbers ADR, value1 ADR R2, value2 CMP, R2 BHI Done MOV, R2 Done: STR, result SWI #11 value1:.word 4 value2:.word 9 result:.word 0 GCD int gcd (int I, int j) { while (i!=j) { if (i>j) i -= j; else j -= i; } } GCD Loop: CMP, R2 SUBGT,, R2 SUBLT R2, R2, BNE loop

Count negatives ; count the number of negatives in ; an array DATA of length LENGTH ADR, DATA @ addr EOR,, @ count LDR R2, Length @ R2 index CMP R2, #0 BEQ Done Count negatives loop: LDR R3, [] CMP R3, #0 BPL looptest ADD,, #1 @ it s neg. looptest: ADD,, #4 SUBS R2, R2, #1 BNE loop Subroutines Passing parameters using stacks Passing parameters in registers Caller Assume that we have three parameters BufferLen, BufferA, BufferB to pass into a subroutine ADR, BufferLen ADR, BufferA ADR R2, BufferB BL Subr MOV, #BufferLen STR, [SP, #-4]! MOV, =BufferA STR, [SP, #-4]! MOV, =BufferB STR, [SP, #-4]! BL Subr SP BufferB BufferA BufferLen

Passing parameters using stacks Passing parameters using stacks Callee Subr STMDB SP, {,,R2,3,4} LDR R2, [SP, #0] LDR, [SP, #4] LDR, [SP, #8] R2 3 4 Callee Subr STMDB SP, {,,R2,3,4} LDR R2, [SP, #0] LDR, [SP, #4] LDR, [SP, #8] R2 3 4 SP LDMDB SP, {,,R2,3,4} MOV PC, LR BufferB BufferA LDMDB SP, {,,R2,3,PC} SP BufferB BufferA BufferLen BufferLen Review ARM programmer model ARM architecture ARM programmer model ARM instruction set ARM assembly programming R4 R5 R2 R6 R3 R7 0x 0x01 0x02 0x03 10 20 30 R8 2 R9 3 0 4 1 PC 0x04 0x05 0x06 0xFD 0xFE 0x

Instruction set References ARM Limited. ARM Architecture Reference Manual. Peter Knaggs and Stephen Welsh, ARM: Assembly Language Programming. Peter Cockerell, ARM Assembly Language Programming. Peng-Sheng Chen, Embedded System Software Design and Implementation.