Module 3 Instruction Set Architecture (ISA)

Topics:
- ISA level
- Elements of instructions
- Instruction types
- Number of addresses
- Registers
- Types of operands

Reference:
- William Stallings, Computer Organization & Architecture
- Kip Irvine, Assembly Language for Intel-Based Computers

Instruction Set Architecture (ISA) Level
The ISA level defines the interface between compilers (high-level languages) and the hardware. It is the language that both of them understand.

What is an Instruction Set?
- The complete collection of instructions that a CPU understands
- Also known as machine code or machine instructions
- Binary representation, usually written as assembly code
- At this level the user becomes aware of the registers, the memory structure, the data types supported by the machine, and the functioning of the ALU

Elements of an Instruction
- Operation code (opcode): specifies the operation to be performed (ADD, SUB, etc.), given as a binary code known as the opcode
- Source operand reference: one or more source operands (inputs for the operation)
- Result (destination) operand reference: the operation produces a result (output of the operation); sometimes the result is an action, as in JMP target
- Next instruction reference: tells the processor where to fetch the next instruction after execution of the current instruction is completed
Example: MOV AX, BX (opcode MOV, destination operand AX, source operand BX)

Elements of an Instruction
Source and result operands can be in:
- Main memory or virtual memory: the memory address is supplied in the instruction
- CPU registers (processor registers): one or more registers that can be referenced by instructions
- Immediate: the value of the operand is contained in a field of the instruction being executed
- I/O device: the instruction specifies the I/O module and device for the operation

Elements of an Instruction (annotated example; figure not reproduced)
- Operand in memory: go to the address location that holds TOTAL and get the value
- Operand in a register
- Operand as an immediate value
- Operand from I/O
- Next instruction reference: the next instruction is where TARGET is located (0003)

Instruction Representation
- In machine code, each instruction has a unique bit pattern (a sequence of bits)
- An instruction is divided into fields, and an instruction set may have multiple formats
- During instruction execution:
  - The instruction is read into the Instruction Register (IR) in the processor
  - The processor then extracts the data and performs the required operation
- The bit pattern is what the processor sees; the symbolic form is what the programmer sees
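
The idea of an instruction word divided into fields can be sketched in Python. The 16-bit layout used here (a 4-bit opcode followed by two 6-bit operand references) and the opcode table are hypothetical, chosen only for illustration; real instruction sets define their own formats.

```python
# Hypothetical 16-bit format: [ opcode: 4 bits | operand1: 6 bits | operand2: 6 bits ]
OPCODES = {0b0001: "LOAD", 0b0010: "ADD", 0b0011: "STORE"}  # illustrative only

def encode(opcode, op1, op2):
    """Pack the three fields into one 16-bit instruction word."""
    assert 0 <= opcode < 16 and 0 <= op1 < 64 and 0 <= op2 < 64
    return (opcode << 12) | (op1 << 6) | op2

def decode(word):
    """Split a 16-bit instruction word into (mnemonic, operand1, operand2)."""
    opcode = (word >> 12) & 0xF
    op1 = (word >> 6) & 0x3F
    op2 = word & 0x3F
    return OPCODES.get(opcode, "???"), op1, op2

word = encode(0b0010, 5, 9)
print(f"{word:016b}")  # the bit pattern: 0010000101001001
print(decode(word))    # the symbolic view: ('ADD', 5, 9)
```

The decoder only masks and shifts, which mirrors how a processor separates the fields of the word held in the IR.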

Instruction Representation
- Binary representations of machine instructions are difficult to deal with, so a symbolic representation is used for better understanding
- Opcodes are represented as mnemonics that indicate the operation, e.g. ADD, SUB, LOAD
- Operands can also be represented symbolically
Example: ADD AX, Total
- Add the value contained in data location Total to the contents of register AX
- Put the result of the addition into register AX

Instruction Types
A single instruction in a high-level language like C may require more than one instruction in assembly language.
Example: Total = Total + stuff; (add the value stored in Total to the value stored in stuff and put the result in Total)
In assembly language (assuming Total and stuff have been declared):
1. Load a register with the contents of memory (for Total)
2. Add the contents of memory (for stuff) to the register
3. Store the contents of the register to the memory location (for Total)
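
The three machine-level steps can be sketched as a toy simulation in Python. The helper names (load, add, store), the single register AX, and the initial values are assumptions made for this illustration; they are not part of any real instruction set.

```python
# Toy machine state: named memory cells and one register (values are made up).
memory = {"Total": 10, "stuff": 32}
registers = {"AX": 0}

def load(reg, var):
    """register <- memory"""
    registers[reg] = memory[var]

def add(reg, var):
    """register <- register + memory"""
    registers[reg] += memory[var]

def store(var, reg):
    """memory <- register"""
    memory[var] = registers[reg]

# Total = Total + stuff, expressed as three machine-level steps
load("AX", "Total")
add("AX", "stuff")
store("Total", "AX")
print(memory["Total"])  # 42
```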

Instruction Types: Categories
- Data processing: arithmetic and logic instructions
- Data storage (main memory): memory instructions that move data into or out of memory locations
- Data movement (I/O): I/O instructions
- Control (program flow control): test and branch instructions

Number of Addresses
- The number of addresses per instruction is one way to describe a processor architecture
- It refers to how many operands an instruction can take:
    SUB Y,B      2-address instruction
    SUB Y,A,B    3-address instruction
- Design trade-offs in choosing the number of addresses per instruction:
  - More addresses means fewer instructions are needed
  - More addresses requires a longer instruction format
  - More addresses makes the fetch and execution of each instruction slower
  - More addresses requires a more complex processor
- With multiple-address instructions, there are commonly multiple general-purpose registers that can be used; register references are faster than memory references

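Number of Addresses: instruction counts for the same computation (figure only partly recoverable)
    3 addresses    4 instructions
    2 addresses    6 instructions
    1 address      8 instructions
    0 addresses    10 instructions
Zero-address (stack) version:
    PUSH C
    PUSH D
    PUSH E
    MUL
    ADD
    PUSH B
    PUSH A
    SUB
    DIV
    POP Y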
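
A zero-address program like the ten-instruction stack sequence on this slide can be traced with a small stack-machine sketch in Python. Two things here are assumptions rather than statements from the slide: each binary operation is taken to compute (top of stack) OP (next on stack), and under that convention the program appears to evaluate Y = (A - B) / (C + D * E). The operand values are made up.

```python
def run_stack_program(program, memory):
    """Execute a zero-address program; binary ops compute (top OP next)."""
    stack = []
    ops = {
        "ADD": lambda top, nxt: top + nxt,
        "SUB": lambda top, nxt: top - nxt,
        "MUL": lambda top, nxt: top * nxt,
        "DIV": lambda top, nxt: top / nxt,
    }
    for instr in program:
        parts = instr.split()
        if parts[0] == "PUSH":
            stack.append(memory[parts[1]])      # memory -> stack
        elif parts[0] == "POP":
            memory[parts[1]] = stack.pop()      # stack -> memory
        else:
            top, nxt = stack.pop(), stack.pop() # operands are implicit
            stack.append(ops[parts[0]](top, nxt))
    return memory

program = ["PUSH C", "PUSH D", "PUSH E", "MUL", "ADD",
           "PUSH B", "PUSH A", "SUB", "DIV", "POP Y"]
values = {"A": 20, "B": 4, "C": 2, "D": 3, "E": 2}
result = run_stack_program(program, dict(values))
print(result["Y"])  # 2.0, i.e. (20 - 4) / (2 + 3 * 2)
```

Only PUSH and POP name an address; every arithmetic instruction finds its operands on the stack, which is why the program needs the most instructions of the four variants.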

Instruction Set Design Decisions
- Operation repertoire: how many operations? What can they do? How complex are they?
- Data types: the various types of data
- Instruction formats: length of the opcode field, number of addresses
- Registers: number of CPU registers available; which operations can be performed on which registers?
- Addressing modes: the mode(s) by which the address of an operand is specified
- RISC vs. CISC

Registers (32-bit)
General-purpose registers are primarily used for arithmetic and data movement:
    EAX    automatically used by MUL and DIV instructions
    EBX
    ECX    automatically used by the processor as a loop counter
    EDX
    EBP    base pointer register
    ESP    stack pointer register
    ESI    source index register
    EDI    destination index register

Registers (16-bit)
The least significant 16 bits (bits 15..0) of these 32-bit registers have an additional register name that can be used for accessing just those 16 bits:
    eax -> ax    ebx -> bx    ecx -> cx    edx -> dx
    esi -> si    edi -> di    esp -> sp    ebp -> bp

Registers (8-bit)
The two least significant bytes of registers eax, ebx, ecx and edx also have register names that can be used for accessing 8 bits at a time:
    32-bit (bits 31..0)    16-bit (bits 15..0)    high byte (bits 15..8)    low byte (bits 7..0)
    eax                    ax                     ah                        al
    ebx                    bx                     bh                        bl
    ecx                    cx                     ch                        cl
    edx                    dx                     dh                        dl
Source: Pentium Registers & Addressing Modes, K.K. Leung, Fall 2008
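
The overlap between the 32-bit, 16-bit and 8-bit register names can be illustrated with bit masks. This is a minimal Python sketch; the function names and the sample value 12345678h are chosen for illustration only.

```python
def sub_registers(eax):
    """Return the AX, AH and AL views of a 32-bit EAX value."""
    ax = eax & 0xFFFF      # bits 15..0
    ah = (ax >> 8) & 0xFF  # bits 15..8
    al = ax & 0xFF         # bits 7..0
    return ax, ah, al

def write_al(eax, value):
    """Writing AL replaces only bits 7..0 and leaves the rest of EAX alone."""
    return (eax & 0xFFFFFF00) | (value & 0xFF)

ax, ah, al = sub_registers(0x12345678)
print(f"ax={ax:04X}h ah={ah:02X}h al={al:02X}h")  # ax=5678h ah=56h al=78h
print(f"{write_al(0x12345678, 0xFF):08X}h")       # 123456FFh
```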

Instruction Pointer Register
- The instruction pointer register (EIP) holds the address of the next instruction to be executed.
- The EIP register corresponds to the program counter register in other architectures.
- EIP is changed by certain instructions (e.g. call, jmp, ret) to branch to a new location.

Flags Register
- In earlier processors (8086/8088) the flags register was 16 bits; in later processors it is the 32-bit EFLAGS register.
- The EFLAGS register consists of individual binary bits that control CPU operation or reflect the outcome of some CPU operation.
- Flags come in two types:
  - Conditional (status) flags: set or reset by the execution unit (EU) on the basis of the results of some arithmetic operation
  - Machine control flags: used to control certain operations of the processor
- Some instructions test and manipulate individual processor flags. Examples: STC (set carry flag), JNZ (jump if not zero)

Flags Register: Status flags
- Carry flag (CF): indicates a carry after addition or a borrow after subtraction; also indicates error conditions.
- Parity flag (PF): logic 0 for odd parity and logic 1 for even parity.
- Auxiliary carry flag (AF): important for BCD addition and subtraction; holds a carry (borrow) after addition (subtraction) between bit positions 3 and 4. BCD is not used much anymore.
- Zero flag (ZF): indicates that the result of an arithmetic or logic operation is zero.
- Sign flag (SF): indicates the arithmetic sign of the result after an arithmetic operation.
- Overflow flag (OF): a condition that occurs when signed numbers are added or subtracted. An overflow indicates that the result has exceeded the capacity of the machine.
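
How the status flags follow from a result can be sketched for an 8-bit addition. This Python function is an illustrative model, not a description of the processor's internal logic; the function name add8_flags is made up, and parity is computed over the 8-bit result.

```python
def add8_flags(a, b):
    """Add two 8-bit values; return (result, flags) in the sense of the status flags."""
    raw = a + b
    result = raw & 0xFF
    flags = {
        "CF": int(raw > 0xFF),                                 # carry out of bit 7
        "PF": int(bin(result).count("1") % 2 == 0),            # 1 = even parity
        "AF": int(((a & 0xF) + (b & 0xF)) > 0xF),              # carry from bit 3 into bit 4
        "ZF": int(result == 0),                                # result is zero
        "SF": (result >> 7) & 1,                               # copy of the sign bit
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),  # signed overflow
    }
    return result, flags

print(add8_flags(0x7F, 0x01))  # 7Fh + 01h = 80h: SF, AF and OF are set
print(add8_flags(0xFF, 0x01))  # FFh + 01h = 00h: CF, ZF, AF and PF are set
```

The first call shows a signed overflow without a carry (127 + 1 does not fit in a signed byte); the second shows a carry without a signed overflow (-1 + 1 = 0).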

Flags Register: Control flags
The control flags are deliberately set or reset with specific instructions YOU put in your program.
- Trap flag (TF): used for single-stepping through a program
- Interrupt flag (IF): used to allow or prohibit the interruption of a program
- Direction flag (DF): used with string instructions
(Extra knowledge: you don't necessarily use this.)

Types of Operand
- Numbers (numeric data): integer, floating point, decimal
  - Limited magnitude of numbers (integer/decimal)
  - Limited precision (floating point)
- Characters (data for text and strings): ASCII, Unicode, etc.
- Logical data: bits or flags

Example: Types of Operand for Pentium 4

Module 3: Part B
Instruction Set Architecture (ISA): Addressing Modes, Instruction Formats

Addressing Modes
Addressing: how an instruction references a location in main memory or virtual memory
- Immediate
- Direct
- Indirect
- Register
- Register indirect
- Displacement (indexed)

Immediate Addressing
- The operand is part of the instruction: Operand = A
- Used to define and use constants and to set initial values of variables
- No memory reference is needed to fetch the data, so it is FAST
- The size of the operand is limited to the size of the address field
Example: ADD AX, 5H (opcode ADD; 5H is the immediate value)

Direct Addressing
- The address field contains the address of the operand: Effective Address (EA) = address field (A)
- The EA will be either a virtual memory address (if virtual memory is present), a main memory address, or a register
- e.g. ADD EAX, A: look in memory at address A for the operand and add the contents of cell A to register EAX
- Other examples: ADD AX, count and ADD AX, (1011)
- A single memory reference to access the data
- No additional calculations are needed to work out the effective address
- Limited address space

Direct Addressing
    .data
    val1    byte  10h
    array1  word  2210h, 11h, 12h, 13h
    array2  dword 123h, 234h, 345h, 456h
    .code
    main PROC
        mov al, val1        ; al = 10h
        mov bx, array1      ; bx = 2210h
        mov ecx, array2     ; ecx = 00000123h
        call dumpregs
        exit
    main ENDP

Register Addressing
- Similar to direct addressing, but the address field refers to a register: EA = R, where R is the contents of an address field in the instruction that refers to a register
- Limited number of registers compared to memory locations, so only a very small address field is needed
  - Shorter instructions
  - Faster instruction fetch
- No time-consuming memory references are needed, so execution is faster
- Very limited address space
- Multiple registers help performance, but this requires good assembly programming or compiler writing

Register Addressing
    .code
    main PROC
        mov eax, 0
        mov ebx, 2000h
        mov ecx, 3000h
        mov eax, ebx        ; eax = 00002000h
        add eax, ecx        ; eax = 00005000h
        call dumpregs
        exit
    main ENDP

Indirect Addressing
- The memory cell pointed to by the address field contains the address of (a pointer to) the operand: EA = (A) or EA = [A]
- Look in A, find the address (A), and look there for the operand
- e.g. ADD EAX,(A) or ADD EAX,[A]: add the contents of the cell pointed to by the contents of A to register EAX
- Indirect operands are ideal for traversing an array
Note: when traversing an array, the register in brackets must be incremented by a value that matches the array element type: 1 for byte, 2 for word, 4 for dword.

Indirect Addressing
- Large address space: 2^n, where n = word length
- May be nested, multilevel, or cascaded, e.g. EA = (((A))) or EA = [[[A]]] (draw the diagram yourself)
- Multiple memory accesses are needed to find the operand, so it is slower
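
The difference between direct, indirect and nested indirect addressing can be traced with a toy memory. In this Python sketch the memory is a dictionary from address to contents, and all addresses and values are made up for illustration.

```python
# Toy memory: address -> contents (all values are made up).
memory = {100: 200, 200: 300, 300: 42}

def operand_direct(a):
    """EA = A: one memory reference fetches the operand."""
    return memory[a]

def operand_indirect(a, levels=1):
    """EA = (A), or (((A))) for several levels: each level costs one extra memory reference."""
    ea = a
    for _ in range(levels):
        ea = memory[ea]   # follow one pointer
    return memory[ea]     # final reference fetches the operand

print(operand_direct(100))       # 200
print(operand_indirect(100))     # EA = (100) = 200, operand = 300
print(operand_indirect(100, 2))  # EA = ((100)) = 300, operand = 42
```

Each additional level of indirection adds one dictionary lookup, which corresponds to the extra memory access that makes indirect addressing slower.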

Indirect Addressing
    .data
    array1  byte  10h, 11h, 12h, 13h
    array2  word  123h, 234h, 345h, 456h
    array3  dword 123456h, 23456789h
    .code
    main PROC
        mov bl, [array1]        ; bl = 10h
        mov cx, [array2]        ; cx = 0123h
        mov edx, [array3]
        mov ax, [array2 + 2]    ; ax = 0234h
        call dumpregs
        exit
    main ENDP
mov bl, [array1] moves the content of the memory where the first byte of array1 is kept into bl.
array1 in memory:
    Memory address    Memory content
    00404004          10
    00404005          11
    00404006          12
    00404007          13

Register Indirect Addressing
- Similar to indirect addressing: EA = (R) or EA = [R]
- The operand is in the memory cell pointed to by the contents of register R
- Large address space (2^n)
- One fewer memory access than indirect addressing

Register Indirect Addressing
    .data
    array1  byte  10h, 11h, 12h, 13h
    array2  word  123h, 234h, 345h, 456h
    array3  dword 123456h, 23456789h
    .code
    main PROC
        mov esi, OFFSET array1  ; esi = 00404004
        mov edi, OFFSET array2
        mov bl, [esi]           ; register indirect
        mov cx, [edi]
        mov edx, (esi)
        mov ax, [edi + 2]       ; second element of array2
        call dumpregs
        exit
    main ENDP
Notes:
- mov esi, OFFSET array1: the address that holds the first byte of array1 is stored into register esi, which must be a 32-bit register.
- mov bl, [esi]: read the register esi, go to that memory address, get the value, and store it in bl.
- mov ax, [edi + 2]: read the register edi, add a word (2 bytes), go to the resulting memory address, and get the value, which is the second element of array2.

Register Indirect Addressing
    .data
    array1  byte  10h, 11h, 12h, 13h
    array2  word  123h, 234h, 345h, 456h
    array3  dword 123456h, 23456789h
    .code
    main PROC
        mov esi, OFFSET array1  ; esi = 00404004
        mov edi, OFFSET array2
        mov bl, [esi]           ; bl = 10h
        mov cx, [edi]           ; cx = 0123h
        mov edx, (esi)
        mov al, [esi + 1]       ; al = 11h
        call dumpregs
        exit
    main ENDP
The address that holds array1 is stored into register esi.
array1 in memory:
    Memory address    Memory content
    00404004          10
    00404005          11
    00404006          12
    00404007          13

Register Indirect Addressing
    .data
    array1  byte  10h, 11h, 12h, 13h
    array2  word  123h, 234h, 345h, 456h
    array3  dword 123456h, 23456789h
    .code
    main PROC
        mov esi, OFFSET array1
        mov edi, OFFSET array2
        mov bl, [esi]
        mov cx, [edi]
        mov edx, (esi)
        mov al, [esi + 1]
        MOV DX, [ESI]           ; DX = 1110H
        call dumpregs
        exit
    main ENDP
Because DX is word-sized, 2 bytes are read from memory: the byte at the lower address (10) goes into the low half of DX and the next byte (11) into the high half.
array1 in memory:
    Memory address    Memory content
    00404004          10
    00404005          11
    00404006          12
    00404007          13

Register Indirect Addressing
    .code
    main PROC
        mov ebx, 404000h
        mov dl, [ebx]       ; dl = 24h
        inc ebx
        mov cl, [ebx]       ; cl = 55h
        call dumpregs
        exit
    main ENDP
Memory:
    Memory address    Memory content
    00404000          24
    00404001          55
    00404002          66
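
The register indirect loads from array1 in the earlier slides, including the little-endian word load into DX, can be modelled in Python. This is a sketch under stated assumptions: memory is represented as a bytes object, the base address 00404004 is taken from the slides' memory table, and the helper name load is made up.

```python
BASE = 0x00404004                          # address of array1 in the slides' memory table
memory = bytes([0x10, 0x11, 0x12, 0x13])   # array1 byte 10h, 11h, 12h, 13h

def load(address, size):
    """Read `size` bytes starting at `address`, little-endian as on x86."""
    offset = address - BASE
    return int.from_bytes(memory[offset:offset + size], "little")

esi = BASE             # mov esi, OFFSET array1
bl = load(esi, 1)      # mov bl, [esi]
al = load(esi + 1, 1)  # mov al, [esi + 1]
dx = load(esi, 2)      # mov dx, [esi]
print(f"bl={bl:02X}h al={al:02X}h dx={dx:04X}h")  # bl=10h al=11h dx=1110h
```

The size of the destination register decides how many bytes are read, and the byte at the lowest address becomes the least significant byte of the result.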

Displacement Addressing
- Combines direct addressing and register indirect addressing: EA = A + (R) or EA = A + [R]
- The address field holds two values: A = base value and R = register that holds the displacement, or vice versa
- Three common displacement addressing techniques:
  - Relative addressing
  - Base-register addressing
  - Indexing

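Displacement Addressing Diagram (figure not reproduced)
The instruction holds an opcode, a register field R and an address field A. The contents of register R are added to A to form a pointer to the operand in memory.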

Relative Addressing
- A version of displacement addressing in which R = program counter (PC): EA = A + (PC) or EA = A + [PC]
- The address of the next instruction (held in the PC) is added to the address field to produce the EA

Relative Addressing: worked example (assembler listing)
    00000014  B9 00000000              mov ecx,0
    00000019  BA 00000000              mov edx,0
    0000001E  B8 00000000              MOV EAX,0
    00000023  B9 00000004              MOV ECX,4
    00000028                       L1:
    00000028  66 8B 98 00000008 R      MOV BX, ARRAY2[EAX]
    0000002F  83 C0 02                 ADD EAX,2
    00000032  E8 00000000 E            call dumpregs
    00000037  E2 EF                    LOOP L1
                                       exit
    00000039  6A 00                *   push +000000000h
Notes:
- When LOOP L1 executes, the PC already holds the address of the next instruction: PC = 39
- The jump needs to go to L1, which is at 0028
- Displacement = target - source, so how much to jump = (where L1 is) - PC
- How much to jump = 28 - 39 = EF (hexadecimal, 8-bit two's complement for -11h); it is negative because the jump goes backwards
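
The displacement arithmetic for the LOOP instruction can be checked with a short Python sketch. The function names rel8 and branch_target are made up for this illustration; the addresses 28h and 39h come from the listing.

```python
def rel8(target, next_instruction):
    """Encode (target - next_instruction) as an 8-bit two's-complement byte."""
    displacement = target - next_instruction
    if not -128 <= displacement <= 127:
        raise ValueError("target out of range for an 8-bit displacement")
    return displacement & 0xFF

def branch_target(next_instruction, encoded):
    """Recover the target address from the encoded displacement byte."""
    signed = encoded - 0x100 if encoded & 0x80 else encoded
    return next_instruction + signed

print(f"{rel8(0x28, 0x39):02X}")           # EF
print(f"{branch_target(0x39, 0xEF):04X}")  # 0028
```

The second function is the calculation EA = A + (PC): the sign-extended address field is added to the address of the next instruction.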

Base-Register Addressing
- EA = A + R
- A holds the displacement
- R holds a pointer to the base address
- R may be explicit or implicit

Indexed Addressing
- EA = A + R
- A = base
- R = displacement
- Good for accessing arrays

Indexed Addressing
    .data
    array1  byte  10h, 11h, 12h, 13h
    array2  word  123h, 234h, 345h, 456h
    array3  dword 123456h, 23456789h
    .code
    main PROC
        MOV EAX,0
        MOV ECX,4
    L1: MOV BX, ARRAY2[EAX]     ; base = ARRAY2, displacement = EAX
        ADD EAX,2
        call dumpregs
        LOOP L1
        exit
    main ENDP
The load can equivalently be written as MOV BX, [ARRAY2 + EAX], followed by ADD EAX,2 to step to the next word.

Base-Register Addressing
    .data
    count   dword 10h
    array1  byte  10h, 11h, 12h, 13h
    array2  word  123h, 234h, 345h, 456h
    array3  dword 123456h, 23456789h
    .code
    main PROC
        mov esi, offset array1
        mov ebx, 10h
        mov ax, word ptr [ebx+esi]  ; base-index addressing: ax = 6789h
        add ebx, esi                ; ebx + esi = 00000010 + 00404004 = 00404014
        mov cx, word ptr [ebx]      ; base addressing: cx = 6789h
        call dumpregs
        exit
    main ENDP
array3 (second element) in memory:
    Memory address    Memory content
    00404014          89
    00404015          67
    00404016          45
    00404017          23
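
The indexed and base-index address calculations can be replayed against a model of the data segment. In this Python sketch the segment is assumed to start at 00404000 with count placed first, which is consistent with the addresses shown on the slide; the helper name load_word is made up.

```python
import struct

DATA_BASE = 0x00404000
# Layout: count dword | array1 4 bytes | array2 4 words | array3 2 dwords (little-endian, no padding)
data = struct.pack("<I4B4H2I",
                   0x10,
                   0x10, 0x11, 0x12, 0x13,
                   0x123, 0x234, 0x345, 0x456,
                   0x123456, 0x23456789)

ARRAY1 = DATA_BASE + 4
ARRAY2 = DATA_BASE + 8

def load_word(address):
    """Read a 16-bit little-endian word from the modelled data segment."""
    offset = address - DATA_BASE
    return int.from_bytes(data[offset:offset + 2], "little")

# Indexed: the base address ARRAY2 is in the instruction, the index is in EAX.
indexed = [load_word(ARRAY2 + eax) for eax in range(0, 8, 2)]

# Base-index: both parts of the address are held in registers.
esi, ebx = ARRAY1, 0x10
ax = load_word(ebx + esi)

print([f"{w:04X}" for w in indexed])     # ['0123', '0234', '0345', '0456']
print(f"{ebx + esi:08X} -> {ax:04X}")    # 00404014 -> 6789
```

Stepping the index by 2 matches the word-sized elements of array2, in the same way the assembly loop adds 2 to EAX on each pass.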

Pentium Addressing Modes
- The virtual or effective address is an offset into a segment
  - Starting address plus offset gives the linear address
  - This goes through page translation if paging is enabled
- 12 addressing modes are available, including:
  - Immediate
  - Register operand
  - Displacement
  - Base
  - Base with displacement
  - Scaled index with displacement
  - Base with index and displacement
  - Base with scaled index and displacement
  - Relative

Pentium Addressing Mode Calculation
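
The general form of the calculation can be sketched in Python as effective address = base + index * scale + displacement, followed by adding the segment's starting address to obtain the linear address. This is a simplified model that ignores segment limit checks and paging; the function names and the sample values are assumptions for illustration.

```python
def effective_address(base=0, index=0, scale=1, displacement=0):
    """EA = base + index * scale + displacement, wrapped to 32 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 scale factor must be 1, 2, 4 or 8")
    return (base + index * scale + displacement) & 0xFFFFFFFF

def linear_address(segment_base, ea):
    """Linear address = segment starting address + effective address (offset)."""
    return (segment_base + ea) & 0xFFFFFFFF

# Base with scaled index: element 3 of a word array starting at 00404008
ea = effective_address(base=0x00404008, index=3, scale=2)
print(f"{ea:08X}")                     # 0040400E
print(f"{linear_address(0, ea):08X}")  # 0040400E with a segment starting at 0
```

Leaving a component at its default value yields the simpler modes: displacement only, base only, base with displacement, and so on.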

Instruction Formats
- Layout of the bits in an instruction
- Includes the opcode
- Includes (implicit or explicit) operand(s)
- Usually more than one instruction format in an instruction set

Instruction Formats
Four common instruction formats:
(a) Zero-address instruction
(b) One-address instruction
(c) Two-address instruction
(d) Three-address instruction

Instruction Length
- Affected by and affects: memory size, memory organization, bus structure, CPU complexity, CPU speed
- Trade-off between a powerful instruction repertoire and saving space
- Other issue: should the instruction length be equal to, or a multiple of, the memory transfer length (bus system)?