
Designing Computer Systems Assembly Programming 08:34:48 PM 23 August 2016 AP-1 Scott & Linda Wills

In the early days of computers, assembly programming was the way to implement an application in software. It was complicated and challenging, but it yielded very efficient programs because the code ran directly on the machine hardware. This visibility of the execution platform had many consequences: primitive, inflexible programming metaphors, complex debugging, limited portability, expensive maintenance; the list goes on. Fortunately, high-level languages (HLLs) were introduced early on to provide a more usable, albeit less efficient, programming environment. Today we use high-level languages in virtually all software development. So why should we care about assembly language?

Naive programmers write slow, buggy programs: Understanding the hardware of computers is nice, but it rarely changes how one uses computers. On the other hand, understanding the (assembly) programming language of computers can profoundly improve the way one programs them. Even a fairly competent programmer will benefit from knowing how high-level program statements translate into machine instructions: code becomes cleaner, less buggy, and more efficient. So let's get started.

A Programmer's View of the Machine: Let's start with a basic view of the machine, as seen by an assembly programmer, executing a stored program. When a program is written, a compiler and an assembler transform the human-readable source code into a series of binary instructions that the machine can execute. Each processor family has its own machine language, defined by its instruction set architecture (ISA). The MIPS ISA is representative of today's machine execution environments. The programming model for a MIPS processor is shown here.
[Figure: MIPS programming model. Controller (instruction pointer), Code Memory (memory array), Decoder (operands, functions), Datapath (register file, functional units), Data Memory (memory array)]

Controller: Program execution begins in the controller. The instruction pointer (IP) contains the address in code memory of the next assembly instruction to be executed. The controller also contains hardware to update the IP with the address of the next instruction to be executed. Normally this is a simple increment of the IP for sequential instructions, whose primary objective is to perform an operation on operands in the datapath or memory. A small group of non-sequential instructions manipulate the IP using information in the instruction. Non-sequential instruction classes include branches and jumps; these instructions sometimes interact with the datapath.

Code Memory: Code memory is a large block of storage that contains the assembly instructions. A memory block is just an indexable array of words; here each word is an instruction to be executed. A program's instructions are loaded into code memory before the program executes. The value of the IP provides the address of the current instruction in code memory. This instruction is read from memory and passed to the decoder.

Decoder: Even though the ISA is defined for a whole family of processors, each instruction must be decoded to extract the information needed for its execution. The instruction word read from code memory is placed in the instruction word register in the decoder. An instruction format has two parts: the operation and the operands. The operation part, called an opcode, has an encoded value that defines exactly which operation is to be executed. Typically this configures the datapath functional units, but sometimes it prepares for a data memory access or a non-sequential operation in the controller. The instruction also contains operand fields that specify which data should be processed by the instruction and where the result should be stored. Each ISA has its own instruction formats to represent operation and operand fields. The MIPS ISA has three instruction formats:

    R-type:  opcode | destination register | source register 1 | source register 2
    I-type:  opcode | destination register | source register 1 | immediate value
    J-type:  opcode | jump target address

The opcode specifies the operation to be performed.
It is a short field (six bits), supporting 64 instruction types. The operand fields vary among the instruction types. R-type instructions specify two source registers and a destination register. Since there are 32 registers, each register field requires five bits. Here's an R-type instruction for addition:

    add  $3, $1, $2     # R3 = R1 + R2

I-type instructions are similar to R-type, but the second source register is replaced by a value contained in the instruction word, called an immediate value. Because the opcode and the two register fields already use 16 of the 32 bits (6 + 5 + 5), this value is limited to 16 bits. Here's an example:

    addi $3, $1, 569    # R3 = R1 + 569
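To make the decoder's job concrete, here is a minimal C sketch that extracts the fields from a 32-bit instruction word. It assumes the standard MIPS32 bit layout, which is a little more detailed than the simplified formats above: the register fields are named rs, rt, and rd, and R-type words also carry a 5-bit shift amount (shamt) and a 6-bit function code (funct) that selects the operation. The type `Fields` and the helpers `bits` and `decode` are hypothetical names chosen for illustration.

```c
#include <stdint.h>

/* Fields of a MIPS32 instruction word (bit 31 is the most significant):
     R-type: opcode[31:26] rs[25:21] rt[20:16] rd[15:11] shamt[10:6] funct[5:0]
     I-type: opcode[31:26] rs[25:21] rt[20:16] immediate[15:0]
   Hypothetical helper type for illustration. */
typedef struct {
    unsigned opcode; /* 6 bits: 64 possible operations               */
    unsigned rs;     /* 5 bits: first source register (0..31)        */
    unsigned rt;     /* 5 bits: second source (R) or destination (I) */
    unsigned rd;     /* 5 bits: destination register (R-type only)   */
    unsigned shamt;  /* 5 bits: shift amount (R-type only)           */
    unsigned funct;  /* 6 bits: selects the R-type operation         */
    int32_t  imm;    /* 16-bit immediate as a signed value (I-type)  */
} Fields;

/* Extract `width` bits of `word`, starting at bit position `lsb`. */
static unsigned bits(uint32_t word, unsigned lsb, unsigned width) {
    return (unsigned)((word >> lsb) & ((1u << width) - 1u));
}

/* Decode every field; which ones are meaningful depends on the opcode. */
static Fields decode(uint32_t word) {
    Fields f;
    f.opcode = bits(word, 26, 6);
    f.rs     = bits(word, 21, 5);
    f.rt     = bits(word, 16, 5);
    f.rd     = bits(word, 11, 5);
    f.shamt  = bits(word, 6, 5);
    f.funct  = bits(word, 0, 6);
    /* Interpret the low 16 bits as a two's complement number. */
    unsigned raw = bits(word, 0, 16);
    f.imm = (raw & 0x8000u) ? (int32_t)raw - 0x10000 : (int32_t)raw;
    return f;
}
```

For example, `add $3, $1, $2` assembles to 0x00221820, and decoding that word yields opcode 0, rs = 1, rt = 2, rd = 3, and funct = 0x20. Note that in the real encoding the destination of an R-type instruction (rd) comes after both sources, and the destination of an I-type instruction is rt.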

J-type instructions support non-sequential operations that don't require registers in the datapath. Instead, the instruction contains the target address in code memory to which execution is transferred. Here's an example:

    j    Foo            # goto Foo

Because computer implementations change with technology, the decoder translates the invariant machine instruction into the appropriate commands for the current processor implementation.

Datapath: The datapath is where most instructions are executed. It includes many functional units with hardware to perform the specified operations. It also has an array of registers, the register file, that supplies most of the operands to the functional units and stores the results. Registers provide the necessary high-speed access in the datapath. However, the register file is limited in size, so a data memory composed of dense storage cells is connected to the datapath.

Data Memory: Data memory provides storage for the program's data objects. While operands cannot be supplied directly from data memory, it has the capacity to hold the large data sets that programs require. Words are copied from data memory into registers (using load instructions) before they are processed in the datapath. Register values are copied back to memory using store instructions.

Instruction Set: The heart of an instruction set architecture is its instruction set! A programmer's view of a processor begins with the available instructions. Let's explore the different classes of MIPS instructions.

Arithmetic Instructions: A four-function ("four-banger") calculator is a simple yet useful computer that lets us evaluate expressions from diverse applications: everything from complex engineering problems to our car's miles per gallon at the gas station. But in these computations, the stored program is in our head: we transform an equation into a series of addition, subtraction, multiplication, and division operations.
As assembly programmers, we can transcribe these steps into corresponding operations in the instruction set. The syntax of an arithmetic assembly instruction is simple:

    operation destination, source1, source2

Register numbers are denoted using the dollar sign '$'. Let's try an example, assuming A, B, C, and D are stored in registers R1, R2, R3, and R4, respectively.

    # D = A + B - C
    add  $4, $1, $2     # D = A + B
    sub  $4, $4, $3     # D = D - C

The add instruction adds A and B, placing the intermediate result in D. Then the sub instruction subtracts C, placing the final result back in D. Even simple expressions can be rather opaque in assembly. Comments can clarify the code, for example by naming the variables involved. Assembly comments begin with the pound sign '#' and continue to the end of the line. Here's the example with comments:

    # D = A + B - C
    add  $4, $1, $2     # D = A + B
    sub  $4, $4, $3     # D = D - C

The second source operand can be either a register (R-type instructions) or an immediate value (I-type). An immediate value is a constant stored in the instruction word and is limited to 16 bits. For arithmetic instructions, this limits immediate values to the range -32,768 to +32,767 (approximately ±32 thousand). I-type instructions are distinguished by an 'i' appended to the operation name. Here's an example:

    # D = A + 6 + B - 5
    addi $4, $1, 6      # D = A + 6
    add  $4, $4, $2     # D = D + B
    addi $4, $4, -5     # D = D - 5

Note that add immediate is used for both positive and negative values, so there is no need for a subtract immediate instruction.

Unsigned Instructions: Most of the computations we perform are on signed variables (i.e., the values can be positive or negative). If a computed value exceeds the representable range (for 32-bit values, about ±2 billion), an overflow error (exception) is generated. If the application does not expect negative values, the range of positive values can be doubled (0 to about 4 billion). This doesn't change how the operation is performed: the same 32-bit result is produced either way. Rather, it changes how errors are handled. Unsigned operations are denoted by an appended 'u' (e.g., addu, subu, addiu), and these forms never raise an overflow exception; a result that doesn't fit simply wraps around modulo 2^32.

The HiLo Register: Multiplication can produce a much larger range of results than addition. For this reason, the mult instruction stores its result in a special register named HiLo that is twice the length of a normal register. When two 32-bit registers are multiplied, the result is placed in the 64-bit HiLo register (Hi is the upper 32 bits; Lo is the lower 32 bits).
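The HiLo arrangement can be modeled in C with 64-bit integers. The sketch below is a hypothetical model for illustration: `HiLo`, `mips_mult`, and `mips_div` are invented names, not part of any library. `mips_mult` forms the full 64-bit signed product and splits it into Hi and Lo; `mips_div` models the div instruction discussed in the following paragraphs, putting the quotient in Lo and the remainder in Hi. It assumes C99 semantics, where integer division truncates toward zero so the remainder takes the sign of the dividend, which matches how MIPS div is normally described.

```c
#include <stdint.h>

/* Hypothetical model of the MIPS Hi and Lo registers. */
typedef struct {
    uint32_t hi; /* upper 32 bits (mult) or remainder (div) */
    uint32_t lo; /* lower 32 bits (mult) or quotient (div)  */
} HiLo;

/* mult: signed 32 x 32 -> 64-bit product, split into Hi and Lo.
   The 64-bit multiply cannot overflow, since |a * b| <= 2^62. */
static HiLo mips_mult(int32_t a, int32_t b) {
    uint64_t p = (uint64_t)((int64_t)a * (int64_t)b);
    HiLo r = { (uint32_t)(p >> 32), (uint32_t)p };
    return r;
}

/* div: Lo = quotient, Hi = remainder.
   Precondition: b != 0 and not (a == INT32_MIN && b == -1);
   C leaves those cases undefined, so the caller must avoid them. */
static HiLo mips_div(int32_t a, int32_t b) {
    HiLo r = { (uint32_t)(a % b), (uint32_t)(a / b) };
    return r;
}
```

Reading `.lo` corresponds to mflo and reading `.hi` corresponds to mfhi. For instance, 100000 * 100000 = 10,000,000,000 does not fit in 32 bits, so Hi holds 2 and Lo holds 1,410,065,408; assuming the product fits in Lo would be wrong in that case.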
The HiLo register preserves the full result, but its value must be moved (in parts) to normal registers before it can be used in another operation. At twice the word size, this can become complicated. Fortunately, for most applications, we can assume the result fits in Lo. The move from Lo instruction (mflo) transfers the value in Lo to a numbered register. So multiplication requires two instructions:

    # D = A * B
    mult $1, $2         # HiLo = A * B
    mflo $4             # D = Lo

Division also uses the HiLo register, but for a different reason. Since we are performing integer operations, division is inexact: it produces both a quotient and a remainder. In the MIPS ISA, the
divide instruction places the quotient in Lo and the remainder in Hi. This supports both integer division and modulo operations.

    # C = A / B
    # D = A % B
    div  $1, $2         # Lo = A / B, Hi = A % B
    mflo $3             # C = A / B
    mfhi $4             # D = A mod B

Note that the move from Hi instruction (mfhi) moves the remainder (the modulo result) to the numbered register. Both multiply and divide have unsigned forms (multu, divu) as well.

Logical Instructions: Not all instructions perform arithmetic operations. Sometimes operands are treated as a collection of bits and are processed using logical functions. Popular logical operations include invert (NOT), mask (AND), combine (OR), inverted combine (NOR), and selective invert (XOR). Their truth tables are:

    X | NOT X
    0 |   1
    1 |   0

    X Y | AND  OR  NOR  XOR
    0 0 |  0    0   1    0
    0 1 |  0    1   0    1
    1 0 |  0    1   0    1
    1 1 |  1    1   0    0

These functions operate on pairs of bits (NOT on a single bit). Since registers contain many bits, these functions are applied to a pair of registers in a bit-wise fashion: the ith bit of each register is processed to create the ith bit of the result. If registers were eight bits wide, bit-wise operations would look like this (A and B are the operands):

    A          01101100
    B          11001010
    NOT B      00110101
    A AND B    01001000
    A OR B     11101110
    A NOR B    00010001
    A XOR B    10100110

Logical instructions look similar to arithmetic instructions. Here '·' denotes AND and '+' denotes OR:

    # D = A · B + C
    and  $4, $1, $2     # D = A · B
    or   $4, $4, $3     # D = D + C

Sign Extension: Immediate forms are similar as well, with one difference: how the sixteen-bit immediate value is extended to fill the 32-bit word. In arithmetic instructions, the immediate is a two's complement number, so it is placed in the lowest sixteen bits of the 32-bit word and its most significant bit (bit 15) is copied into all of the upper sixteen bits. This preserves the value of both positive and negative numbers and is called sign extension.
In logical operations, immediate values are not treated as signed numbers, so no sign extension is performed; instead, the upper sixteen bits are filled with zeros (zero extension).
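Here is a minimal C sketch of the two ways a 16-bit immediate can be widened to 32 bits. `sign_extend16` copies bit 15 into the upper sixteen bits, as arithmetic immediates are handled; `zero_extend16` fills them with zeros, as logical immediates are handled. Both function names are hypothetical.

```c
#include <stdint.h>

/* Arithmetic immediates (addi, slti, ...): copy bit 15 into bits 31..16. */
static uint32_t sign_extend16(uint16_t imm) {
    return (imm & 0x8000u) ? (0xFFFF0000u | imm) : (uint32_t)imm;
}

/* Logical immediates (andi, ori, xori): fill bits 31..16 with zeros. */
static uint32_t zero_extend16(uint16_t imm) {
    return (uint32_t)imm;
}
```

The two functions agree whenever bit 15 is 0 and differ only when it is 1, which is exactly when the 16-bit value would be negative in two's complement.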

Sign extension (arithmetic) versus zero extension (logical) of two 16-bit immediates:

    type        16-bit immediate    32-bit operand
    arithmetic  0011011011001001    00000000000000000011011011001001
    logical     0011011011001001    00000000000000000011011011001001
    arithmetic  1010011010110010    11111111111111111010011010110010
    logical     1010011010110010    00000000000000001010011010110010

As you can see, working with binary can be tedious when word sizes are large. Hexadecimal is more convenient, but it must be distinguished from decimal values. The standard way to express a hexadecimal value is to prefix it with '0x'. For example, the sign extension examples above can be shown more compactly in hexadecimal:

    type        immediate    32-bit operand
    arithmetic  0x36C9       0x000036C9
    logical     0x36C9       0x000036C9
    arithmetic  0xA6B2       0xFFFFA6B2
    logical     0xA6B2       0x0000A6B2

Sometimes it is convenient to use hexadecimal values as immediate fields in logical and arithmetic instructions. For example, consider masking the least significant byte of a 32-bit word in register one (keeping that byte and clearing the other 24 bits). A decimal immediate constant is not as clear as the hexadecimal constant:

    andi $4, $1, 255    # mask LS byte
    andi $4, $1, 0xFF   # mask LS byte
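The bit-wise behavior of the logical instructions maps directly onto C's bit-wise operators, since C applies &, |, ^, and ~ to every bit position independently. This hypothetical sketch (the `mips_*` names are illustrative only) also shows two consequences of the text above: andi zero-extends its immediate, so andi with 0xFF keeps only the least significant byte; and NOT can be obtained from NOR with a zero operand (MIPS register $0 always reads as zero, which is how assemblers commonly provide a `not` pseudo-instruction).

```c
#include <stdint.h>

/* Bit i of each result depends only on bit i of the operands. */
static uint32_t mips_and(uint32_t a, uint32_t b) { return a & b; }
static uint32_t mips_or (uint32_t a, uint32_t b) { return a | b; }
static uint32_t mips_nor(uint32_t a, uint32_t b) { return ~(a | b); }
static uint32_t mips_xor(uint32_t a, uint32_t b) { return a ^ b; }

/* NOT A is NOR(A, 0): register $0 supplies the zero operand. */
static uint32_t mips_not(uint32_t a) { return mips_nor(a, 0u); }

/* andi zero-extends its 16-bit immediate before the AND. */
static uint32_t mips_andi(uint32_t a, uint16_t imm) { return a & (uint32_t)imm; }

/* D = A . B + C as the two-instruction sequence and/or. */
static uint32_t and_then_or(uint32_t a, uint32_t b, uint32_t c) {
    uint32_t d = mips_and(a, b); /* and $4, $1, $2 */
    return mips_or(d, c);        /* or  $4, $4, $3 */
}
```

For instance, with A = 0x6C and B = 0xCA, AND, OR, and XOR give 0x48, 0xEE, and 0xA6; in a 32-bit register NOR and NOT also set the upper 24 bits, so A NOR B is 0xFFFFFF11.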

MIPS Instruction Set (core)

    instruction                      example           meaning

    arithmetic
    add                              add $1,$2,$3      $1 = $2 + $3
    subtract                         sub $1,$2,$3      $1 = $2 - $3
    add immediate                    addi $1,$2,100    $1 = $2 + 100
    add unsigned                     addu $1,$2,$3     $1 = $2 + $3
    subtract unsigned                subu $1,$2,$3     $1 = $2 - $3
    add immediate unsigned           addiu $1,$2,100   $1 = $2 + 100
    set if less than                 slt $1,$2,$3      if ($2 < $3), $1 = 1 else $1 = 0
    set if less than immediate       slti $1,$2,100    if ($2 < 100), $1 = 1 else $1 = 0
    set if less than unsigned        sltu $1,$2,$3     if ($2 < $3), $1 = 1 else $1 = 0 (unsigned compare)
    set if < immediate unsigned      sltiu $1,$2,100   if ($2 < 100), $1 = 1 else $1 = 0 (unsigned compare)
    multiply                         mult $2,$3        Hi, Lo = $2 * $3, 64-bit signed product
    multiply unsigned                multu $2,$3       Hi, Lo = $2 * $3, 64-bit unsigned product
    divide                           div $2,$3         Lo = $2 / $3, Hi = $2 mod $3
    divide unsigned                  divu $2,$3        Lo = $2 / $3, Hi = $2 mod $3, unsigned

    transfer
    move from Hi                     mfhi $1           $1 = Hi
    move from Lo                     mflo $1           $1 = Lo
    load upper immediate             lui $1,100        $1 = 100 x 2^16

    logic
    and                              and $1,$2,$3      $1 = $2 & $3
    or                               or $1,$2,$3       $1 = $2 | $3
    and immediate                    andi $1,$2,100    $1 = $2 & 100
    or immediate                     ori $1,$2,100     $1 = $2 | 100
    nor                              nor $1,$2,$3      $1 = not($2 | $3)
    xor                              xor $1,$2,$3      $1 = $2 ^ $3
    xor immediate                    xori $1,$2,255    $1 = $2 ^ 255

    shift
    shift left logical               sll $1,$2,5       $1 = $2 << 5 (logical)
    shift left logical variable      sllv $1,$2,$3     $1 = $2 << $3 (logical), variable shift amt
    shift right logical              srl $1,$2,5       $1 = $2 >> 5 (logical)
    shift right logical variable     srlv $1,$2,$3     $1 = $2 >> $3 (logical), variable shift amt
    shift right arithmetic           sra $1,$2,5       $1 = $2 >> 5 (arithmetic)
    shift right arithmetic variable  srav $1,$2,$3     $1 = $2 >> $3 (arithmetic), variable shift amt

    memory
    load word                        lw $1,1000($2)    $1 = memory[$2+1000]
    store word                       sw $1,1000($2)    memory[$2+1000] = $1
    load byte                        lb $1,1002($2)    $1 = memory[$2+1002] in least sig. byte (sign-extended)
    load byte unsigned               lbu $1,1002($2)   $1 = memory[$2+1002] in least sig. byte (zero-extended)
    store byte                       sb $1,1002($2)    memory[$2+1002] = $1 (byte modified only)

    branch
    branch if equal                  beq $1,$2,100     if ($1 == $2), PC = PC + 4 + (100*4)
    branch if not equal              bne $1,$2,100     if ($1 != $2), PC = PC + 4 + (100*4)

    jump
    jump                             j 10000           PC = 10000*4
    jump register                    jr $31            PC = $31
    jump and link                    jal 10000         $31 = PC + 4; PC = 10000*4
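The branch and jump rows of the table can be read as rules for computing the next instruction pointer (PC). Below is a minimal C sketch of those rules under stated assumptions: branch offsets are counted in instructions (4-byte words) from PC + 4, and a jump keeps the upper four bits of PC + 4 and appends the 26-bit target shifted left by two, which equals target*4 whenever the code lies in the lowest 256 MB of the address space, as in the table's examples. Like the table, it ignores the branch delay slot of real MIPS hardware (with delay slots, jal actually saves PC + 8). All function names are hypothetical.

```c
#include <stdint.h>

/* beq/bne: the signed 16-bit offset counts words from PC + 4.
   Unsigned arithmetic wraps, so negative offsets move backward. */
static uint32_t branch_target(uint32_t pc, int16_t offset) {
    return pc + 4u + (uint32_t)((int32_t)offset * 4);
}

static uint32_t next_pc_beq(uint32_t pc, uint32_t rs, uint32_t rt, int16_t offset) {
    return (rs == rt) ? branch_target(pc, offset) : pc + 4u;
}

static uint32_t next_pc_bne(uint32_t pc, uint32_t rs, uint32_t rt, int16_t offset) {
    return (rs != rt) ? branch_target(pc, offset) : pc + 4u;
}

/* j/jal: 26-bit word address; the upper 4 bits come from PC + 4. */
static uint32_t jump_target(uint32_t pc, uint32_t target26) {
    return ((pc + 4u) & 0xF0000000u) | ((target26 & 0x03FFFFFFu) << 2);
}

/* jal also saves the return address in $31 (no delay slot modeled). */
static uint32_t next_pc_jal(uint32_t pc, uint32_t target26, uint32_t *r31) {
    *r31 = pc + 4u;
    return jump_target(pc, target26);
}
```

A subroutine call then pairs `next_pc_jal` with jr $31, which simply sets the PC back to the saved return address.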