Chapter 2
Instruction Set Architecture
  A very important abstraction:
    the interface between hardware and low-level software
    standardizes instructions, machine language bit patterns, etc.
    advantage: allows different implementations of the same architecture
    disadvantage: sometimes prevents using new innovations
  Modern instruction set architectures: IA-32, PowerPC, MIPS, SPARC, ARM, and others
Instructions: Language of the Machine
  We'll be working with the MIPS instruction set architecture
    simple instruction set and format
    similar to other architectures developed since the 1980s
    almost 100 million MIPS processors manufactured in 2002
    used by NEC, Nintendo, Cisco, Silicon Graphics, Sony, ...
  [Chart: processors per year, 1998-2002, by architecture: ARM, IA-32, MIPS, Motorola 68K, PowerPC, Hitachi SH, SPARC, Other; vertical axis 0 to 1400]
MIPS arithmetic
  All arithmetic instructions have 3 operands
  Operand order is fixed (destination first)
  Example:
    C code:     a = b + c;
    MIPS code:  add a, b, c
    (we'll talk about registers in a bit)
  The natural number of operands for an operation like addition is three: two sources and one destination. Requiring every instruction to have exactly three operands, no more and no less, conforms to the philosophy of keeping the hardware simple.
MIPS arithmetic
  Design Principle: simplicity favors regularity.
  Of course this complicates some things...
    C code:     a = b + c + d;
    MIPS code:  add a, b, c
                add a, a, d
  Operands must be registers; only 32 registers are provided
  Each register contains 32 bits
  Design Principle: smaller is faster. Why?
Registers vs. Memory
  Operands of arithmetic instructions must be registers; only 32 registers are provided
  The compiler associates variables with registers
  What about programs with lots of variables?
  [Figure: the components of a computer: Processor (Control, Datapath), Memory, Input, Output (I/O)]
Memory Organization
  Viewed as a large, single-dimension array, with an address.
  A memory address is an index into the array.
  "Byte addressing" means that the index points to a byte of memory.
    Address  Contents
    0        8 bits of data
    1        8 bits of data
    2        8 bits of data
    3        8 bits of data
    4        8 bits of data
    5        8 bits of data
    6        8 bits of data
    ...
Memory Organization
  Bytes are nice, but most data items use larger "words"
  For MIPS, a word is 32 bits or 4 bytes.
    Address  Contents
    0        32 bits of data
    4        32 bits of data
    8        32 bits of data
    12       32 bits of data
    ...
  Registers hold 32 bits of data
  2^32 bytes with byte addresses from 0 to 2^32 - 1
  2^30 words with byte addresses 0, 4, 8, ..., 2^32 - 4
  Words are aligned, i.e., what are the 2 least significant bits of a word address?
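The alignment question on this slide can be explored with a small C sketch. It assumes 4-byte words as stated above; the helper names `is_word_aligned` and `word_to_byte_address` are made up for illustration and are not part of any MIPS toolchain.

```c
#include <stdbool.h>
#include <stdint.h>

/* A byte address is word aligned when it is a multiple of 4,
   i.e. when its two least significant bits are both 0. */
bool is_word_aligned(uint32_t byte_address)
{
    return (byte_address & 0x3u) == 0;
}

/* Word number n starts at byte address 4 * n (n < 2^30). */
uint32_t word_to_byte_address(uint32_t word_index)
{
    return word_index << 2;
}
```

For example, `word_to_byte_address(3)` gives byte address 12, which `is_word_aligned` accepts, while byte address 6 is rejected.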
Instructions
  Load and store instructions
  Example:
    C code:     A[12] = h + A[8];     # $s3 holds the base address of A, $s2 holds h
    MIPS code:  lw  $t0, 32($s3)      # $t0 = A[8]   (8 * 4 = 32 bytes)
                add $t0, $s2, $t0     # $t0 = h + A[8]
                sw  $t0, 48($s3)      # A[12] = $t0  (12 * 4 = 48 bytes)
  Can refer to registers by name (e.g., $s2, $t2) instead of number
  Store word has destination last
  Remember arithmetic operands are registers, not memory!
    Can't write: add 48($s3), $s2, 32($s3)
Instructions
  Example:
    C code:     g = h + A[i];         # $s3 holds the base address of A;
                                      # g, h and i are in $s1, $s2 and $s4
    MIPS code:  add $t1, $s4, $s4     # $t1 = 2*i
                add $t1, $t1, $t1     # $t1 = 4*i
                add $t1, $t1, $s3     # $t1 = 4*i + $s3
                lw  $t0, 0($t1)       # $t0 = A[i]
                add $s1, $s2, $t0     # g = h + A[i]
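The address arithmetic in this example (double i twice, then add the base) can be mirrored in C. This is only a sketch: the function name `element_address` and the use of plain 32-bit integers for addresses are assumptions made for illustration.

```c
#include <stdint.h>

/* Byte address of A[i] for an array of 4-byte words,
   computed with the same three additions as the MIPS code. */
uint32_t element_address(uint32_t base, uint32_t i)
{
    uint32_t t1 = i + i;   /* t1 = 2*i        */
    t1 = t1 + t1;          /* t1 = 4*i        */
    t1 = t1 + base;        /* t1 = 4*i + base */
    return t1;
}
```

With a base of 0, `element_address(0, 8)` is 32 and `element_address(0, 12)` is 48, the same offsets used for A[8] and A[12] in the load/store example.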
So far we've learned:
  MIPS
    loading words but addressing bytes
    arithmetic on registers only
  Instruction           Meaning
  add $s1, $s2, $s3     $s1 = $s2 + $s3
  sub $s1, $s2, $s3     $s1 = $s2 - $s3
  lw  $s1, 100($s2)     $s1 = Memory[$s2+100]
  sw  $s1, 100($s2)     Memory[$s2+100] = $s1
Policy of Use Conventions
  Name       Register number   Usage
  $zero      0                 the constant value 0
  $v0-$v1    2-3               values for results and expression evaluation
  $a0-$a3    4-7               arguments
  $t0-$t7    8-15              temporaries
  $s0-$s7    16-23             saved
  $t8-$t9    24-25             more temporaries
  $gp        28                global pointer
  $sp        29                stack pointer
  $fp        30                frame pointer
  $ra        31                return address
  Register 1 ($at) is reserved for the assembler, registers 26-27 for the operating system
MIPS Format
  Instructions, like registers and words, are also 32 bits long
  Example: add $t1, $s1, $s2
    registers: $t1 = 9, $s1 = 17, $s2 = 18
  Fetch, Decode, Execute, PC!!
  Instruction Format:
    op      rs     rt     rd     shamt  funct
    000000  10001  10010  01001  00000  100000
Our Goal
  add $t1, $s1, $s2    ($t1 = 9, $s1 = 17, $s2 = 18)
    op      rs     rt     rd     shamt  funct
    000000  10001  10010  01001  00000  100000
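The bit pattern above can be reproduced by packing the six fields into one 32-bit word. The C sketch below assumes the field widths shown on the slide (6, 5, 5, 5, 5, 6 bits); the function name `encode_r` is hypothetical.

```c
#include <stdint.h>

/* Pack the six R-type fields into one 32-bit instruction word.
   Each field is masked to its width before being shifted into place. */
uint32_t encode_r(uint32_t op, uint32_t rs, uint32_t rt,
                  uint32_t rd, uint32_t shamt, uint32_t funct)
{
    return ((op    & 0x3Fu) << 26) |   /* bits 31..26 */
           ((rs    & 0x1Fu) << 21) |   /* bits 25..21 */
           ((rt    & 0x1Fu) << 16) |   /* bits 20..16 */
           ((rd    & 0x1Fu) << 11) |   /* bits 15..11 */
           ((shamt & 0x1Fu) <<  6) |   /* bits 10..6  */
            (funct & 0x3Fu);           /* bits  5..0  */
}
```

For add $t1, $s1, $s2 the call is `encode_r(0, 17, 18, 9, 0, 32)`, which yields 0x02324820, the hexadecimal form of 000000 10001 10010 01001 00000 100000. Note that the destination register, written first in assembly, lands in the rd field, the third register field of the word.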
MIPS Format
  How about lw $s1, 100($s2)?
Machine Language
  Consider the load-word and store-word instructions.
    What would the regularity principle have us do?
    New principle: good design demands a compromise
  Introduce a new type of instruction format
    I-type for data transfer instructions
    the other format was R-type for register instructions
  Example: lw $t0, 32($s2)
    op   rs   rt   16 bit number
    35   18   8    32
  Where's the compromise?
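The I-type layout can be sketched the same way as the R-type one. The C function below is an illustration only; `encode_i` is a hypothetical name, and the register numbers come from the register conventions table ($s2 is register 18, $t0 is register 8).

```c
#include <stdint.h>

/* Pack the I-type fields: op (6 bits), rs (5), rt (5), and a
   16-bit number that fills the low half of the word. */
uint32_t encode_i(uint32_t op, uint32_t rs, uint32_t rt, int16_t number)
{
    return ((op & 0x3Fu) << 26) |
           ((rs & 0x1Fu) << 21) |
           ((rt & 0x1Fu) << 16) |
           (uint32_t)(uint16_t)number;   /* keep only the low 16 bits */
}
```

For lw $t0, 32($s2) the call is `encode_i(35, 18, 8, 32)`, which gives 0x8E480020. The first three fields sit in the same positions as in the R-type format; only the lower 16 bits are used differently.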
Summary
  Name       Register number   Usage
  $zero      0                 the constant value 0
  $v0-$v1    2-3               values for results and expression evaluation
  $a0-$a3    4-7               arguments
  $t0-$t7    8-15              temporaries
  $s0-$s7    16-23             saved
  $t8-$t9    24-25             more temporaries
  $gp        28                global pointer
  $sp        29                stack pointer
  $fp        30                frame pointer
  $ra        31                return address

  instruction  format  op   rs    rt    rd    shamt  funct  address
  add          R       0    reg   reg   reg   0      32     n/a
  sub          R       0    reg   reg   reg   0      34     n/a
  lw           I       35   reg   reg   n/a   n/a    n/a    address
  sw           I       43   reg   reg   n/a   n/a    n/a    address

  Example: A[300] = h + A[300]
    # $t1 holds the base address of A, $s2 holds h
    # $t0 is used as a temporary register
    lw  $t0, 1200($t1)      op, rs, rt, address:           35, 9, 8, 1200
    add $t0, $s2, $t0       op, rs, rt, rd, shamt, funct:  0, 18, 8, 8, 0, 32
    sw  $t0, 1200($t1)      op, rs, rt, address:           43, 9, 8, 1200
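Going the other way, the fields listed for the A[300] example can be recovered from the 32-bit words. The sketch below is illustrative: the `mips_fields` struct and `decode` function are hypothetical names, and the hexadecimal words in the note were obtained by packing the field values listed on this slide.

```c
#include <stdint.h>

/* All fields of a 32-bit MIPS word. rd, shamt and funct are meaningful
   for R-type words; address is meaningful for I-type words. */
typedef struct {
    uint32_t op, rs, rt, rd, shamt, funct;
    uint32_t address;
} mips_fields;

mips_fields decode(uint32_t word)
{
    mips_fields f;
    f.op      = (word >> 26) & 0x3Fu;
    f.rs      = (word >> 21) & 0x1Fu;
    f.rt      = (word >> 16) & 0x1Fu;
    f.rd      = (word >> 11) & 0x1Fu;
    f.shamt   = (word >>  6) & 0x1Fu;
    f.funct   =  word        & 0x3Fu;
    f.address =  word        & 0xFFFFu;   /* low 16 bits */
    return f;
}
```

Decoding 0x8D2804B0 gives op 35, rs 9, rt 8, address 1200 (the lw), 0x02484020 gives 0, 18, 8, 8, 0, 32 (the add), and 0xAD2804B0 gives op 43, rs 9, rt 8, address 1200 (the sw). The op field is always read first, since it tells the hardware which of the two layouts applies.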
Summary of Instructions We Have Seen So Far
Shift and Logical Operations
Summary of New Instructions
Control Instructions
Using If-Else
  $s0 = f, $s1 = g, $s2 = h, $s3 = i, $s4 = j, $s5 = k
  Where are 0, 1, 2, 3 stored?
Addresses in Branches
  Instructions:
    bne $t4, $t5, Label     Next instruction is at Label if $t4 != $t5
    beq $t4, $t5, Label     Next instruction is at Label if $t4 = $t5
  Format:
    I    op   rs   rt   16 bit address
  What if the Label is too far away (16 bit address is not enough)?
Addresses in Branches and Jumps
  Instructions:
    bne $t4, $t5, Label     Next instruction is at Label if $t4 != $t5
    beq $t4, $t5, Label     Next instruction is at Label if $t4 = $t5
    j   Label               Next instruction is at Label
  Formats:
    I    op   rs   rt   16 bit address
    J    op   26 bit address
Control Flow
  We have beq and bne; what about branch-if-less-than?
    if (a < b)                      # a in $s0, b in $s1
    slt $t0, $s0, $s1               # $t0 gets 1 if a < b, otherwise 0
    bne $t0, $zero, Less            # go to Less if $t0 is not 0
  The combination of slt and bne implements branch on less than.
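While Loop
  C code:
    while (save[i] == k)            # i, j and k correspond to registers
      i = i + j;                    # $s3, $s4 and $s5
                                    # array base address is in $s6
  MIPS code:
    Loop: add $t1, $s3, $s3         # $t1 = 2*i
          add $t1, $t1, $t1         # $t1 = 4*i
          add $t1, $t1, $s6         # $t1 = address of save[i]
          lw  $t0, 0($t1)           # $t0 = save[i]
          bne $t0, $s5, Exit        # leave the loop if save[i] != k
          add $s3, $s3, $s4         # i = i + j
          j   Loop
    Exit: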
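The loop can also be written in C so that each statement lines up with one step of the MIPS code. This is a sketch under stated assumptions: the function name `while_loop` is invented, and the caller is assumed to pass an array in which the loop stops before running off the end.

```c
/* Mirrors the MIPS loop: form the address of save[i], load the
   element, leave if it differs from k, otherwise advance i by j. */
int while_loop(const int *save, int i, int j, int k)
{
    for (;;) {
        const int *t1 = save + i;   /* base + 4*i in bytes      */
        int t0 = *t1;               /* lw  $t0, 0($t1)          */
        if (t0 != k)                /* bne $t0, $s5, Exit       */
            break;
        i = i + j;                  /* add $s3, $s3, $s4        */
    }                               /* j Loop                   */
    return i;                       /* value of i at Exit       */
}
```

With save = {5, 5, 5, 7}, i = 0, j = 1 and k = 5, the loop body runs three times and the function returns 3. The test sits in the middle of the loop, which is why the MIPS version needs both a conditional branch to Exit and an unconditional jump back to Loop.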
What does this code do?
Overview of MIPS
  simple instructions, all 32 bits wide
  very structured, no unnecessary baggage
  only three instruction formats:
    R    op   rs   rt   rd   shamt   funct
    I    op   rs   rt   16 bit address
    J    op   26 bit address
What is the use of $ra?
  Policy of Use Conventions
  Name       Register number   Usage
  $zero      0                 the constant value 0
  $v0-$v1    2-3               values for results and expression evaluation
  $a0-$a3    4-7               arguments
  $t0-$t7    8-15              temporaries
  $s0-$s7    16-23             saved
  $t8-$t9    24-25             more temporaries
  $gp        28                global pointer
  $sp        29                stack pointer
  $fp        30                frame pointer
  $ra        31                return address
  Register 1 ($at) is reserved for the assembler, registers 26-27 for the operating system
Example
  C code:
    swap(int v[], int k)
    {
      int temp;
      temp = v[k];
      v[k] = v[k+1];
      v[k+1] = temp;
    }
  Let $s5 hold the value of k and $s4 hold the base address of v[]
  Byte addressing problem???
  MIPS code:
    swap: sll $s2, $s5, 2           # $s2 = k * 4 (byte offset of v[k])
          add $s2, $s4, $s2         # $s2 = address of v[k]
          lw  $t0, 0($s2)           # $t0 = v[k]
          lw  $t1, 4($s2)           # $t1 = v[k+1]
          sw  $t1, 0($s2)           # v[k] = old v[k+1]
          sw  $t0, 4($s2)           # v[k+1] = old v[k]
          jr  $ra                   # return ($ra is register 31)
Summary
Other Issues
  More reading:
    support for procedures
    linkers, loaders, memory layout
    stacks, frames, recursion
    manipulating strings and pointers
    interrupts and exceptions
    system calls and conventions
  Some of these we'll talk more about later
  We have already talked about compiler optimizations
Nested Procedures
    function_main() {
      function_a(var_x);    /* passes argument using $a0 */
      :                     /* function is called with jal instruction */
      return;
    }
    function_a(int size) {
      function_b(var_y);    /* passes argument using $a0 */
      :                     /* function is called with jal instruction */
      return;
    }
    function_b(int count) {
      :
      return;
    }
  Resource Conflicts???
Function Call and Stack Pointer
  jr
Recursive Procedures Invoke Clones!!!
    int fact(int n)
    {
      if (n < 1) return (1);
      else return (n * fact(n - 1));
    }
  Registers $a0 and $ra are involved; n corresponds to $a0
  The program starts with the label of the procedure, fact
  How many registers do we need to save on the stack?
Factorial Code
    fact: addi $sp, $sp, -8         # make room for 2 items
          sw   $ra, 4($sp)          # save return address
          sw   $a0, 0($sp)          # save argument n
          slti $t0, $a0, 1          # is n < 1?
          beq  $t0, $zero, L1       # if not, go to L1
          addi $v0, $zero, 1        # return 1
          addi $sp, $sp, 8          # pop 2 items
          jr   $ra
    L1:   addi $a0, $a0, -1         # n = n - 1
          jal  fact                 # call fact(n-1)
          lw   $a0, 0($sp)          # restore n
          lw   $ra, 4($sp)          # restore return address
          addi $sp, $sp, 8          # pop 2 items
          mul  $v0, $a0, $v0        # return n * fact(n-1)
          jr   $ra                  # return to caller
  [Figure: stack for n = 3, holding the pairs ($ra, 3), ($ra, 2), ($ra, 1)]
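The way the stack grows and shrinks in this code can be traced with a C sketch that keeps its own explicit stack instead of relying on C recursion. Everything named here is an assumption for illustration: `fact_explicit_stack`, the 64-word stack size, the placeholder stored where the return address would go, and the limit of n <= 12 (so that the result fits in a 32-bit int).

```c
#define STACK_WORDS 64   /* hypothetical stack size for this sketch */

/* Computes n! the way the MIPS code does: every call pushes two words
   (return address and n), the base case pops its frame and returns 1,
   and each return pops one frame and multiplies.
   *peak_words receives the largest number of words in use.
   Returns -1 if n is too large for this sketch. */
int fact_explicit_stack(int n, int *peak_words)
{
    int stack[STACK_WORDS];
    int sp = STACK_WORDS;            /* grows toward lower indices, like $sp */
    int a0 = n, v0 = 1, peak = 0;

    if (n > 12)
        return -1;

    for (;;) {                       /* one iteration per call of fact */
        sp -= 2;                     /* addi $sp, $sp, -8 */
        stack[sp + 1] = 0;           /* sw $ra, 4($sp): placeholder value */
        stack[sp] = a0;              /* sw $a0, 0($sp) */
        if (STACK_WORDS - sp > peak)
            peak = STACK_WORDS - sp;
        if (a0 < 1) {                /* base case: return 1 */
            v0 = 1;
            sp += 2;                 /* pop 2 items */
            break;
        }
        a0 -= 1;                     /* argument for fact(n-1) */
    }
    while (sp < STACK_WORDS) {       /* one iteration per return */
        a0 = stack[sp];              /* lw $a0, 0($sp): restore n */
        sp += 2;                     /* pop 2 items */
        v0 = a0 * v0;                /* n * fact(n-1) */
    }
    *peak_words = peak;
    return v0;
}
```

For n = 3 the function returns 6 and reports a peak of 8 words: the frames for n = 3, 2 and 1 plus the short-lived frame of the base case n = 0, which is pushed and popped without making a further call.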
What is the Use of Frame Pointer?
  Variables local to a procedure that do not fit in registers!!!
Elaboration
  Name       Register number   Usage
  $zero      0                 the constant value 0
  $v0-$v1    2-3               values for results and expression evaluation
  $a0-$a3    4-7               arguments
  $t0-$t7    8-15              temporaries
  $s0-$s7    16-23             saved
  $t8-$t9    24-25             more temporaries
  $gp        28                global pointer
  $sp        29                stack pointer
  $fp        30                frame pointer
  $ra        31                return address
  What if there are more than 4 parameters for a function call?
    The extra parameters are addressable via the frame pointer
    References to variables in the stack have the same offset
  Use of jal?
    Saves the address of the instruction that follows jal into $ra
    Procedure return can now be simply jr $ra
Data and Stack Segments
  Given the following initial layout of memory, where each box represents a word (two bytes) and $s0 initially contains the value 5, what is the result of each of the following instructions? (Assume that each instruction begins with memory laid out as shown.)
  How is memory affected by:
    sw $s0, 4($sp)
  What is the value in $s0 after:
    lw $s0, 0($sp)
  What is contained in $s0 after:
    lw $s1, -4($sp)
    lw $s0, 6($s1)
  What is contained in $s0 after:
    lw $s1, -2($sp)
    lw $s0, 0($s1)
  Write the sequence of instructions to store the contents of $s0 into the address stored at $sp+6
Using If-Else
  $s1-$s5 hold g through k
Jump Address Table
MIPS Example, Use of Jump Address Table
  Name       Register number   Usage
  $zero      0                 the constant value 0
  $v0-$v1    2-3               values for results and expression evaluation
  $a0-$a3    4-7               arguments
  $t0-$t7    8-15              temporaries
  $s0-$s7    16-23             saved
  $t8-$t9    24-25             more temporaries
  $gp        28                global pointer
  $sp        29                stack pointer
  $fp        30                frame pointer
  $ra        31                return address
Using Jump Table
Same Code, this time using If-Else
CPI
Arrays vs. Pointers
    clear1(int array[], int size)
    {
      int i;
      for (i = 0; i < size; i++)
        array[i] = 0;
    }

    clear2(int *array, int size)
    {
      int *p;
      for (p = &array[0]; p < &array[size]; p++)
        *p = 0;
    }
Clear1
    clear1(int array[], int size)
    {
      int i;
      for (i = 0; i < size; i++)
        array[i] = 0;
    }
  array in $a0, size in $a1, i in $t0
           add  $t0, $zero, $zero     # i = 0, register $t0 = 0
    loop1: add  $t1, $t0, $t0         # $t1 = i*2
           add  $t1, $t1, $t1         # $t1 = i*4
           add  $t2, $a0, $t1         # $t2 = address of array[i]
           sw   $zero, 0($t2)         # array[i] = 0
           addi $t0, $t0, 1           # i = i + 1
           slt  $t3, $t0, $a1         # $t3 = (i < size)
           bne  $t3, $zero, loop1     # if (i < size) go to loop1
Clear2
    clear2(int *array, int size)
    {
      int *p;
      for (p = &array[0]; p < &array[size]; p++)
        *p = 0;
    }
  array and size are in registers $a0 and $a1
           add  $t0, $a0, $zero       # p = address of array[0]
    loop2: sw   $zero, 0($t0)         # memory[p] = 0
           addi $t0, $t0, 4           # p = p + 4
           add  $t1, $a1, $a1         # $t1 = size*2
           add  $t1, $t1, $t1         # $t1 = size*4
           add  $t2, $a0, $t1         # $t2 = address of array[size]
           slt  $t3, $t0, $t2         # $t3 = (p < &array[size])
           bne  $t3, $zero, loop2     # if (p < &array[size]) go to loop2
Clear2, Version 2
    clear2(int *array, int size)
    {
      int *p;
      for (p = &array[0]; p < &array[size]; p++)
        *p = 0;
    }
  array and size are in registers $a0 and $a1
           add  $t0, $a0, $zero       # p = address of array[0]
           add  $t1, $a1, $a1         # $t1 = size*2
           add  $t1, $t1, $t1         # $t1 = size*4
           add  $t2, $a0, $t1         # $t2 = address of array[size]
    loop2: sw   $zero, 0($t0)         # memory[p] = 0
           addi $t0, $t0, 4           # p = p + 4
           slt  $t3, $t0, $t2         # $t3 = (p < &array[size])
           bne  $t3, $zero, loop2     # if (p < &array[size]) go to loop2
Array vs. Pointer
  Array version:
           add  $t0, $zero, $zero     # i = 0, register $t0 = 0
    loop1: add  $t1, $t0, $t0         # $t1 = i*2
           add  $t1, $t1, $t1         # $t1 = i*4
           add  $t2, $a0, $t1         # $t2 = address of array[i]
           sw   $zero, 0($t2)         # array[i] = 0
           addi $t0, $t0, 1           # i = i + 1
           slt  $t3, $t0, $a1         # $t3 = (i < size)
           bne  $t3, $zero, loop1     # if (i < size) go to loop1
  Pointer version:
           add  $t0, $a0, $zero       # p = address of array[0]
           add  $t1, $a1, $a1         # $t1 = size*2
           add  $t1, $t1, $t1         # $t1 = size*4
           add  $t2, $a0, $t1         # $t2 = address of array[size]
    loop2: sw   $zero, 0($t0)         # memory[p] = 0
           addi $t0, $t0, 4           # p = p + 4
           slt  $t3, $t0, $t2         # $t3 = (p < &array[size])
           bne  $t3, $zero, loop2     # if (p < &array[size]) go to loop2
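One way to compare the two versions is to count how many instructions each one executes. The C sketch below clears the array the same way the MIPS code does and adds up the instructions per pass: 1 setup instruction plus 7 per iteration for the array version, 4 setup instructions plus 4 per iteration for the pointer version. The function names are invented, and, like the MIPS loops, both functions test at the bottom of the loop, so they assume size >= 1.

```c
/* Array version: returns the number of MIPS instructions executed. */
long clear1_count(int *array, int size)
{
    long executed = 1;          /* add $t0,$zero,$zero */
    int i = 0;
    do {
        array[i] = 0;           /* add, add, add, sw */
        i = i + 1;              /* addi */
        executed += 7;          /* the 5 above plus slt and bne */
    } while (i < size);
    return executed;
}

/* Pointer version with the end address computed once before the loop. */
long clear2_count(int *array, int size)
{
    long executed = 4;          /* four add instructions before loop2 */
    int *p = array;
    int *end = array + size;    /* address of array[size] */
    do {
        *p = 0;                 /* sw */
        p = p + 1;              /* addi $t0,$t0,4: one word forward */
        executed += 4;          /* sw, addi, slt, bne */
    } while (p < end);
    return executed;
}
```

For size = 10 the array version executes 71 instructions and the pointer version 44. The saving comes from moving the size*4 computation out of the loop and from replacing the index-to-address arithmetic with a single pointer increment.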
Big Picture
Compiler
Addressing Modes
Our Goal
  add $t1, $s1, $s2    ($t1 = 9, $s1 = 17, $s2 = 18)
    op      rs     rt     rd     shamt  funct
    000000  10001  10010  01001  00000  100000
Assembly Language vs. Machine Language
  Assembly provides convenient symbolic representation
    much easier than writing down numbers
    e.g., destination first
  Machine language is the underlying reality
    e.g., destination is no longer first
  Assembly can provide "pseudoinstructions"
    e.g., move $t0, $t1 exists only in assembly
    it would be implemented using add $t0, $t1, $zero
  When considering performance you should count real instructions
Summary
  Instruction complexity is only one variable
    lower instruction count vs. higher CPI / lower clock rate
  Design Principles:
    simplicity favors regularity
    smaller is faster
    good design demands compromise
    make the common case fast
  Instruction set architecture
    a very important abstraction indeed!