Part II Instruction-Set Architecture
Jan. 2011, Computer Architecture, Instruction-Set Architecture, Slide 1
Short review of the previous lecture
Performance = 1 / (Execution time) = Clock rate / (Average CPI × Instruction count)
Execution time (benchmark) = total execution time (might be weighted)
  = sum over instruction classes of (CPI of the class × instruction count of the class) / Clock rate
  = Effective average CPI × Total instruction count / Clock rate
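The review formulas can be checked numerically. The following is a minimal Python sketch, not part of the original slides; the instruction mix, the class names, and the 1 GHz clock rate are hypothetical values chosen only for illustration.

```python
def total_cycles(mix):
    """Sum CPI x count over all instruction classes."""
    return sum(cpi * count for cpi, count in mix.values())


def average_cpi(mix):
    """Effective average CPI = total cycles / total instruction count."""
    instruction_count = sum(count for _, count in mix.values())
    return total_cycles(mix) / instruction_count


def execution_time(mix, clock_rate_hz):
    """Execution time in seconds = total cycles / clock rate."""
    return total_cycles(mix) / clock_rate_hz


# Hypothetical mix: class name -> (CPI of the class, instruction count)
mix = {"alu": (1, 500), "load_store": (2, 300), "branch": (1, 200)}
clock_rate_hz = 1e9  # assumed 1 GHz clock

cpi = average_cpi(mix)
seconds = execution_time(mix, clock_rate_hz)
performance = 1 / seconds
print(cpi, seconds, performance)
```

Both routes to the execution time (per-class sum, or average CPI times total instruction count, each divided by the clock rate) give the same value, which is the point of the last line of the review.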
A Few Words About Where We Are Headed
Performance = 1 / Execution time, simplified to 1 / CPU execution time
CPU execution time = Instructions × CPI / (Clock rate)
Performance = Clock rate / (Instructions × CPI)
  Define an instruction set; make it simple enough to require a small number of cycles and allow high clock rate, but not so simple that we need many instructions, even for very simple tasks (Chap 5-8)
  Design ALU for arithmetic & logic ops (Chap 9-12)
  Design hardware for CPI = 1; seek improvements with CPI > 1 (Chap 13-14)
  Try to achieve CPI = 1 with clock that is as high as that for CPI > 1 designs; is CPI < 1 feasible? (Chap 15-16)
  Design memory & I/O structures to support ultrahigh-speed CPUs
Simple Datapath (from BP)
High-level language statement: a = b + c
Assembly language instruction: add $t8, $s2, $s1
Machine language instruction: 000000 10010 10001 11000 00000 100000
  Fields, left to right: ALU-type instruction; register 18; register 17; register 24; unused; addition opcode
[Figure: path of the instruction through the hardware. PC, instruction cache, register file ($17, $18), ALU, data cache (not used), register file ($24). Stages: instruction fetch, register readout, operation, data read/store, register writeback.]
Simple Datapath (from P&H)
Strategies for Speeding Up Instruction Execution
Performance = 1 / Execution time, simplified to 1 / CPU execution time
CPU execution time = Instructions × CPI / (Clock rate)
Performance = Clock rate / (Instructions × CPI)
Assembly line analogy:
  Single-cycle (CPI = 1)
  Multicycle (CPI > 1): faster
  Parallel processing or pipelining: faster
  Items that take longest to inspect dictate the speed of the assembly line
Execution Time vs. MIPS vs. FLOPS (sustained and peak)
Multicycle implementation (cycles per instruction in parentheses)
First instruction sequence:
  Load to register (2)
  Load to register (2)
  Arithmetic operation (1)
  Arithmetic operation (1)
  Floating Point Add (1)
  Arithmetic operation (1)
  Arithmetic operation (1)
  Store (2)
  Control Instruction (1)
Second instruction sequence:
  Load to register (2)
  Load to register (2)
  Arithmetic operation (1)
  Floating Point Add (1)
  Arithmetic operation (1)
  Store (2)
  Control Instruction (1)
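To make the difference between execution time, MIPS, and FLOPS concrete, here is a rough Python sketch that totals the cycle counts of the first nine instructions listed on this slide. The 100 MHz clock rate and all function and variable names are assumptions made for illustration; the slide itself gives no clock rate.

```python
# (instruction kind, cycles) for the first nine instructions on the slide
sequence = [
    ("load", 2), ("load", 2),
    ("arith", 1), ("arith", 1),
    ("fp_add", 1),
    ("arith", 1), ("arith", 1),
    ("store", 2),
    ("control", 1),
]

clock_rate_hz = 100e6  # assumed 100 MHz clock, not given in the slides


def summarize(seq, clock_hz):
    """Return cycles, instruction count, time, MIPS, and MFLOPS for seq."""
    cycles = sum(c for _, c in seq)
    count = len(seq)
    fp_ops = sum(1 for kind, _ in seq if kind == "fp_add")
    seconds = cycles / clock_hz
    mips = count / seconds / 1e6      # millions of instructions per second
    mflops = fp_ops / seconds / 1e6   # millions of floating-point ops per second
    return cycles, count, seconds, mips, mflops


cycles, count, seconds, mips, mflops = summarize(sequence, clock_rate_hz)
print(cycles, count, seconds, mips, mflops)
```

Under these assumptions the same code fragment yields a high MIPS figure but a low FLOPS figure, because only one of its nine instructions is a floating-point operation.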
II Instruction Set Architecture
Introduce machine words and their vocabulary, learning:
  A simple, yet realistic and useful instruction set
  Machine language programs; how they are executed
  RISC vs CISC instruction-set design philosophy
Topics in This Part
  Chapter 5 Instructions and Addressing
  Chapter 6 Procedures and Data
  Chapter 7 Assembly Language Programs
  Chapter 8 Instruction Set Variations
Simple Datapath (from BP)
What blocks are necessary?
What instructions (steps) are necessary?
8.6 Where to Draw the Line
The ultimate reduced instruction set computer (URISC):
How many instructions are absolutely needed for useful computation? Only one!
  Subtract source1 from source2, replace source2 with the result, and jump to target address if result is negative
Assembly language form:
  label: urisc dest,src1,target
Pseudoinstructions can be synthesized using the single instruction. This is the move pseudoinstruction (corrected version):
  stop:  .word 0
  start: urisc dest,dest,+1   # dest = 0
         urisc temp,temp,+1   # temp = 0
         urisc temp,src,+1    # temp = -(src)
         urisc dest,temp,+1   # dest = -(temp); i.e. (src)
         ...                  # rest of program
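The move pseudoinstruction can be traced with a small interpreter. This is only a sketch under simplifying assumptions: data words are kept in a Python dict keyed by name, each instruction is a (dest, src, target) tuple, and the "+1" target is modeled as the index of the next instruction. The names run_urisc and move are hypothetical, not from the slides.

```python
def run_urisc(program, data, max_steps=1000):
    """Execute (dest, src, target) triples: dest = dest - src, jump if negative."""
    pc = 0
    steps = 0
    while 0 <= pc < len(program) and steps < max_steps:
        dest, src, target = program[pc]
        data[dest] = data[dest] - data[src]
        # Jump to target on a negative result; otherwise fall through
        pc = target if data[dest] < 0 else pc + 1
        steps += 1
    return data


def move(dest, src, temp, base=0):
    """Expand move into four urisc instructions starting at index base.

    Every target is "+1", i.e. the next instruction, so the jump is harmless.
    """
    return [
        (dest, dest, base + 1),  # dest = 0
        (temp, temp, base + 2),  # temp = 0
        (temp, src, base + 3),   # temp = -(src)
        (dest, temp, base + 4),  # dest = -(temp), i.e. (src)
    ]


data = {"src": 42, "dest": 7, "temp": -3}
run_urisc(move("dest", "src", "temp"), data)
print(data["dest"])
```

Note that the source operand is left unchanged, while temp ends up holding the negated value; a real program would treat temp as scratch storage.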
URISC Hardware
URISC instruction: Word 1 = Source 1; Word 2 = Source 2 / Dest; Word 3 = Jump target
Figure 8.5 Instruction format and hardware structure for URISC.
[Figure labels: memory unit (Read, Write), MAR, MDR, PC, R, adder (Comp, C in), N, Z, Mux; control signals MAR in, MDR in, PC in, PC out, R in, N in, Z in.]
What blocks to use?
How much real estate do we have?
  Intel 4004: ~2K transistors
Legacy issues (compatibility with original architectures)
[Figure: URISC instruction format and hardware structure, as in Figure 8.5.]
What blocks to use?
Memory is slow
Plenty of data reuse
Various types of local memory / memory hierarchy
Number of registers?
[Figure: URISC instruction format and hardware structure, as in Figure 8.5.]
Register     Name         Usage
$0           $zero        Always 0
$1           $at          Reserved for assembler use
$2-$3        $v0-$v1      Procedure results
$4-$7        $a0-$a3      Procedure arguments
$8-$15       $t0-$t7      Temporary values
$16-$23      $s0-$s7      Operands saved across procedure calls
$24-$25      $t8-$t9      More temporaries
$26-$27      $k0-$k1      Reserved for OS (kernel)
$28          $gp          Global pointer (saved)
$29          $sp          Stack pointer (saved)
$30          $fp          Frame pointer (saved)
$31          $ra          Return address (saved)
Data sizes: byte, word, doubleword
  A 4-byte word sits in consecutive memory addresses according to the big-endian order (most significant byte has the lowest address)
  Byte numbering within a word: 3 2 1 0
  When loading a byte into a register, it goes in the low end
  A doubleword sits in consecutive registers or memory locations according to the big-endian order (most significant word comes first)
Figure 5.2 Registers and data sizes in MiniMIPS.
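The big-endian convention can be illustrated with a few lines of Python. This is a sketch with a dict standing in for byte-addressable memory; the helper names and the address 100 are hypothetical, and the byte load shown is the unsigned variant only (sign extension is not covered on this slide).

```python
def word_to_bytes_big_endian(word):
    """Split a 32-bit word into 4 bytes, most significant byte first."""
    return [(word >> shift) & 0xFF for shift in (24, 16, 8, 0)]


def store_word(memory, address, word):
    """Store a word so that the most significant byte gets the lowest address."""
    for i, byte in enumerate(word_to_bytes_big_endian(word)):
        memory[address + i] = byte


def load_byte_unsigned(memory, address):
    """Load one byte into the low end of a 32-bit register value."""
    return memory[address] & 0xFF


memory = {}
store_word(memory, 100, 0x12345678)  # address 100 is an arbitrary example
print([hex(memory[a]) for a in range(100, 104)])
print(hex(load_byte_unsigned(memory, 103)))
```

Python's built-in int.to_bytes(4, "big") produces the same byte order, which gives a quick cross-check of the helper.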
What blocks to use?
What should the ALU be capable of doing?
[Figure: URISC instruction format and hardware structure, as in Figure 8.5.]
5 Instructions and Addressing
First of two chapters on the instruction set of MiniMIPS:
  Required for hardware concepts in later chapters
  Not aiming for proficiency in assembler programming
Topics in This Chapter
  5.1 Abstract View of Hardware
  5.2 Instruction Formats
  5.3 Simple Arithmetic / Logic Instructions
  5.4 Load and Store Instructions
  5.5 Jump and Branch Instructions
  5.6 Addressing Modes
5.2 Instruction Formats
High-level language statement: a = b + c
Assembly language instruction: add $t8, $s2, $s1
Machine language instruction: 000000 10010 10001 11000 00000 100000
  Fields, left to right: ALU-type instruction; register 18; register 17; register 24; unused; addition opcode
[Figure: path of the instruction through the hardware. PC, instruction cache, register file ($17, $18), ALU, data cache (not used), register file ($24). Stages: instruction fetch, register readout, operation, data read/store, register writeback.]
MiniMIPS Instruction Formats
R format:
  Field              Bits     Width     Meaning
  op                 31-26    6 bits    Opcode
  rs                 25-21    5 bits    Source register 1
  rt                 20-16    5 bits    Source register 2
  rd                 15-11    5 bits    Destination register
  sh                 10-6     5 bits    Shift amount
  fn                 5-0      6 bits    Opcode extension
I format:
  op                 31-26    6 bits    Opcode
  rs                 25-21    5 bits    Source or base
  rt                 20-16    5 bits    Destination or data
  operand / offset   15-0     16 bits   Immediate operand or address offset
J format:
  op                 31-26    6 bits    Opcode
  jump target address 25-0    26 bits   Memory word address (byte address divided by 4)
Figure 5.4 MiniMIPS instructions come in only three formats: register (R), immediate (I), and jump (J).
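The three formats amount to packing fields into a 32-bit word by shifting and OR-ing. Below is a minimal Python sketch of that packing; the function names are hypothetical. The add example reuses the register numbers given earlier in the slides (18, 17, 24, fn = 32) and addi uses the opcode 8 from the slides, while the jump opcode 2 and the byte address 0x00400000 are standard-MIPS-style values assumed here for illustration.

```python
def encode_r(op, rs, rt, rd, sh, fn):
    """R format: op(6) rs(5) rt(5) rd(5) sh(5) fn(6)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (sh << 6) | fn


def encode_i(op, rs, rt, imm):
    """I format: op(6) rs(5) rt(5) operand/offset(16)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def encode_j(op, byte_address):
    """J format: op(6) plus 26-bit word address (byte address divided by 4)."""
    return (op << 26) | ((byte_address >> 2) & 0x3FFFFFF)


# add $t8,$s2,$s1 -> rs = 18 ($s2), rt = 17 ($s1), rd = 24 ($t8), fn = 32
add_word = encode_r(0, 18, 17, 24, 0, 32)
# addi $t0,$s0,61 -> op = 8, rs = 16 ($s0), rt = 8 ($t0)
addi_word = encode_i(8, 16, 8, 61)
# jump to byte address 0x00400000 with an assumed jump opcode of 2
jump_word = encode_j(2, 0x00400000)

print(format(add_word, "032b"))
print(hex(add_word), hex(addi_word), hex(jump_word))
```

The binary string printed for add_word can be compared field by field against the machine language instruction shown in Section 5.2.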
5.3 Simple Arithmetic/Logic Instructions
Add and subtract already discussed; logical instructions are similar
  add $t0,$s0,$s1   # set $t0 to ($s0)+($s1)
  sub $t0,$s0,$s1   # set $t0 to ($s0)-($s1)
  and $t0,$s0,$s1   # set $t0 to ($s0) AND ($s1)
  or  $t0,$s0,$s1   # set $t0 to ($s0) OR ($s1)
  xor $t0,$s0,$s1   # set $t0 to ($s0) XOR ($s1)
  nor $t0,$s0,$s1   # set $t0 to NOT (($s0) OR ($s1))
R format fields: op = ALU instruction; rs = source register 1; rt = source register 2; rd = destination register; sh = unused; fn: add = 32, sub = 34
Figure 5.5 The arithmetic instructions add and sub have a format that is common to all two-operand ALU instructions. For these, the fn field specifies the arithmetic/logic operation to be performed.
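The six register-to-register operations can be mimicked on 32-bit values as follows. This is a simplified Python sketch: the function name alu is hypothetical, results simply wrap around to 32 bits, and overflow detection is deliberately left out.

```python
MASK32 = 0xFFFFFFFF  # keep results within 32 bits


def alu(operation, a, b):
    """Apply one two-operand ALU operation to 32-bit values a and b."""
    if operation == "add":
        result = a + b
    elif operation == "sub":
        result = a - b
    elif operation == "and":
        result = a & b
    elif operation == "or":
        result = a | b
    elif operation == "xor":
        result = a ^ b
    elif operation == "nor":
        result = ~(a | b)
    else:
        raise ValueError("unknown operation: " + operation)
    return result & MASK32


print(alu("add", 5, 7))
print(hex(alu("sub", 5, 7)))   # negative result in two's-complement form
print(hex(alu("nor", 0, 0)))
```

Masking with 0xFFFFFFFF is what turns Python's unbounded integers into the two's-complement bit patterns a 32-bit register would hold.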
Arithmetic/Logic with One Immediate Operand
An operand in the range [-32 768, 32 767], or [0x0000, 0xffff], can be specified in the immediate field.
  addi $t0,$s0,61       # set $t0 to ($s0)+61
  andi $t0,$s0,61       # set $t0 to ($s0) AND 61
  ori  $t0,$s0,61       # set $t0 to ($s0) OR 61
  xori $t0,$s0,0x00ff   # set $t0 to ($s0) XOR 0x00ff
For arithmetic instructions, the immediate operand is sign-extended
I format fields: op: addi = 8; rs = source; rt = destination; operand / offset = immediate operand
Figure 5.6 Instructions such as addi allow us to perform an arithmetic or logic operation for which one operand is a small constant.
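Sign extension of the 16-bit immediate is easy to get wrong, so a short Python sketch may help. The slide states only that arithmetic immediates are sign-extended; treating the logical immediates as zero-extended follows the usual MIPS convention and is an assumption here, as are all function names.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm):
    """Interpret the low 16 bits of imm as a two's-complement number."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def zero_extend16(imm):
    """Keep the low 16 bits; upper bits become 0 (assumed for logic ops)."""
    return imm & 0xFFFF


def addi(rs_value, imm):
    return (rs_value + sign_extend16(imm)) & MASK32


def andi(rs_value, imm):
    return rs_value & zero_extend16(imm)


def ori(rs_value, imm):
    return rs_value | zero_extend16(imm)


def xori(rs_value, imm):
    return rs_value ^ zero_extend16(imm)


print(sign_extend16(0xFFFF), sign_extend16(0x7FFF))
print(addi(100, 0xFFFF))          # adding the 16-bit pattern for -1
print(hex(xori(0x1234, 0x00FF)))
```

The same 16-bit pattern 0xffff therefore means -1 to addi but 65 535 to the logical instructions, which is why the slide quotes both ranges.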
5.4 Load and Store Instructions
I format fields: op: lw = 35, sw = 43; rs = base register; rt = data register; operand / offset = offset relative to base
  lw $t0,4($s3)
  lw $t0,A($s3)
Memory: A[0], A[1], A[2], ..., A[i]; the base register holds the address of the array, and element i of array A is at offset = 4i
Note on base and offset: The memory address is the sum of (rs) and an immediate value. Calling one of these the base and the other the offset is quite arbitrary. It would make perfect sense to interpret the address A($s3) as having the base A and the offset ($s3). However, a 16-bit base confines us to a small portion of memory space.
Figure 5.7 MiniMIPS lw and sw instructions and their memory addressing convention that allows for simple access to array elements via a base address and an offset (offset = 4i leads us to the ith word).
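The address arithmetic behind lw and sw can be sketched in Python. Assumptions for illustration only: memory is a dict keyed by the byte address of each word, the array A starts at the hypothetical address 0x1000, and the helper names are invented. The last two loads show the two interpretations from the note on base and offset.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm):
    """Interpret the low 16 bits of imm as a two's-complement number."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def effective_address(base_value, offset):
    """Memory address = (rs) + sign-extended 16-bit immediate."""
    return (base_value + sign_extend16(offset)) & MASK32


def sw(memory, data_value, base_value, offset):
    memory[effective_address(base_value, offset)] = data_value & MASK32


def lw(memory, base_value, offset):
    return memory[effective_address(base_value, offset)]


A = 0x1000  # hypothetical start address of array A
memory = {}
for i in range(10):
    sw(memory, i * i, A, 4 * i)  # A[i] = i*i, stored at offset 4i

i = 3
via_register_base = lw(memory, A, 4 * i)   # register holds A, immediate is 4i
via_immediate_base = lw(memory, 4 * i, A)  # immediate is A, register holds 4i
print(via_register_base, via_immediate_base)
```

The second form works only because 0x1000 fits in the 16-bit immediate field; an array placed higher in memory would need its address in the register, as the note on base and offset points out.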