Lecture 7: Instruction Set Architectures - IV

Last Time
  - Register organization
  - Memory issues (endianness, alignment, etc.)
Today
  - Exceptions
  - General principles of ISA design
  - Role of compiler
  - Computer arithmetic

Control - Exceptions/Events
  - Implied multi-way branch after every instruction
      - External events (interrupts): completion of I/O operations
      - Internal events (faults or exceptions): arithmetic overflow, page fault
  - What happens?
      - EPC <- PC of instruction that caused the fault
      - PC <- f(fault type): new PC from a hardware table lookup
      - Return: PC <- EPC + 4
  [Figure: instruction stream (Inst 1, Inst 2, ...) with possible events: page fault, disk I/O, RTC, overflow]
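The dispatch sequence on this slide can be sketched as a tiny simulation. This is only an illustration: the class name `Cpu`, the event names, and the handler addresses in `HANDLER_TABLE` are all hypothetical, and the return rule simply follows the slide (PC <- EPC + 4).

```python
# Hypothetical event names and handler addresses, chosen only for illustration.
HANDLER_TABLE = {
    "overflow": 0x8000_0100,
    "page_fault": 0x8000_0200,
    "io_interrupt": 0x8000_0300,
}


class Cpu:
    def __init__(self, pc=0):
        self.pc = pc      # program counter
        self.epc = None   # exception PC, written when an event is taken

    def step(self, event=None):
        """Execute one instruction; `event` names a fault or interrupt, if any."""
        if event is None:
            self.pc += 4                     # normal sequential flow
        else:
            self.epc = self.pc               # EPC <- PC of the offending instruction
            self.pc = HANDLER_TABLE[event]   # PC <- f(fault type), a table lookup

    def return_from_exception(self):
        self.pc = self.epc + 4               # the slide's rule: PC <- EPC + 4
```

Calling `step("overflow")` and then `return_from_exception()` resumes at the instruction after the one that faulted, which is the "implied multi-way branch" the slide describes.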

Principles of Instruction Set Design
  - Keep it simple (KISS)
      - complexity
          - increases logic area
          - increases pipe stages
          - increases development time
      - evolution tends to make kludges
  - Orthogonality (modularity)
      - simple rules, few exceptions
      - all ops on all registers
  - Make the common case fast
      - some instructions (cases) are more important than others
  [Figure: ISA components: operations, data types, addressing modes, registers, formats]
  [Figure: bar chart of instruction frequency (0% to 60%) for INT, LOAD, STORE, JMP, FLOAT]

Principles of Instruction Set Design (part 2)
  - Generality
      - not all problems need the same features/instructions
      - principle of least surprise
      - performance should be easy to predict
  - Locality and concurrency
      - design ISA to permit efficient implementation
          - today
          - 10 years from now
  [Figure: two instruction-frequency bar charts compared (0% to 60%); categories INT, LOAD, STORE, JMP, FLOAT, CHAR]
  [Figure: overlapped pipeline stages F D R E W for four instructions]

Review of ISA Principles
  - Good ISA design
      - KISS! Only implement necessities (encodings, address modes, etc.)
      - FOG: Frequency, Orthogonality, Generality
  - Instruction types
      - ALU ops, data movement, control
  - Addressing modes
      - Matched to program usage (local vars, globals, arrays)
  - Program control
      - Conditional/unconditional branches and jumps
      - Where to store conditions
      - PC-relative and absolute

Role of the Optimizing Compiler
  - HW/SW complexity tradeoffs
  - Compilation flow:
      C source code
        -> Front End (language specific)
        -> IR -> High-Level Optimizations: procedure inlining, loop transformations
        -> IR -> Global Optimizations: common subexpression elimination, code motion
        -> Machine IR -> Code Generator (machine dependent): instruction scheduling, register allocation
        -> Machine binary code

Example: Loop Optimization

  Source:
      sum=0;
      for(i=0;i<max;i++)
          sum+=x[i];

  Original code (7 instructions per iteration):
            LW   R1, X
            ADD  R2,R0,R0
            ADD  R3,R0,R0
      LOOP: SLT  R5,R2,#MAX
            BEQZ R5,CONT
            LW   R4,R1
            ADD  R3,R3,R4
            ADD  R1,R1,#4
            ADD  R2,R2,#1
            J    LOOP
      CONT:

  After loop reordering (6 instructions per iteration):
            LW   R1, X
            ADD  R2,R0,R0
            ADD  R3,R0,R0
      LOOP: LW   R4,R1
            ADD  R3,R3,R4
            ADD  R1,R1,#4
            ADD  R2,R2,#1
            SLT  R5,R2,#MAX
            BNEZ R5,LOOP
      CONT:

  After induction variable analysis (5 instructions per iteration):
            LW   R1, X
            ADD  R2,R0,#MAX
            SLLI R2,R2,#2
            ADD  R2,R1,R2
            ADD  R3,R0,R0
      LOOP: LW   R4,R1
            ADD  R3,R3,R4
            ADD  R1,R1,#4
            SLT  R5,R1,R2
            BNEZ R5,LOOP
      CONT:
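As a rough illustration of what the transformations preserve, the first and last versions of the loop can be modeled in Python. This is a sketch under stated assumptions: memory is modeled as a list of words indexed by byte address divided by 4, and the names `sum_original` and `sum_induction` are made up here. Like the reordered assembly, the bottom-tested version runs its body once before testing, so it assumes at least one element.

```python
def sum_original(mem, x, n):
    """Test at the loop top; both the index i and the address p advance."""
    p, i, total = x, 0, 0
    while i < n:               # SLT / BEQZ at the top of the loop
        total += mem[p // 4]   # LW R4,R1 ; ADD R3,R3,R4
        p += 4                 # ADD R1,R1,#4
        i += 1                 # ADD R2,R2,#1
    return total


def sum_induction(mem, x, n):
    """Index i eliminated: the address is compared with a precomputed end."""
    p, total = x, 0
    end = x + (n << 2)         # ADD / SLLI / ADD before the loop
    while True:                # test moved to the loop bottom (assumes n >= 1)
        total += mem[p // 4]
        p += 4
        if not p < end:        # SLT R5,R1,R2 ; BNEZ R5,LOOP
            break
    return total
```

For any array with at least one element the two functions return the same sum; the second does it with one fewer update per trip because the counter `i` is gone.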

Architect <-> Compiler Writer
  - Simplify, simplify, simplify
      - If a feature is difficult to use, it won't be used. Less is more!
  - Regularity
      - Common set of formats, few special cases
  - Primitives, not solutions
      - CALLS vs. fast register moves
  - Make performance tradeoffs simple
  - Ultimately, the ISA will *not* be perfect

Compiler <-> Microarchitecture
  - Instruction scheduling
      - Instruction-level parallelism
  - Resource allocation
      - Registers (minimize spills/restores to and from memory)
  - Memory optimizations
      - Cache-conscious data organization
      - Code layout
  - Etc.

Building Blocks
  - Arithmetic units: adders, multipliers, dividers, shifters, ...
  - Single registers
  - Register files
  - Memory arrays
  - Multiplexers
  - Wires
  - Microarchitecture involves trading off area and delay of alternative organizations
  - Units
      - area: tracks^2 (χ^2)
      - delay: fan-out-of-4 inverter delay (τ4), gate delay
  - Typically organized into datapaths
      - 10-20 tracks per bit slice
  - Hard to estimate cost of control logic

What is Logic Design?
  - Digital behavior of chip
  - Specifies:
      - Actual states (ISA visible and non-visible)
      - Transitions between states
  - Consists of:
      - Combinational logic (e.g., ALUs)
      - Sequential logic ("memory")
          - Registers
          - State machines
  - Design levels: Architecture/ISA -> Microarchitecture -> Logic Design -> Circuit Design -> Fab -> Chip

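Combinational Logic: The ALU
  [Figure: 32-bit ALU datapath. Inputs A (32 bits) and B (32 bits, or a 16-bit immediate sign-extended (S-EXT) to 32 bits). Units: adder with sub control, shifter with sh_func control, comparator (cmp: LT, GT, etc.) whose output is 31 zero bits followed by the condition bit c. A decode block drives the mux selects sel_1 through sel_4.]
  - Operations:
      - ADD, ADDI
      - SUB, SUBI
      - AND, ANDI, OR, ORI, XOR, XORI
      - SLL, SRL, SRA, SLLI, SRLI, SRAI
      - SLT, SLTI, etc.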

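Bit Slice Approach to ALU Design
  [Figure: 1-bit ALU slice. Inputs A, B (with optional invert), and CarryIn. An AND gate, an OR gate, and a 1-bit full adder feed a mux (S-select) that drives Result; the adder also produces CarryOut.]
  - Set-less-than? Left as an exercise.
  Slide courtesy of D. Patterson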
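One way to read the slice diagram is as a small function per bit, chained through the carry. The sketch below is only a model of that idea: the function names and the string-valued `op` selector are assumptions made for illustration, and set-less-than is left out, as on the slide.

```python
def alu_slice(a, b, invert, carry_in, op):
    """One bit of the ALU; op is 'and', 'or', or 'add' (the mux select)."""
    b ^= invert                                   # optional inversion of B
    and_out = a & b
    or_out = a | b
    add_out = a ^ b ^ carry_in                    # full-adder sum bit
    carry_out = (a & b) | (carry_in & (a ^ b))    # full-adder carry
    result = {"and": and_out, "or": or_out, "add": add_out}[op]
    return result, carry_out


def alu(a, b, op, n=32):
    """Chain n slices; 'sub' is 'add' with B inverted and CarryIn0 = 1."""
    invert = 1 if op == "sub" else 0
    slice_op = "add" if op == "sub" else op
    carry = invert                                # carry-in of the LSB slice
    out = 0
    for i in range(n):
        bit, carry = alu_slice((a >> i) & 1, (b >> i) & 1,
                               invert, carry, slice_op)
        out |= bit << i
    return out
```

Subtraction falls out of the same hardware: inverting B and setting the first carry-in to 1 computes A + (not B) + 1, the two's-complement form of A - B.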

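Bigger View of Bit Slicing
  - LSB and MSB need to do a little extra
  [Figure: 32 one-bit ALU slices chained from bit 0 to bit 31 through carry-out (co) and carry-in (cin). Inputs A and B (32 bits each: a0..a31, b0..b31), output S (32 bits: s0..s31), and an overflow output (Ovflw) at the MSB slice. A 4-bit input M feeds combinational logic (C/L) that produces select, comp, and c-in.]
  Slide courtesy of D. Patterson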

Overflow

  Decimal  Binary    Decimal  2's Complement
     0      0000        0        0000
     1      0001       -1        1111
     2      0010       -2        1110
     3      0011       -3        1101
     4      0100       -4        1100
     5      0101       -5        1011
     6      0110       -6        1010
     7      0111       -7        1001
                       -8        1000

  Examples: 7 + 3 = 10 but ...       -4 - 5 = -9 but ...

        0 1 1 1     7                  1 1 0 0    -4
      + 0 0 1 1     3                + 1 0 1 1    -5
        -------                        -------
        1 0 1 0    -6                  0 1 1 1     7

  Slide courtesy of D. Patterson

Overflow Detection
  - Overflow: the result is too large (or too small) to represent properly
      - Example: -8 <= 4-bit binary number <= 7
  - When adding operands with different signs, overflow cannot occur!
  - Overflow occurs when adding:
      - 2 positive numbers and the sum is negative
      - 2 negative numbers and the sum is positive
  - On your own: prove you can detect overflow by:
      - Carry into MSB xor Carry out of MSB

        0 1 1 1     7                  1 1 0 0    -4
      + 0 0 1 1     3                + 1 0 1 1    -5
        -------                        -------
        1 0 1 0    -6                  0 1 1 1     7
      carry into MSB = 1, out = 0    carry into MSB = 0, out = 1

  Slide courtesy of D. Patterson
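The sign rule on this slide can be checked with a short sketch for 4-bit values. The names `BITS`, `to_signed`, and `add_with_overflow` are hypothetical; operands are passed as unsigned bit patterns (0 to 15).

```python
BITS = 4
MASK = (1 << BITS) - 1
SIGN = 1 << (BITS - 1)


def to_signed(v):
    """Interpret a BITS-wide pattern as a two's-complement integer."""
    return v - (1 << BITS) if v & SIGN else v


def add_with_overflow(a, b):
    """Add two BITS-wide patterns; return (result pattern, overflow flag)."""
    s = (a + b) & MASK
    same_sign_operands = (a & SIGN) == (b & SIGN)
    sum_sign_differs = (s & SIGN) != (a & SIGN)
    # Overflow only when the operands agree in sign and the sum does not.
    return s, same_sign_operands and sum_sign_differs
```

With the slide's examples, `add_with_overflow(0b0111, 0b0011)` yields the pattern 1010 (which reads as -6) with the flag set, and adding operands of different signs never sets the flag.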

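What's the difference between these instruction pairs?
  - add/addu
  - addi/addiu
  - div/divu
  - sub/subu
  - The unsigned versions don't check for overflow
  - But otherwise the arithmetic algorithm is the same (this holds for the add and subtract pairs; div and divu additionally differ in treating their operands as signed versus unsigned)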
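The add/addu distinction can be modeled as follows. This is an illustrative sketch, not a description of any particular implementation: `OverflowTrap` is a made-up stand-in for the hardware exception, and operands are 32-bit patterns held in Python integers.

```python
class OverflowTrap(Exception):
    """Stands in for the hardware overflow exception (hypothetical name)."""


def to_signed32(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return x - (1 << 32) if x & 0x8000_0000 else x


def addu(a, b):
    """No overflow check: the sum simply wraps to 32 bits."""
    return (a + b) & 0xFFFF_FFFF


def add(a, b):
    """Same adder, but a signed result out of range raises a trap."""
    exact = to_signed32(a) + to_signed32(b)
    if not -(1 << 31) <= exact <= (1 << 31) - 1:
        raise OverflowTrap(f"signed overflow: {exact}")
    return addu(a, b)
```

Whenever `add` does not trap, it returns exactly the bit pattern `addu` returns, which is the sense in which the arithmetic is the same.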

Overflow Detection Logic
  - Carry into MSB xor Carry out of MSB
      - For an N-bit ALU: Overflow = CarryIn[N-1] XOR CarryOut[N-1]
  [Figure: four 1-bit ALUs chained. ALU i takes Ai, Bi, and CarryIn i and produces Result i and CarryOut i, which feeds CarryIn i+1. An XOR gate on CarryIn3 and CarryOut3 produces Overflow.]

  X  Y  X XOR Y
  0  0     0
  0  1     1
  1  0     1
  1  1     0

  Slide courtesy of D. Patterson
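A bit-level sketch makes the carry rule concrete. It assumes a plain ripple chain of full adders; the names `full_adder` and `ripple_add` are illustrative only.

```python
def full_adder(a, b, cin):
    """One 1-bit slice: returns (sum bit, carry out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_add(a, b, n=4, cin=0):
    """N-bit ripple-carry add; returns (result, carry_out, overflow)."""
    result, carry = 0, cin
    carry_into_msb = 0
    for i in range(n):
        if i == n - 1:
            carry_into_msb = carry        # CarryIn[N-1]
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    # Overflow = CarryIn[N-1] XOR CarryOut[N-1]
    return result, carry, carry_into_msb ^ carry
```

For the 4-bit case, 7 + 3 gives a carry of 1 into the MSB and 0 out of it, so the XOR flags overflow; for 3 + 4 both carries are 0 and the flag stays clear.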

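More Revised Diagram
  - LSB and MSB need to do a little extra
  [Figure: 32 one-bit ALU slices chained from bit 0 to bit 31 through carry-out (co) and carry-in (cin). Inputs A and B (32 bits each), output S (32 bits). At the MSB slice, Ovflw = signed-arith AND (cin XOR co). A 4-bit input M feeds combinational logic (C/L) that produces select, comp, and c-in.]
  Slide courtesy of D. Patterson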

But What about Performance?
  - Critical path of an n-bit ripple-carry adder is n*cp, where cp is the carry delay through one 1-bit slice
  [Figure: four 1-bit ALUs chained. ALU i takes Ai, Bi, and CarryIn i and produces Result i and CarryOut i, which feeds CarryIn i+1; the carry ripples from CarryIn0 to CarryOut3.]
  - Design trick: throw hardware at it
  Slide courtesy of D. Patterson

Carry Look Ahead (Design trick: peek)

  A  B  C-out
  0  0  0      kill
  0  1  C-in   propagate
  1  0  C-in   propagate
  1  1  1      generate

  G = A and B
  P = A xor B

  C1 = G0 + C0 P0
  C2 = G1 + G0 P1 + C0 P0 P1
  C3 = G2 + G1 P2 + G0 P1 P2 + C0 P0 P1 P2
  C4 = ...

  [Figure: four adder cells, each taking A, B, and a carry-in and producing G, P, and S; the carries C1 through C4 are computed from the G and P signals and C0.]
  Slide courtesy of D. Patterson
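The carry equations can be evaluated directly from the generate and propagate signals, without waiting on a previous carry. The sketch below does that for a small adder; the function names are hypothetical, and the nested loops simply spell out the sum-of-products form of each carry.

```python
def cla_carries(a, b, c0=0, n=4):
    """Return the carries [C0, C1, ..., Cn] from generate/propagate signals."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]   # G_i = A_i and B_i
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]   # P_i = A_i xor B_i
    carries = [c0]
    for i in range(n):
        # C_{i+1} = G_i + G_{i-1} P_i + ... + C0 P_0 ... P_i
        # Every term uses only G, P, and C0, never an earlier carry.
        c = 0
        for j in range(i + 1):
            term = g[j]
            for k in range(j + 1, i + 1):
                term &= p[k]
            c |= term
        term = c0
        for k in range(i + 1):
            term &= p[k]
        carries.append(c | term)
    return carries


def cla_add(a, b, c0=0, n=4):
    """Sum bits are S_i = P_i xor C_i; returns (sum, carry out)."""
    carries = cla_carries(a, b, c0, n)
    s = 0
    for i in range(n):
        p_i = ((a >> i) ^ (b >> i)) & 1
        s |= (p_i ^ carries[i]) << i
    return s, carries[n]
```

In hardware each carry is a two-level AND/OR function of the inputs, so the terms grow with bit position while the depth stays fixed; that is the "throw hardware at it" tradeoff against the ripple chain.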

Summary
  - ISA principles
  - Compiler/ISA interaction
  - Computer arithmetic (add/shift)
Next Time
  - RISC vs. CISC
  - Computer arithmetic
  - Memory
  - Simple pipeline
Reading assignment
  - P&H 4.1-4.7