Chapter 2. Instruction Set Principles and Examples
In-Cheol Park
Dept. of EE, KAIST
Stack architecture (0-address)
  - Operands are on the top of the stack
Accumulator architecture (1-address)
  - One operand is implicitly the accumulator
General-purpose register architecture (2- or 3-address)
  - Operands are either registers or memory locations
  - Register-register (load-store): virtually every machine since 1980
      - Registers are faster than memory
      - Registers are easier for a compiler to use
  - Register-memory: can access memory as part of any instruction
  - Memory-memory: keeps all operands in memory
How many registers are needed to allocate variables?
How are variables allocated and addressed?
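To make the three operand styles concrete, here is a minimal sketch that runs C = A + B on a toy stack machine, a toy accumulator machine, and a toy load-store machine. The instruction names (push, pop, load, add, store), the register names, and the three interpreter functions are hypothetical and chosen only for illustration; they do not model any real ISA.

```python
def run_stack(program, memory):
    """0-address machine: 'add' takes both operands from the top of the stack."""
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(memory[args[0]])
        elif op == "add":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "pop":
            memory[args[0]] = stack.pop()
    return memory


def run_accumulator(program, memory):
    """1-address machine: the accumulator is the implicit second operand."""
    acc = 0
    for op, addr in program:
        if op == "load":
            acc = memory[addr]
        elif op == "add":
            acc += memory[addr]
        elif op == "store":
            memory[addr] = acc
    return memory


def run_load_store(program, memory):
    """3-address register machine: only load/store touch memory."""
    regs = {}
    for op, *args in program:
        if op == "load":
            rd, addr = args
            regs[rd] = memory[addr]
        elif op == "add":
            rd, rs1, rs2 = args
            regs[rd] = regs[rs1] + regs[rs2]
        elif op == "store":
            rs, addr = args
            memory[addr] = regs[rs]
    return memory


stack_prog = [("push", "A"), ("push", "B"), ("add",), ("pop", "C")]
acc_prog = [("load", "A"), ("add", "B"), ("store", "C")]
reg_prog = [("load", "R1", "A"), ("load", "R2", "B"),
            ("add", "R3", "R1", "R2"), ("store", "R3", "C")]
```

The load-store version needs the most instructions here, but each one is simple and names its operands explicitly, which is what makes register allocation by a compiler straightforward.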
How to interpret memory addresses?
  - Access length: 8/16/32/64 bits
      - A byte or half word must be sign- or zero-extended when it is loaded
      - On some machines (x86), the upper portion of the register is not affected
      - Supporting the smaller sizes requires an alignment network in the hardware
  - Byte ordering within a word
      - Little endian (Intel)
      - Big endian (Motorola, IBM)
  - Aligned or misaligned
      - Aligned memory accesses run faster, even on machines that allow misaligned access
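The points about extension, byte ordering, and alignment can be sketched in a few lines. The helper names below (sign_extend, zero_extend, is_aligned) are hypothetical, and the alignment rule assumed here is the usual one: an access of size s bytes is aligned when the address is a multiple of s.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):      # sign bit set: value is negative
        return value - (1 << bits)
    return value


def zero_extend(value, bits):
    """Keep the low `bits` bits and treat them as unsigned."""
    return value & ((1 << bits) - 1)


def is_aligned(address, size):
    """Assumed rule: an access of `size` bytes is aligned if address % size == 0."""
    return address % size == 0


word = 0x12345678
little = word.to_bytes(4, "little")    # least significant byte at the lowest address
big = word.to_bytes(4, "big")          # most significant byte at the lowest address
```

Loading the byte 0xFF gives -1 with sign extension and 255 with zero extension, which is why most ISAs provide both signed and unsigned byte and half-word loads.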
Effective address: the actual memory address specified by the addressing mode
Addressing modes
  - Register
  - Memory
      - Immediate (specified in the instruction)
      - Register indirect (deferred)
      - Direct (absolute)
      - Displacement
      - Memory indirect (deferred)
      - Auto-increment / auto-decrement
      - Scaled
      - PC-relative (for control instructions)
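As a rough illustration of how an effective address is formed, the sketch below computes it for several of the memory addressing modes listed above. The function name, the mode strings, and the dictionary-based register file and memory are all assumptions made for this example. Auto-increment and auto-decrement are left out because they also modify the base register.

```python
def effective_address(mode, regs, memory, reg=None, index=None,
                      disp=0, scale=1, pc=0):
    """Return the effective address for a handful of addressing modes."""
    if mode == "register_indirect":    # address is held in a register
        return regs[reg]
    if mode == "direct":               # address is a constant in the instruction
        return disp
    if mode == "displacement":         # base register plus constant offset
        return regs[reg] + disp
    if mode == "memory_indirect":      # register points at a memory cell holding the address
        return memory[regs[reg]]
    if mode == "scaled":               # base + index * element size (+ offset)
        return disp + regs[reg] + regs[index] * scale
    if mode == "pc_relative":          # offset from the program counter
        return pc + disp
    raise ValueError(f"unknown addressing mode: {mode}")


regs = {"R1": 100, "R2": 3}
memory = {100: 500}
```

For example, with R1 = 100 and R2 = 3, the scaled mode with a displacement of 16 and a scale of 4 yields 16 + 100 + 3 * 4 = 128.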
Addressing modes can reduce instruction counts
They also add to the complexity of building a machine
How to choose what to include?
  - Based on the measured usage of the various addressing modes
Displacement size on data accesses
  - Directly affects the instruction length
  - Displacement values are widely distributed
Immediate size
  - 50% to 75% of ALU operations have an immediate operand
  - 75% to 85% of integer compare instructions have an immediate operand
  - Size: 50% to 70% fit within 8 bits; 75% to 80% fit within 16 bits
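The "fits within N bits" measurement can be sketched as follows. The helper names are hypothetical, the fields are assumed to be signed two's complement, and the sample list of immediates is made up for illustration; it is not measured data and does not reproduce the percentages quoted above.

```python
def fits_signed(value, bits):
    """True if value is representable as a signed two's complement field of `bits` bits."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1


def fraction_fitting(values, bits):
    """Fraction of the given constants that fit in a signed field of `bits` bits."""
    return sum(fits_signed(v, bits) for v in values) / len(values)


# Made-up sample of immediate values (an assumption, not a benchmark trace).
sample_immediates = [0, 1, -1, 4, 100, 255, 4096, -32768, 70000, 127]

share_8 = fraction_fitting(sample_immediates, 8)
share_16 = fraction_fitting(sample_immediates, 16)
```

Running the same count over a real instruction trace is how an architect would choose the width of the immediate and displacement fields.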
Operations supported by most ISAs
  - Arithmetic and logical
  - Data transfer
  - Control
      - Branches (conditional)
      - Jumps (unconditional)
      - Procedure calls
      - Procedure returns
  - System
  - Floating point
  - Decimal
  - String
  - Graphics
The most widely executed operations are the simple ones
The destination is specified explicitly in the instruction, except for procedure returns
A common way to specify the destination: a displacement added to the PC (PC-relative)
  - The target is often near the current PC, which results in a short displacement
  - Position independence
  - Displacement size
      - Signed value (75% of branches are in the forward direction)
      - Short displacement fields often suffice
Indirect jump / procedure return
  - Indirect jumps: case/switch statements, dynamically shared libraries, virtual functions
  - Dynamic addressing
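A small sketch of PC-relative targeting shows why it gives position independence: the stored displacement depends only on the distance between branch and target, not on where the code is loaded. The function names are hypothetical, and the displacement is assumed to be measured from the address of the branch itself (many real machines measure it from the following instruction instead).

```python
def branch_displacement(pc, target, bits):
    """Signed displacement to store in a `bits`-wide field, or ValueError if it does not fit."""
    disp = target - pc
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not low <= disp <= high:
        raise ValueError(f"displacement {disp} does not fit in {bits} bits")
    return disp


def branch_target(pc, disp):
    """Target address recovered at run time from the PC and the stored displacement."""
    return pc + disp


# The same code loaded at two different base addresses.
original = branch_displacement(0x1000, 0x1010, bits=8)
relocated = branch_displacement(0x5000, 0x5010, bits=8)
```

Both calls produce the same displacement, so the encoded branch does not need to be patched when the code is moved.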
Primary methods
  - Condition code (CC)
  - Condition register
  - Compare-and-branch instruction
A large number of comparisons are == or != tests
Over 50% of integer comparisons are simple tests for equality with zero
At a minimum, the return address must be saved
Caller saving
  - The caller saves the registers that it wants preserved after the call
Callee saving
  - The callee saves the registers that it wants to use
Which is better depends on the situation
Type designation
  - Encoded in the opcode (most often)
  - Annotated with tags that are interpreted by the hardware (rare)
Size
  - Character (1 byte), half word (2 bytes)
  - Word (4 bytes), double word (8 bytes)
  - Single-precision floating point (32 bits)
  - Double-precision floating point (64 bits)
  - Extended floating point (80 bits)
2's complement for integer representation
IEEE standards 754 and 854 for floating-point operands
Character strings
Decimal format (packed decimal, unpacked decimal)
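The integer and decimal representations named above can be illustrated briefly. The function names are hypothetical, and the decimal encoders are simplified assumptions: they handle non-negative values only and omit the sign nibble or zone bits that real packed and unpacked decimal formats usually carry.

```python
import struct


def twos_complement(value, bits):
    """Bit pattern of a signed integer in a `bits`-wide two's complement field."""
    return value & ((1 << bits) - 1)


def to_packed_decimal(n):
    """Two decimal digits per byte (simplified: unsigned, no sign nibble)."""
    digits = str(n)
    if len(digits) % 2:
        digits = "0" + digits          # pad to a whole number of bytes
    return bytes((int(digits[i]) << 4) | int(digits[i + 1])
                 for i in range(0, len(digits), 2))


def to_unpacked_decimal(n):
    """One decimal digit per byte (simplified: unsigned, no zone bits)."""
    return bytes(int(d) for d in str(n))


single_size = struct.calcsize("<f")    # IEEE 754 single precision, in bytes
double_size = struct.calcsize("<d")    # IEEE 754 double precision, in bytes
```

Packed decimal stores 1234 in two bytes (0x12, 0x34), while the unpacked form needs four, one per digit.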
The word data type dominates
  - Word >> half word > byte
How important is it to support byte accesses?
  - They are infrequent, but supporting them requires an alignment network in hardware
  - The Alpha architecture initially omitted the small-sized data transfers but added them later
The operation is specified in a field called the opcode
How to encode the addressing modes with operands
  - A large number of addressing modes and many operands: an address specifier is needed for each operand
  - One operand and one or two addressing modes: specified as part of the opcode
Balancing competing forces
  - Have as many registers and addressing modes as possible
  - Their impact on the instruction length, which increases the program size
  - Encode instructions into lengths that make the implementation easy
Choices for encoding the instruction set
  - Variable: best for many addressing modes and operations
  - Fixed: best for a few addressing modes and operations
  - Hybrid: IBM 360, x86
Optimizations
  - High-level optimizations: on the source code
  - Local optimizations: within a basic block
  - Global optimizations: across branches (transformations, optimizing loops)
  - Register allocation
      - Equivalent to the graph coloring problem
      - Graph coloring works well if there are at least 16 registers
  - Machine-dependent optimizations
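The link between register allocation and graph coloring can be sketched with a simple greedy heuristic. This is an illustrative assumption, not the allocator of any particular compiler: variables that are live at the same time are joined by an edge in an interference graph, each register is a color, and a variable that cannot be colored is spilled to memory. The function name and the example graph are hypothetical.

```python
def color_interference_graph(interference, num_registers):
    """Greedy coloring: returns (variable -> register index, list of spilled variables)."""
    assignment = {}
    spilled = []
    # Visit the most constrained variables (most neighbours) first.
    order = sorted(interference, key=lambda v: len(interference[v]), reverse=True)
    for var in order:
        used = {assignment[n] for n in interference[var] if n in assignment}
        free = [r for r in range(num_registers) if r not in used]
        if free:
            assignment[var] = free[0]
        else:
            spilled.append(var)        # no register left: keep this variable in memory
    return assignment, spilled


# a, b, c are all live at the same time; d overlaps only with c.
interference = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

three_regs, three_spills = color_interference_graph(interference, 3)
two_regs, two_spills = color_interference_graph(interference, 2)
```

With three registers everything fits; with two, one of the three mutually interfering variables has to be spilled. More registers make a spill-free coloring easier to find.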
Phase ordering problem
  - Earlier steps assume the ability of later steps to deal with certain problems
  - For example, common subexpression elimination assumes that the temporary it creates will be allocated to a register
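A minimal sketch of local common subexpression elimination shows where the temporary comes from. The three-address tuple format (dest, op, a, b), the "copy" pseudo-operation, and the function name are assumptions made for this example. The pass works within one basic block and does no copy propagation.

```python
def eliminate_common_subexpressions(block):
    """Replace a recomputed expression with a copy from the variable that already holds it."""
    available = {}                     # (op, a, b) -> variable currently holding that value
    result = []
    for dest, op, a, b in block:
        key = (op, a, b)
        holder = available.get(key)
        if holder is not None:
            result.append((dest, "copy", holder, None))
        else:
            result.append((dest, op, a, b))
        # Writing dest invalidates expressions that read dest or are held in dest.
        available = {k: v for k, v in available.items()
                     if dest not in (k[1], k[2]) and v != dest}
        if holder is None and dest not in (a, b):
            available[key] = dest
    return result


block = [
    ("t1", "add", "a", "b"),
    ("t2", "mul", "t1", "c"),
    ("t3", "add", "a", "b"),           # same expression as t1
    ("t4", "mul", "t3", "c"),
]
optimized = eliminate_common_subexpressions(block)
```

After the pass, t3 is a copy of t1 instead of a second addition. The saving only pays off if t1 stays in a register until t3 needs it, which is decided later by register allocation. That dependence on a later phase is the phase ordering problem.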
Types of variables
  - Local variables
      - The stack is used
      - Addressed relative to the stack pointer
      - Usually scalar variables
      - Most effective when allocated in registers
  - Global variables
      - Global data area
      - Arrays or other aggregate data structures
  - Dynamic objects
      - The heap is used
      - Addressed with pointers and typically not scalars
      - Register allocation is impossible because they are accessed through pointers
Regularity
  - The operations, data types, and addressing modes should be orthogonal
Provide primitives, not solutions
  - Approaches to bridge the semantic gap have failed
  - HLLCA (high-level language computer architecture)
Simplify trade-offs among alternatives
  - Figuring out which instruction sequence will be best is tough
  - Instruction counts and total code size are no longer good metrics
  - With caches and pipelining, the trade-offs become complex
Provide instructions that bind the quantities known at compile time as constants
Pitfall: Designing a high-level instruction set feature specifically oriented to supporting a high-level language structure
  - Semantic gap --> semantic clash
  - Too general for the most frequent case
Fallacy: There is such a thing as a typical program
Fallacy: An architecture with flaws cannot be successful
  - x86: luck + investment
Fallacy: You can design a flawless architecture
  - All architecture design involves trade-offs
  - Technologies are likely to change over time
Pitfall: Innovating at the instruction set architecture to reduce code size without accounting for the compiler
Pitfall: Expecting to get good performance from a compiler for DSPs
MIPS
  - Simple load-store instruction set
  - Designed for pipeline efficiency, including a fixed instruction set encoding
  - Efficiency as a compiler target
Registers
  - 32 32-bit GPRs
  - 32 32-bit FPRs, usable as
      - 32 single-precision registers, or
      - 16 double-precision registers (F0, F2, ..., F30)
  - R0 is always 0
Data types
  - Integer: 8-bit byte, 16-bit half word, 32-bit word
      - Bytes and half words are sign- or zero-extended for operations
  - Floating point: 32-bit single precision, 64-bit double precision
Memory
  - Byte addressable in big-endian mode with a 32-bit address
Addressing modes
  - Immediate
  - Displacement
      - Register indirect is accomplished by placing 0 in the displacement field
      - Absolute addressing is accomplished by using R0 as the base register
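The way a single displacement mode covers register indirect and absolute addressing can be sketched as follows. The function names are hypothetical, and a 16-bit sign-extended displacement field is assumed here; the slide itself does not state the field width.

```python
def sign_extend16(imm):
    """Sign-extend an assumed 16-bit displacement field."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def mips_effective_address(regs, base, disp16):
    """Displacement addressing: base register plus sign-extended offset, as a 32-bit address."""
    return (regs[base] + sign_extend16(disp16)) & 0xFFFFFFFF


regs = [0] * 32                        # regs[0] models R0, which is always 0
regs[4] = 0x1000

displacement = mips_effective_address(regs, 4, 8)        # base + 8
register_indirect = mips_effective_address(regs, 4, 0)   # displacement field is 0
absolute = mips_effective_address(regs, 0, 0x0400)       # base register is R0
```

Under this assumption, absolute addressing reaches only the addresses that a sign-extended 16-bit value can express, which is the price of not having a separate absolute mode.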
Load Store instructions
Arithmetic/logical instructions
  - LHI: loads the top half of a register, while setting the low half to 0
      - 32-bit immediate load: LHI + OR immediate
  - R0 is always 0
      - Loading a constant: add immediate with R0 (LI)
      - Register-register move: add where one source is R0 (MOV)
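The LHI + OR immediate idiom for building a 32-bit constant can be checked with a short sketch. The helper functions model only the arithmetic of the two instructions; their names are hypothetical, and OR immediate is assumed to zero-extend its 16-bit immediate.

```python
def split_constant(value):
    """Split a 32-bit constant into its upper and lower 16-bit halves."""
    value &= 0xFFFFFFFF
    return value >> 16, value & 0xFFFF


def lhi(imm16):
    """Load high immediate: place imm16 in the top half and clear the low half."""
    return (imm16 & 0xFFFF) << 16


def ori(reg_value, imm16):
    """OR immediate, assuming the 16-bit immediate is zero-extended."""
    return reg_value | (imm16 & 0xFFFF)


upper, lower = split_constant(0xDEADBEEF)
loaded = ori(lhi(upper), lower)        # two instructions rebuild the full constant
```

The zero-extension assumption matters: if the second instruction sign-extended its immediate, a lower half with its top bit set (as in 0xBEEF) would corrupt the upper half.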
Control instructions
  - Plain jump / jump and link
  - Conditional branch: tests for zero or non-zero