CMSC 22200 Computer Architecture
Lecture 2: ISA
Prof. Yanjing Li
Department of Computer Science, University of Chicago

Administrative Stuff
- Lab1 is out
  - Due next Thursday (10/6)
- Lab2
  - Out next Thursday

Lecture Outline
- Introduction to ISA
- Case Study: ARMv8 / LEGv8

Review: Basic Concepts
- Basic concepts
  - What is a computer?
  - What is the von Neumann model?
  - What is ISA?
  - What is uarch?
  - Design point

ISA or uarch?
- Instruction (e.g., add)
- Number of general purpose registers
- Number of ports to the register file
- Number of cycles to execute the MUL instruction
- Whether or not the machine employs pipelined instruction execution
- Power/thermal management
- Support for virtual memory

ISA
- Instructions
  - Opcodes, Addressing Modes, Data Types
  - Instruction Types and Formats
  - Registers, Condition Codes
- Memory organization
  - Address space, Addressability, Alignment
  - Virtual memory management
- Call, Interrupt/Exception Handling
- Access Control, Priority/Privilege
- I/O: memory-mapped vs. special instructions
- Task/thread Management
- Power and Thermal Management
- Multi-threading support, Multiprocessor support

Many Different ISAs Over Decades
- x86
- ARM
- MIPS
- SPARC
- IBM 360
- What are the fundamental differences, and why?

ISA Element: Instruction
- Also called machine code; consists of
  - opcode: what the instruction does (add, sub, ...)
  - operands: what it operates on (register, memory, immediate)
- Example

Data Types
- Representation of information for which there are instructions that operate on the representation
- ARMv8
  - Integer (byte, half word, word, doubleword, quad word)
  - Floating point (half-, single-, double-precision)
  - Fixed point
  - Vector formats
- Others (e.g., x86)
  - BCD, strings

Instruction Process Style
- Specifies the number of operands an instruction operates on and how it does so
- 0, 1, 2, 3 address machines
  - 0-address: stack machine (op, push, pop)
  - 1-address: accumulator machine (e.g., add mem)
  - 2-address: 2-operand machine (op D, S; one operand is both source and destination)
  - 3-address: 3-operand machine (op D, S1, S2; sources and destination are separate)
- E.g., ARMv8 is a 3-address machine

Instruction Classes
- Operate instructions
  - Process data: arithmetic and logical operations
  - Fetch operands, compute result, store result
  - Implicit sequential control flow (e.g., PC <= PC + 4)
- Data movement instructions
  - Move data between memory, registers, I/O devices
  - Implicit sequential control flow
- Control flow instructions
  - Change the sequence of instructions that are executed

Instruction Addressing Modes
- Specifies how to obtain an operand of an instruction
  - Register
  - Immediate
  - Memory (displacement, register indirect, indexed, absolute, memory indirect, autoincrement, autodecrement, ...)
- Fewer or more addressing modes? Tradeoffs?

Instruction Addressing Modes for Memory
- Specify how to obtain memory operands
  - Absolute: LW Rt, 10000
    use the immediate value as the address
  - Register indirect: LW Rt, (r_base)
    use GPR[r_base] as the address
  - Displaced or based: LW Rt, offset(r_base)
    use offset + GPR[r_base] as the address
  - Indexed: LW Rt, (r_base, r_index)
    use GPR[r_base] + GPR[r_index] as the address
  - Memory indirect: LW Rt, ((r_base))
    use the value at M[GPR[r_base]] as the address
  - Auto inc/decrement: LW Rt, (r_base)
    use GPR[r_base] as the address, but increment or decrement GPR[r_base] each time
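
The address calculations listed above can be sketched in C. This is only an illustration on a hypothetical toy machine: the `Machine` struct, the `ea_*` function names, and the post-increment interpretation of auto-increment are assumptions, not part of any ISA discussed in the slides.

```c
#include <stdint.h>
#include <string.h>

#define MEM_BYTES 256

/* Hypothetical toy machine: 32 registers and a small byte-addressable memory. */
typedef struct {
    uint64_t gpr[32];
    uint8_t  mem[MEM_BYTES];
} Machine;

/* Read 8 bytes at addr in host byte order (no bounds checking in this sketch). */
static uint64_t load64(const Machine *m, uint64_t addr) {
    uint64_t v;
    memcpy(&v, &m->mem[addr], sizeof v);
    return v;
}

/* Each function returns the effective address the load would use. */
static uint64_t ea_absolute(uint64_t imm) {
    return imm;                                  /* immediate is the address */
}
static uint64_t ea_register_indirect(const Machine *m, int rbase) {
    return m->gpr[rbase];                        /* GPR[r_base] */
}
static uint64_t ea_displaced(const Machine *m, int rbase, int64_t offset) {
    return m->gpr[rbase] + (uint64_t)offset;     /* offset + GPR[r_base] */
}
static uint64_t ea_indexed(const Machine *m, int rbase, int rindex) {
    return m->gpr[rbase] + m->gpr[rindex];       /* GPR[r_base] + GPR[r_index] */
}
static uint64_t ea_memory_indirect(const Machine *m, int rbase) {
    return load64(m, m->gpr[rbase]);             /* M[GPR[r_base]] */
}
/* Auto-increment, post-increment variant (an assumption): use the old base, then bump it. */
static uint64_t ea_autoincrement(Machine *m, int rbase, uint64_t step) {
    uint64_t ea = m->gpr[rbase];
    m->gpr[rbase] += step;
    return ea;
}
```

Note that memory indirect needs an extra memory read just to form the address, which is one reason richer addressing modes cost more hardware and time.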

Instruction Length
- Fixed length: all instructions have the same length
  + Easier to decode a single instruction in hardware
  + Easier to decode multiple instructions concurrently (superscalar)
  -- Wasted bits in instructions (Why is this bad?)
  -- Harder-to-extend ISA (how to add new instructions?)
- Variable length: instructions have different lengths
  + Compact encoding (Why is this good?)
  + Extensibility
  -- More logic to decode a single instruction
  -- Harder to decode multiple instructions concurrently
- Tradeoffs
  - Code size (memory space, bandwidth, latency) vs. hardware complexity
  - ISA extensibility and expressiveness vs. hardware complexity
  - Performance/energy efficiency? Smaller code vs. ease of decode

Uniform/Non-uniform Decode of Instructions
- Uniform decode: the same bits in each instruction have the same meaning
  - Opcode is always in the same location
  - Ditto operand specifiers, immediate values, ...
  - Many RISC ISAs: MIPS, SPARC
  + Easier decode, simpler hardware
  + Enables parallelism: generate target address before knowing the instruction is a branch
  -- Restricts instruction format (fewer instructions?) or wastes space
- Non-uniform decode
  - E.g., opcode can be the 1st-7th byte in x86
  + More compact and powerful instruction format
  -- More complex decode logic
- Uniform decode usually means fixed length as well

x86 vs. MIPS Instruction Formats
- x86 (figure)
- MIPS:
  R-type: opcode (6 bits) | rs (5 bits) | rt (5 bits) | rd (5 bits) | shamt (5 bits) | funct (6 bits)
  I-type: opcode (6 bits) | rs (5 bits) | rt (5 bits) | immediate (16 bits)
  J-type: opcode (6 bits) | immediate (26 bits)
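
To make the uniform-decode point concrete, here is a minimal C sketch that slices a 32-bit MIPS word into the R-, I-, and J-type fields above. The struct and function names are hypothetical; every field is extracted unconditionally, since each one always sits at the same bit positions regardless of which format the opcode later selects.

```c
#include <stdint.h>

/* All possible fields of a 32-bit MIPS instruction word (names are illustrative). */
typedef struct {
    uint32_t opcode;   /* bits 31..26, present in every format */
    uint32_t rs;       /* bits 25..21, R- and I-type */
    uint32_t rt;       /* bits 20..16, R- and I-type */
    uint32_t rd;       /* bits 15..11, R-type */
    uint32_t shamt;    /* bits 10..6,  R-type */
    uint32_t funct;    /* bits 5..0,   R-type */
    uint32_t imm16;    /* bits 15..0,  I-type */
    uint32_t imm26;    /* bits 25..0,  J-type */
} MipsFields;

/* Extract every field in parallel; the opcode decides later which ones are meaningful. */
static MipsFields mips_decode(uint32_t insn) {
    MipsFields f;
    f.opcode = (insn >> 26) & 0x3F;
    f.rs     = (insn >> 21) & 0x1F;
    f.rt     = (insn >> 16) & 0x1F;
    f.rd     = (insn >> 11) & 0x1F;
    f.shamt  = (insn >> 6)  & 0x1F;
    f.funct  = insn & 0x3F;
    f.imm16  = insn & 0xFFFF;
    f.imm26  = insn & 0x03FFFFFF;
    return f;
}
```

For example, the word 0x012A4020 (add $t0, $t1, $t2) decodes to opcode 0, rs 9, rt 10, rd 8, shamt 0, funct 0x20.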

ISA Element: Registers
- Fast storage
- How many?
- Size of each register?
- General purpose vs. special purpose?
- Why is having registers a good idea?
  - Because programs exhibit a characteristic called data locality
  - A recently produced/accessed value is likely to be used more than once (temporal locality)
    - Storing that value in a register eliminates the need to go to memory each time that value is needed
- Compiler: register optimization is important

ISA Element: Memory Organization
- Address space: how many uniquely identifiable locations are in memory
- Addressability: how much data each uniquely identifiable location stores
  - Byte addressable: most ISAs
- Aligned/unaligned access
  MSB                              LSB
  byte-3 | byte-2 | byte-1 | byte-0
  byte-7 | byte-6 | byte-5 | byte-4
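
A minimal C sketch of the alignment check, assuming the usual definition that an access of `size` bytes is aligned when its address is a multiple of `size` (with `size` a power of two). The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if an access of `size` bytes at `addr` is naturally aligned.
   Assumes size is a power of two (1, 2, 4, 8, ...). */
static bool is_aligned(uint64_t addr, uint64_t size) {
    return (addr & (size - 1)) == 0;
}
```

With the byte layout above, a 4-byte access at address 4 (byte-4 to byte-7) is aligned, while a 4-byte access at address 2 spans two rows and is unaligned.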

Load/Store vs. Memory/Memory Architectures
- Load/store architecture: operate instructions operate only on registers
  - E.g., MIPS, ARM, and many RISC ISAs
- Memory/memory architecture: operate instructions can operate on memory locations
  - E.g., x86

ISA Element: I/O
- How to interface with I/O devices
  - Memory-mapped I/O
    - A region of memory is mapped to I/O devices
    - I/O operations are loads and stores to those locations
  - Special I/O instructions
    - IN and OUT instructions in x86 deal with ports of the chip
  - Tradeoffs?
    - Which one is more general purpose?
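
The memory-mapped option can be sketched as a simulated bus in C: an ordinary store reaches either RAM or a device depending only on the address. The `Bus` struct, the `IO_BASE` address, and the single-register device are all hypothetical, chosen only to illustrate the idea.

```c
#include <stdint.h>

#define RAM_BYTES 256
#define IO_BASE   0x1000u   /* hypothetical address of a device output register */

/* Simulated system: a small RAM plus one device register mapped at IO_BASE. */
typedef struct {
    uint8_t  ram[RAM_BYTES];
    uint8_t  last_device_byte;  /* last value the device received */
    unsigned device_writes;     /* how many stores hit the device */
} Bus;

/* A store is routed by address: the instruction itself is the same either way. */
static void store_byte(Bus *b, uint32_t addr, uint8_t value) {
    if (addr == IO_BASE) {
        b->last_device_byte = value;
        b->device_writes++;
    } else if (addr < RAM_BYTES) {
        b->ram[addr] = value;
    }
    /* Stores to any other address are ignored in this sketch. */
}
```

No new instructions are needed for I/O in this scheme; the cost is that part of the address space is no longer available for ordinary memory.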

Other ISA Elements
- Privilege modes
  - User vs. supervisor
  - Who can execute what instructions?
- Exception and interrupt handling
  - What procedure is followed when something goes wrong with an instruction?
  - What procedure is followed when an external device requests the processor?
- Virtual memory
  - Each program has the illusion of the entire memory space, which is greater than physical memory
- Access protection

CISC vs. RISC
- CISC (complex instruction set computer) -> complex instructions
  - Initially motivated by compilers whose code generation was not good enough
  - Memory size/bandwidth considerations
- RISC (reduced instruction set computer) -> simple instructions
  - Goal: enable better compiler control and optimization
  - Motivated by
    - Simplifying the hardware -> lower cost, higher frequency
    - Enabling the compiler to optimize the code better
- Simple compiler, complex hardware vs. complex compiler, simple hardware

CISC vs. RISC
- Usually,
- RISC
  - Simple instructions
  - Fixed length
  - Uniform decode
  - Few addressing modes
- CISC
  - Complex instructions
  - Variable length
  - Non-uniform decode
  - Many addressing modes

CISC vs. RISC
- Example: x86
- Each x86 instruction can be translated into a sequence of micro-instructions (uops)
  - Uops can be RISC-like
  - Stored in a read-only memory structure (UROM)
  - Why uops?
    - Simple processing engine to support complex instructions
    - Extensibility
    - Flexibility (can be patched to fix bugs)
    - Translation -> unification of ISAs (ARM, x86, GPU)?

Aside: Ultimate RISC (source: Wikipedia)

Review: Programmer Visible (Architectural) State
- Memory: array of storage locations indexed by an address
  - M[0], M[1], M[2], M[3], M[4], ..., M[N-1]
- Registers
  - Given special names in the ISA (as opposed to addresses)
  - General vs. special purpose
- Program Counter: memory address of the current instruction
- Instructions (and programs) specify how to transform the values of programmer visible state
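
One way to picture architectural state is as a plain C struct, with each instruction acting as a function that transforms it. This is a minimal sketch under assumed sizes (32 registers, 1 KiB of memory); the `ArchState` layout and `exec_add` are hypothetical and are not the data structures used in the labs.

```c
#include <stdint.h>

#define NUM_REGS  32
#define MEM_BYTES 1024

/* Programmer-visible state: registers, program counter, and memory. */
typedef struct {
    uint64_t regs[NUM_REGS];   /* named registers */
    uint64_t pc;               /* address of the current instruction */
    uint8_t  mem[MEM_BYTES];   /* M[0] .. M[N-1], byte addressable */
} ArchState;

/* An operate instruction as a state transformation:
   regs[d] = regs[s1] + regs[s2], then implicit sequential control flow. */
static void exec_add(ArchState *s, int d, int s1, int s2) {
    s->regs[d] = s->regs[s1] + s->regs[s2];
    s->pc += 4;   /* assumes fixed 4-byte instructions */
}
```

Nothing about caches or pipeline registers appears in the struct, which is exactly the distinction between architectural and microarchitectural state.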

Programmer Invisible State
- Microarchitectural state
- Programmer cannot access this directly
- E.g., cache state
- E.g., pipeline registers

ARMv8/LEGv8 Case Study

The ARMv8 ISA
- Commercialized by ARM Holdings (www.arm.com)
- Large share of embedded core market
  - Applications in mobile, consumer electronics, network/storage equipment, cameras, printers, ...
- Typical of many modern ISAs
- Reference (5740 pages)
  - https://developer.arm.com/docs/ddi0487/a/arm-architecturereference-manual-armv8-for-armv8-a-architecture-profile
**Based on original figure from [P&H CO&D, COPYRIGHT 2016 Elsevier. ALL RIGHTS RESERVED.]

ARMv8 Overview
- RISC, load/store architecture, both 32- and 64-bit
- 3-address machine
- 32-bit instructions
- Simple data types
  - int, fp, fixed point/vector interpretation
- Addressing modes: reg, imm, simple mem addressing
  - mem address from reg and instruction contents only
- 32 GPRs, PC, SP, ELR, 32 SIMD/FP registers
- Byte addressable
- Memory space and memory alignment?
- You will implement ARMv8 in C (Lab1)

LEGv8
- A subset of ARMv8
  - With some differences
- Reference
  - Green card from textbook
  - Also available online
  - http://booksite.elsevier.com/9780128017333/arm_ref.php

Instruction Formats
- (figure: instruction format layouts)

Registers
- Register file of 32 64-bit registers, plus one 64-bit PC

Memory Accesses
- Memory is byte addressed
  - Each address identifies an 8-bit byte
- Alignment
  - Does not require words (4 bytes, or 32 bits) to be aligned in memory, except for instructions and the stack

R-format Instructions
  opcode  | Rm     | shamt  | Rn     | Rd
  11 bits | 5 bits | 6 bits | 5 bits | 5 bits
- Instruction fields
  - opcode: operation code
  - Rm: the second register source operand
  - shamt: shift amount
  - Rn: the first register source operand
  - Rd: the register destination

R-format Example
  opcode  | Rm     | shamt  | Rn     | Rd
  11 bits | 5 bits | 6 bits | 5 bits | 5 bits
- ADD X9,X20,X21 // add the values in X20 and X21, and put the result in X9: GPR[X9] = GPR[X20] + GPR[X21]
- Field values (binary): 10001011000 | 10101 | 000000 | 10100 | 01001
- Machine code: 1000 1011 0001 0101 0000 0010 1000 1001 (binary) = 8B150289 (hex)
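
The encoding above can be reproduced with a few shifts and ORs. The following is a minimal C sketch; the function name `encode_r` is hypothetical, and the field positions follow the R-format layout on this slide.

```c
#include <stdint.h>

/* Pack R-format fields into a 32-bit instruction word:
   opcode[31:21] | Rm[20:16] | shamt[15:10] | Rn[9:5] | Rd[4:0] */
static uint32_t encode_r(uint32_t opcode, uint32_t rm, uint32_t shamt,
                         uint32_t rn, uint32_t rd) {
    return ((opcode & 0x7FFu) << 21) |
           ((rm     & 0x1Fu)  << 16) |
           ((shamt  & 0x3Fu)  << 10) |
           ((rn     & 0x1Fu)  << 5)  |
            (rd     & 0x1Fu);
}
```

Calling `encode_r(0x458, 21, 0, 20, 9)`, where 0x458 is the ADD opcode 10001011000 from the slide, yields 0x8B150289.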

shamt in R-format instructions
  opcode  | Rm     | shamt  | Rn     | Rd
  11 bits | 5 bits | 6 bits | 5 bits | 5 bits
- shamt: how many positions to shift
- Shift left logical (LSL)
  - R[Rd] <- R[Rn] << shamt // shift left and fill with 0 bits
  - LSL by i bits: multiplies by 2^i
- Shift right logical (LSR)
  - R[Rd] <- R[Rn] >> shamt // shift right and fill with 0 bits
  - LSR by i bits: divides by 2^i (unsigned only)
- Note: R-format instructions in ARMv8 support shift operations on the second operand before applying the operation specified in opcode
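
A short C sketch of the two logical shifts on 64-bit values. The function names are hypothetical; using `uint64_t` guarantees that vacated bits are filled with zeros, and the shift amount is assumed to be in the range 0 to 63 (the 6-bit shamt field).

```c
#include <stdint.h>

/* Shift left logical: low bits are filled with 0. Valid for shamt 0..63. */
static uint64_t lsl(uint64_t x, unsigned shamt) {
    return x << shamt;
}

/* Shift right logical: high bits are filled with 0. Valid for shamt 0..63. */
static uint64_t lsr(uint64_t x, unsigned shamt) {
    return x >> shamt;
}
```

The "unsigned only" caveat is visible with a negative two's-complement value: the bit pattern of -8 shifted right by 1 becomes 0x7FFFFFFFFFFFFFFC, a large positive number rather than -4.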

C to Assembly 101
- C code: f = (g + h) - (i + j);
  - f, ..., j in X19, X20, ..., X23
- Compiled into assembly:
  ADD X9, X20, X21
  ADD X10, X22, X23
  SUB X19, X9, X10

I-format Instructions
  opcode  | immediate | Rn     | Rd
  10 bits | 12 bits   | 5 bits | 5 bits
- Immediate instructions
  - Rn: source register
  - Rd: destination register
  - Immediate field: constant data; zero-extended
- Example: ADDI X22, X22, #4
  - What does the machine code look like for ADDI?
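
One way to work out the ADDI question is sketched below in C. It assumes the 10-bit ADDI opcode 1001000100 (0x244) as listed on the LEGv8 green card; that value does not appear in these slides, so check it against the reference. The function name `encode_i` is hypothetical.

```c
#include <stdint.h>

/* Pack I-format fields into a 32-bit instruction word:
   opcode[31:22] | immediate[21:10] | Rn[9:5] | Rd[4:0] */
static uint32_t encode_i(uint32_t opcode, uint32_t imm12,
                         uint32_t rn, uint32_t rd) {
    return ((opcode & 0x3FFu) << 22) |
           ((imm12  & 0xFFFu) << 10) |
           ((rn     & 0x1Fu)  << 5)  |
            (rd     & 0x1Fu);
}
```

Under that opcode assumption, `encode_i(0x244, 4, 22, 22)` gives 0x910012D6 for ADDI X22, X22, #4.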

D-format Instructions
  opcode  | address | op2    | Rn     | Rt
  11 bits | 9 bits  | 2 bits | 5 bits | 5 bits
- Load/store instructions
  - Rn: base register
  - address: constant offset from contents of base register (+/- 32 doublewords)
  - op2: expands the opcode field
  - Rt: destination (load) or source (store) register number
- Example: LDUR X9,[X22,#64]
  - LDUR opcode: 11111000010 (binary); op2: 0
  - X9 (Rt field)
  - X22 (Rn field)
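
The LDUR example can be encoded with the same shift-and-OR approach. This is a minimal C sketch with a hypothetical function name `encode_d`; the 9-bit address field is treated as a signed byte offset, which matches the stated range of +/- 32 doublewords (-256 to 255 bytes).

```c
#include <stdint.h>

/* Pack D-format fields into a 32-bit instruction word:
   opcode[31:21] | address[20:12] | op2[11:10] | Rn[9:5] | Rt[4:0]
   `offset` is a signed byte offset; only its low 9 bits are kept. */
static uint32_t encode_d(uint32_t opcode, int32_t offset, uint32_t op2,
                         uint32_t rn, uint32_t rt) {
    return ((opcode           & 0x7FFu) << 21) |
           (((uint32_t)offset & 0x1FFu) << 12) |
           ((op2              & 0x3u)   << 10) |
           ((rn               & 0x1Fu)  << 5)  |
            (rt               & 0x1Fu);
}
```

With the LDUR opcode 11111000010 (0x7C2) from the slide, `encode_d(0x7C2, 64, 0, 22, 9)` gives 0xF84402C9 for LDUR X9,[X22,#64].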

C to Assembly 201
- C code: A[12] = h + A[8];
  - h in X21, base address of A in X22
- Compiled code:
  - Index 8 requires offset of 64 (byte-addressed memory)
  LDUR X9,[X22,#64]
  ADD X9,X21,X9
  STUR X9,[X22,#96]
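
The offsets 64 and 96 follow from index times element size, assuming A holds 8-byte doublewords. The C sketch below mirrors the three instructions with explicit byte offsets; the function names are hypothetical and the local variable `x9` merely stands in for register X9.

```c
#include <stdint.h>
#include <string.h>

/* Byte offset of element `index`, assuming 8-byte (doubleword) elements. */
static uint64_t byte_offset(uint64_t index) {
    return index * sizeof(uint64_t);
}

/* Performs A[12] = h + A[8] using explicit byte addressing from the base. */
static void store_h_plus_a8(uint64_t *A, uint64_t h) {
    uint8_t *base = (uint8_t *)A;                     /* X22: base address of A */
    uint64_t x9;
    memcpy(&x9, base + byte_offset(8), sizeof x9);    /* LDUR X9,[X22,#64] */
    x9 = h + x9;                                      /* ADD  X9,X21,X9    */
    memcpy(base + byte_offset(12), &x9, sizeof x9);   /* STUR X9,[X22,#96] */
}
```

Here `byte_offset(8)` is 64 and `byte_offset(12)` is 96, matching the immediates in the compiled code.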

B-format Instructions
  opcode | BR_address
  6 bits | 26 bits
- Example: B L1
  - Branch unconditionally to the instruction labeled L1
- B opcode: 0A0 to 0BF (hex)
  - In ARMv8, it is 000101 (binary)
- Effect: if taken, PC = PC + BranchAddr
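
A C sketch of the branch effect, under the assumption that BranchAddr is the 26-bit BR_address field sign-extended and multiplied by 4 (instructions are 4 bytes, so the field counts instructions rather than bytes). The function names are hypothetical.

```c
#include <stdint.h>

/* Build a B instruction: opcode 000101 in bits 31..26, then a 26-bit
   signed offset measured in instructions (words), not bytes. */
static uint32_t encode_b(int32_t word_offset) {
    return (0x05u << 26) | ((uint32_t)word_offset & 0x03FFFFFFu);
}

/* Recover BranchAddr in bytes: sign-extend the 26-bit field, then scale by 4. */
static int64_t branch_byte_offset(uint32_t insn) {
    uint32_t imm26 = insn & 0x03FFFFFFu;
    int32_t words = (int32_t)(imm26 ^ 0x02000000u) - 0x02000000;
    return (int64_t)words * 4;
}

/* PC = PC + BranchAddr */
static uint64_t branch_target(uint64_t pc, uint32_t insn) {
    return pc + (uint64_t)branch_byte_offset(insn);
}
```

For instance, a branch three instructions forward from PC 0x400000 encodes as 0x14000003 and lands at 0x40000C, while a branch two instructions backward from 0x400010 encodes as 0x17FFFFFE and lands at 0x400008.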