CSE 49/59 Computer Architecture ISAs and MIPS Last Time Computer Architecture >> ISAs and RTL Comp. Arch. shaped by technology and applications Computer Architecture brings a quantitative approach to the table 5 quantitative principles of design The current performance trend sho that Latency lags behind bandwidth Steve Ko Computer Sciences and gineering University at Bualo CSE 49/59, Spring 211 CSE 49/59, Spring 211 2 Instruction Set Architecture (ISA) The contract beten software and hardware Typically described by giving all the programmervisible state (registers + memory) plus the semantics of the ructions that operate on that state IBM 36 was first line of machines to separate ISA from implementation (aka. microarchitecture) Many implementations possible for a given ISA E.g., today you can buy AMD or Intel processors that run the x86-64 ISA. E.g.2: many cellphones use the ARM ISA with implementations from many dierent companies including TI, Qualcomm, Samsung, Marvell, etc. E.g.3., the Soviets build code-compatible clones of the IBM36, as did Amdhal after he left IBM. ISA to Microarchitecture Mapping ISA often designed with particular microarchitectural style in mind, e.g., CISC microcoded RISC hardwired, pipelined VLIW fixed-latency in-order parallel pipelines JVM software interpretation But can be implemented with any microarchitectural style Intel Nehalem: hardwired pipelined CISC (x86) machine (with some microcode support) Intel could implement a dynamically scheduled outof-order VLIW Itanium (IA-64) processor ARM Jaelle: A hardware JVM processor CSE 49/59, Spring 211 3 CSE 49/59, Spring 211 4 Datapath vs Datapath signals ler Microcoded Microarchitecture (CISC) busy? ero? opcode µcontroller (ROM) holds fixed microcode ructions Datapath Points Data r Datapath: Storage, FU, interconnect suicient to perform the desired functions Inputs are Points Outputs are signals ler: State machine to orchestrate operation on the data path Based on desired function and signals CSE 49/59, Spring 211 5 holds user program written in macrocode ructions (e.g., MIPS, x86, etc.) (RAM) enmem MemWrt CSE 49/59, Spring 211 6 C 1
CISC 7 cycles Inst 1 Inst 2 5 cycles 1 cycles Time Variable cycles per ruction Variable ess modes reg-reg, reg-mem, mem-mem, etc. Convenient for programmers Support many ructions Diicult to predict completion time No good for pipelining Inst 3 Hardwired is pure Combinational Logic (RISC) op code ero? combinational logic BSrc Sel MemWrite WBSrc RegDst Src CSE 49/59, Spring 211 7 CSE 49/59, Spring 211 8 A "Typical" RISC ISA 32-bit fixed format ruction (3 formats) 32 32-bit GPR (R contains ero, DP take pair) 3-ess, reg-reg arithmetic ruction Single ess mode for load/store: base + displacement no indirection Simple branch conditions Delayed branch Designed for use by compilers & pipelining see: SPARC, MIPS, HP PA-Risc, DEC Alpha, IBM Por, CDC 66, CDC 76, Cray-1, Cray-2, Cray-3 Example: MIPS Register-Register 31 26 25 21 2 16 15 11 1 6 5 Rs1 Rs2 Register-ediate 31 26 25 21 2 16 15 Rs1 Rd immediate 31 26 25 target Rd x Branch 31 26 25 21 2 16 15 Rs1 Rs2/x immediate Jump / Call CSE 49/59, Spring 211 9 CSE 49/59, Spring 211 1 Hardware Elements Combinational circuits Mux, Decoder,, A A 1 A n-1. Sel lg(n) O Mux Synchronous state elements Flipflop, Register, Register file, SRAM, DRAM D Q D Q Edge-triggered: Data is sampled at the rising edge CSE 49/59, Spring 211 A lg(n) Decoder. O O 1 O n-1 A B Select -, Sub, - And, Or, Xor, Not, - GT, LT, EQ, Zero, Result Comp? Register Files ReadSel1 ReadSel2 WriteSel WriteData Reads are combinational D Q register D 1 Q 1 D 2 Q 2 Clock WE Register rd2 file 2R+1W wd D n-1 Q n-1 ReadData1 ReadData2 CSE 49/59, Spring 211 12 C 2
A Simple Model ress WriteData Writeable Clock MAGIC RAM ReadData Reads and writes are always completed in one cycle a Read can be done any time (i.e. combinational) a Write is performed at the rising clock edge if it is enabled the write ess and data must be stable at the clock edge CSE 49/59 Administrivia Please check the b page: http://www.cse.bualo.edu/~stevko/ courses/cse49/spring11! Don t forget Recitations start from this ek. Please purchase a BASYS2 board (1K) as soon as possible. Projects should be done individually. Please read the syllabus bpage. I have no idea how fast/slow I m going. Please stop me if too fast! CSE 49/59, Spring 211 13 CSE 49/59, Spring 211 14 Implementing MIPS: Single-cycle per ruction datapath & control logic CSE 49/59, Spring 211 15 The MIPS ISA Processor State 32 32-bit, R always contains a 32 single precision FPRs, may also be vied as 16 double precision FPRs FP status register, used for FP compares & exceptions, the program counter some other special registers Data types 8-bit byte, 16-bit half word 32-bit word for integers 32-bit word for single precision floating point 64-bit word for double precision floating point Load/Store style ruction set data essing modes- immediate & indexed branch essing modes- relative & register indirect Byte essable memory- big endian mode All ructions are 32 bits CSE 49/59, Spring 211 16 Example: MIPS Register-Register 31 26 25 21 2 16 15 11 1 6 5 Rs1 Rs2 Register-ediate 31 26 25 21 2 16 15 Rs1 Rd immediate 31 26 25 target Rd x Branch 31 26 25 21 2 16 15 Rs1 Rs2/x immediate Jump / Call Instruction Execution Execution of an ruction involves 1. ruction fetch 2. decode and register fetch 3. operation 4. memory operation (optional) 5. write back and the computation of the ess of the next ruction CSE 49/59, Spring 211 17 CSE 49/59, Spring 211 18 C 3
Datapath: Reg-Reg Instructions Datapath: Reg- Instructions x4 x4 <25:21> <2:16> <15:11> <25:21> <2:16> <15:> <5:> <31:26> Timing? rs rt rd func rd (rs) func (rt) 31 26 25 21 2 16 15 11 5 CSE 49/59, Spring 211 19 6 5 5 16 31 26 25 212 16 15 CSE 49/59, Spring 211 2 Conflicts in Merging Datapath Datapath for Instructions x4 <25:21> <2:16> <15:11> <15:> Introduce muxes x4 <25:21> <2:16> <15:11> <15:> <31:26> <5:> <31:26>, <5:> rs rt rd func rd (rs) func (rt) CSE 49/59, Spring 211 21 RegDst Sel BSrc rt / rd Reg / rs rt rd func rd (rs) func (rt) CSE 49/59, Spring 211 22 Datapath for Instructions Should program and data memory be separate? Harvard style: separate (Aiken and Mark 1 influence) - read-only program memory - read/write data memory - Note: Somehow there must be a way to load the program memory Princeton style: the same (von Neumann s influence) - single read/write memory for program and data Load/Store Instructions:Harvard Datapath x4 base disp MemWrite rdata Data wdata WBSrc / Mem - Note: A Load or Store ruction requires accessing the memory more than once during its execution CSE 49/59, Spring 211 23 RegDst Sel 6 5 5 16 essing mode opcode rs rt displacement (rs) + displacement 31 26 25 21 2 16 15 rs is the base register rt is the destination of a Load or the source for a Store CSE 49/59, Spring 211 24 BSrc C 4
Acknowledgements These slides heavily contain material developed and copyright by Krste Asanovic (MIT/UCB) David Patterson (UCB) And also by: Arvind (MIT) Joel Emer (Intel/MIT) James Hoe (CMU) John Kubiatowic (UCB) MIT material derived from course 6.823 UCB material derived from course CS252 CSE 49/59, Spring 211 25 C 5