ECE331: Hardware Organization and Design

Lecture 35: Final Exam Review
Adapted from Computer Organization and Design, Patterson & Hennessy, UCB

Material from Earlier in the Semester
- Throughput and latency of a circuit
- Processor pipeline and removing hazards
- Multiplier and multiply/divide instructions
- Basic MIPS instructions and knowledge
- Single-cycle datapath
- Be familiar with the homeworks, projects, and midterms 1 and 2
ECE232: Final Exam Review

Example Machine Organization
- Workstation design target
  - 25% of cost: processor
  - 25% of cost: memory (minimum memory size)
  - Rest: I/O devices, power supplies, box
[Figure: a computer made up of a processor (CPU) containing control and datapath, memory, and devices: input (keyboard, mouse), disk, and output (display, printer)]

PC Motherboard Closeup
[Figure: PC motherboard. Courtesy: www.tigerdirect.com]

Inside the Processor
- AMD Barcelona: 4 processor cores

System Layers
- Application software
  - Written in a high-level language
- System software
  - Compiler: translates high-level language code to machine code
  - Operating system: service code
    - Handling input/output
    - Managing memory and storage
    - Scheduling tasks and sharing resources
- Hardware
  - Processor, memory, I/O controllers

Levels of Program Code
- High-level language
  - Level of abstraction closer to the problem domain
  - Provides for productivity and portability
- Assembly language
  - Textual representation of instructions
- Hardware representation
  - Binary digits (bits)
  - Encoded instructions and data

Datapath I/O
- A wire (or, by extension, a bus) can be driven by only one tri-state driver at a time
  - If InPass is active, AluPass must be inactive
  - If AluPass is active, InPass must be inactive
[Figure: datapath with registers X and Y (loaded by LoadX and LoadY), an ALU controlled by Function, and tri-state drivers InPass, AluPass, and OutPass]

Program View of Memory
- Memory is viewed as a large, single-dimension array in which each address holds 8 bits of data
- A memory address is an index into the array
- The index points to a byte of memory: "byte addressing"
- A 32-bit machine addresses memory with a 32-bit address
- Access bytes (8 bits), words (32 bits), or half-words
[Figure: memory drawn as an array of addresses 0, 1, 2, 3, 4, 5, 6, ..., each holding 8 bits of data, next to the block diagram of processor (control, datapath), memory, and devices (input, output)]
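The byte-addressed view can be sketched in a few lines of Python. This is only an illustration: the helper names (`store_word`, `load_word`, `load_byte`), the 16-byte memory size, the big-endian byte order, and the word-alignment check are assumptions made for the sketch, not details stated on the slide.

```python
# Memory modeled as a flat array of bytes; an address is simply an index.
memory = bytearray(16)  # 16 bytes is an arbitrary size for the sketch


def store_word(mem, addr, value):
    """Store a 32-bit word as 4 consecutive bytes (big-endian is assumed)."""
    if addr % 4 != 0:
        raise ValueError("word address must be a multiple of 4")
    mem[addr:addr + 4] = value.to_bytes(4, "big")


def load_word(mem, addr):
    """Read the 4 bytes starting at addr back as one 32-bit word."""
    if addr % 4 != 0:
        raise ValueError("word address must be a multiple of 4")
    return int.from_bytes(mem[addr:addr + 4], "big")


def load_byte(mem, addr):
    """Read the single byte stored at addr."""
    return mem[addr]


store_word(memory, 4, 0x12345678)
print(hex(load_word(memory, 4)))   # the whole word at address 4
print(hex(load_byte(memory, 4)))   # its first byte
print(hex(load_byte(memory, 7)))   # its last byte
```

The word at address 4 occupies byte addresses 4 through 7, which is why consecutive words sit 4 addresses apart.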

MIPS Instruction Types
- Arithmetic & Logical: manipulate data in registers
    add $s1, $s2, $s3      # $s1 = $s2 + $s3
    or  $s3, $s4, $s5      # $s3 = $s4 OR $s5
- Data Transfer: move register data to/from memory
    lw  $s1, 100($s2)      # $s1 = Memory[$s2 + 100]
    sw  $s1, 100($s2)      # Memory[$s2 + 100] = $s1
- Branch: alter program flow
    beq $s1, $s2, 25       # if ($s1 == $s2) PC = PC + 4 + 4*25
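The branch rule above can be checked numerically. The sketch below assumes a hypothetical helper `beq_next_pc` and an example starting PC of 0x00400000; only the formula PC + 4 + 4*offset comes from the slide.

```python
def beq_next_pc(pc, rs_value, rt_value, offset):
    """Return the next PC after a beq located at address pc.

    offset is the signed word offset from the instruction (25 in the example).
    """
    if rs_value == rt_value:
        return pc + 4 + 4 * offset   # branch taken
    return pc + 4                    # fall through to the next instruction


pc = 0x00400000  # example address, an assumption for the sketch
print(hex(beq_next_pc(pc, 7, 7, 25)))  # registers equal: branch taken
print(hex(beq_next_pc(pc, 7, 8, 25)))  # registers differ: next instruction
```

The offset counts instructions (words), not bytes, which is why it is multiplied by 4 before being added to PC + 4.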

Registers vs. Memory
- Registers in a register file are faster to access than memory
- Operating on memory data requires loads and stores
  - More instructions to be executed
- Compiler must use registers for variables as much as possible
  - Only spill to memory for less frequently used variables
  - Register optimization is important!
- Registers are a fixed resource

MIPS Registers and Usage

MIPS Instructions
- All instructions are exactly 32 bits wide
- Different formats for different purposes
- Similarities in formats ease implementation

R-Format:  op (6 bits) | rs (5 bits) | rt (5 bits) | rd (5 bits) | shamt (5 bits) | funct (6 bits)
I-Format:  op (6 bits) | rs (5 bits) | rt (5 bits) | offset (16 bits)
J-Format:  op (6 bits) | address (26 bits)
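As a sketch of how the field widths fit together, the three formats can be packed into 32-bit words with shifts and masks. The function names are hypothetical; the register numbers ($s1 = 17, $s2 = 18, $s3 = 19) and the opcode/funct values for `add` and `lw` are the standard MIPS values and are not listed on the slide.

```python
def encode_r(op, rs, rt, rd, shamt, funct):
    """R-format: op(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def encode_i(op, rs, rt, offset):
    """I-format: op(6) rs(5) rt(5) offset(16); offset is masked to 16 bits."""
    return (op << 26) | (rs << 21) | (rt << 16) | (offset & 0xFFFF)


def encode_j(op, address):
    """J-format: op(6) address(26); address is masked to 26 bits."""
    return (op << 26) | (address & 0x3FFFFFF)


S1, S2, S3 = 17, 18, 19  # standard MIPS register numbers (assumed here)

# add $s1, $s2, $s3  ->  rd = $s1, rs = $s2, rt = $s3, op = 0, funct = 0x20
add_word = encode_r(0, S2, S3, S1, 0, 0x20)

# lw $s1, 100($s2)   ->  rt = $s1, rs = $s2, op = 0x23
lw_word = encode_i(0x23, S2, S1, 100)

print(f"{add_word:#010x}")
print(f"{lw_word:#010x}")
```

Because op, rs, and rt occupy the same bit positions in the R and I formats, the decode hardware can read the register file before it knows which format it is looking at.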

Procedure Calling
Steps required:
1. The calling program places parameters in registers.
2. The calling program transfers control to the procedure (callee).
3. The called procedure acquires the storage that it needs from memory.
4. The called procedure executes its operations.
5. The called procedure places results in registers for the calling program to retrieve.
6. The called procedure reverts the appropriate MIPS registers to their original or correct state.
7. The called procedure returns control to the next word in memory after the instruction that called it.
8. The calling program proceeds with its calculations.

What values are saved?
[Figure: memory layout, from high to low addresses]
  $sp -> 0x7fff fffc   Stack
                       Dynamic Data
                       Static Data
  pc  -> 0x0040 0000   Text
         0             Reserved

IEEE Floating-Point Format
Fields: S (1 bit) | Exponent (single: 8 bits, double: 11 bits) | Fraction (single: 23 bits, double: 52 bits)

$x = (-1)^{S} \times (1 + \text{Fraction}) \times 2^{(\text{Exponent} - \text{Bias})}$

- S: sign bit (0 = non-negative, 1 = negative)
- Normalize significand: 1.0 ≤ |significand| < 2.0
  - Always has a leading pre-binary-point 1 bit, so no need to represent it explicitly (hidden bit)
  - Significand is Fraction with the "1." restored
- Exponent: excess representation: actual exponent + Bias
  - Ensures exponent is unsigned
  - Single: Bias = 127; Double: Bias = 1023
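The field layout can be explored with Python's standard `struct` module. This is a sketch for normalized single-precision values only; the function names `decode_single` and `value_from_fields` are hypothetical.

```python
import struct


def decode_single(x):
    """Split a Python float, rounded to single precision, into (S, Exponent, Fraction)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, biased
    fraction = bits & 0x7FFFFF        # 23 bits
    return sign, exponent, fraction


def value_from_fields(sign, exponent, fraction, bias=127, frac_bits=23):
    """Evaluate (-1)^S * (1 + Fraction) * 2^(Exponent - Bias) for a normalized number."""
    return (-1) ** sign * (1 + fraction / 2 ** frac_bits) * 2.0 ** (exponent - bias)


fields = decode_single(-0.4375)        # -0.4375 = -1.75 * 2^-2
print(fields)
print(value_from_fields(*fields))      # reconstructs the original value
```

For double precision the same formula applies with `bias=1023` and `frac_bits=52`.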

Floating-Point Multiplication
Now consider a 4-digit binary example:
  $1.000_2 \times 2^{-1} \times -1.110_2 \times 2^{-2}$  (that is, $0.5 \times -0.4375$)
1. Add exponents
   - Unbiased: $-1 + -2 = -3$
   - Biased: $(-1 + 127) + (-2 + 127) = -3 + 254 - 127 = -3 + 127$
2. Multiply significands
   - $1.000_2 \times 1.110_2 = 1.110_2 \Rightarrow 1.110_2 \times 2^{-3}$
3. Normalize result and check for over/underflow
   - $1.110_2 \times 2^{-3}$ (no change), with no over/underflow
4. Round and renormalize if necessary
   - $1.110_2 \times 2^{-3}$ (no change)
5. Determine sign: +value × -value ⇒ -value
   - $-1.110_2 \times 2^{-3} = -0.21875$
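The five steps can be traced with a small sketch. The tuple representation `(sign, biased_exponent, significand)`, the names `fp_multiply` and `to_float`, and the use of truncation instead of proper rounding in step 4 are all simplifying assumptions; the significand is held as an integer with 3 fraction bits, so 0b1110 stands for 1.110 in binary.

```python
BIAS = 127       # single-precision bias, as in the worked example
FRAC_BITS = 3    # 4-bit significand: hidden 1 plus 3 fraction bits


def fp_multiply(a, b):
    """Multiply two (sign, biased_exponent, significand) values step by step."""
    sign_a, exp_a, sig_a = a
    sign_b, exp_b, sig_b = b

    # Step 1: add the biased exponents and subtract one bias.
    exponent = exp_a + exp_b - BIAS

    # Step 2: multiply significands; the product has 2 * FRAC_BITS fraction bits.
    product = sig_a * sig_b

    # Step 3: normalize (the product lies in [1, 4)) and check the exponent range.
    if product >= (2 << (2 * FRAC_BITS)):
        product >>= 1
        exponent += 1
    if not 1 <= exponent <= 254:
        raise OverflowError("exponent overflow or underflow")

    # Step 4: reduce to FRAC_BITS fraction bits (truncation, a simplification).
    significand = product >> FRAC_BITS

    # Step 5: the result is negative when exactly one operand is negative.
    sign = sign_a ^ sign_b
    return sign, exponent, significand


def to_float(value):
    """Convert a (sign, biased_exponent, significand) tuple to a Python float."""
    sign, exponent, significand = value
    return (-1) ** sign * (significand / 2 ** FRAC_BITS) * 2.0 ** (exponent - BIAS)


a = (0, -1 + BIAS, 0b1000)   # +1.000 * 2^-1 =  0.5
b = (1, -2 + BIAS, 0b1110)   # -1.110 * 2^-2 = -0.4375
result = fp_multiply(a, b)
print(result, to_float(result))
```

Real hardware keeps guard and round bits so that step 4 can round to nearest; truncation happens to give the same answer here because the product fits in 4 bits exactly.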

Floating Point Special Representations
$F = (-1)^{S} \times 1.f \times 2^{E - 127}$, where $1 \le 1.f < 2$ and $1 \le E \le 254$

Single Precision          Double Precision          Object represented
Exponent   Fraction       Exponent   Fraction
0          0              0          0              0
0          nonzero        0          nonzero        ± denormalized number
1-254      Anything       1-2046     Anything       ± floating point number
255        0              2047       0              ± infinity
255        nonzero        2047       nonzero        NaN (not a number)
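The single-precision half of the table maps directly onto a small classifier. The function name `classify_single` and the returned label strings are hypothetical; the exponent and fraction tests are taken from the table.

```python
def classify_single(bits):
    """Classify a 32-bit single-precision bit pattern using the table above."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0:
        return "zero" if fraction == 0 else "denormalized"
    if exponent == 255:
        return "infinity" if fraction == 0 else "NaN"
    return "normalized"   # exponent in 1..254, any fraction


for pattern in (0x00000000, 0x00000001, 0x3F800000, 0x7F800000, 0x7FC00000):
    print(f"{pattern:#010x} -> {classify_single(pattern)}")
```

The sign bit is ignored on purpose: every row of the table applies to both signs.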

FP Adder Hardware
[Figure: FP adder hardware, annotated with Steps 1 through 4]

Instruction Execution Steps
1. Instruction Fetch: read IM[PC]
2. Decode, Inc PC and Read Registers: instruction decode, PC = PC + 4, register read
3. ALU Operation, Branch address: ALU operation, branch address computation
4. Data Memory operation: LW/STORE in data memory
5. Write Back: register write

Datapath Step 1: Any Instruction
[Figure: the clocked PC drives the Address input of the Instruction Memory (IMem), which outputs the Instruction; an adder adds 4 to the PC]
- The adder is a 32-bit adder, or an ALU wired only for add
- Once the program is loaded, IMem is read-only

Single cycle data path
- System clock affects primarily the Program Counter

MIPS Datapath
- Datapath contains 5 stages: Instruction fetch (IF), Decode (ID), Execute (EX), Memory (Mem), Writeback (W)
[Figure: Stage 1 (IF): PC and Instruction Memory; Stage 2 (ID): Registers; Stage 3 (EX): ALU; Stage 4 (Mem): Data Memory; Stage 5 (W)]
- Can I pipeline the MIPS stages?

Sequential Laundry
[Figure: task order A, B, C, D against time from 6 PM to 2 AM, in 30-minute slots]
- Sequential laundry takes 8 hours for 4 loads
- If they learned pipelining, how long would laundry take?

Pipelining Lessons
[Figure: task order A, B, C, D against time starting at 6 PM, in seven 30-minute slots]
- Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload
- Multiple tasks operating simultaneously using different resources
- Potential speedup = number of pipe stages
- Pipeline rate limited by slowest pipeline stage
- Unbalanced lengths of pipe stages reduce speedup
- Time to fill the pipeline and time to drain it reduce speedup
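The laundry numbers can be reproduced with a short calculation. This sketch assumes four 30-minute stages per load (inferred from 8 hours for 4 loads in 30-minute slots) and a pipeline that advances in lockstep at the pace of its slowest stage; the function names are hypothetical.

```python
def sequential_time(num_tasks, stage_times):
    """Each task runs all of its stages before the next task starts."""
    return num_tasks * sum(stage_times)


def pipelined_time(num_tasks, stage_times):
    """Lockstep pipeline: every stage takes as long as the slowest one.

    The first task needs len(stage_times) steps to fill the pipeline, and
    each remaining task finishes one step later.
    """
    slowest = max(stage_times)
    return (len(stage_times) + num_tasks - 1) * slowest


stages = [30, 30, 30, 30]                # minutes per stage (assumed)
seq = sequential_time(4, stages)         # 4 loads, one after another
pipe = pipelined_time(4, stages)         # 4 loads, overlapped
print(seq / 60, "hours sequential")
print(pipe / 60, "hours pipelined")
print(round(seq / pipe, 2), "x speedup, below the 4x potential")
```

The speedup falls short of the number of stages because of fill and drain time; with many more loads it approaches 4.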

MIPS Pipelined Datapath
- State registers between pipeline stages to isolate them
- Stages: IF: IFetch, ID: Dec, EX: Execute, MEM: MemAccess, WB: WriteBack
[Figure: pipelined datapath with Inst 1 through Inst 5 in flight; PC, +4 adder, and Instruction Memory; IFetch/Dec register; Register File and 16-to-32-bit sign extend; Dec/Exec register; shift-left-2, branch adder, and ALU; Exec/Mem register; Data Memory; Mem/WB register; all driven by the System Clock]

Pipeline Hazards
- Data hazards: an instruction uses the result of a previous instruction (RAW)
    ADD R1, R2, R3        or        SW R1, 4(R2)
    SUB R4, R1, R5                  LW R3, 4(R2)
- Control hazards: the address of the next instruction to be executed depends on a previous instruction
          BEQ R1, R2, CONT
          SUB R6, R7, R8
    CONT: ADD R3, R4, R5
- Structural hazards: two instructions need access to the same resource
  - e.g., a single memory shared for instruction fetch and load/store
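Register RAW dependences like the ADD/SUB pair above can be found mechanically. The sketch below uses a hypothetical `(dest, sources)` tuple per instruction and assumes that only the next two instructions after a producer can see a stale register value (the `window` parameter); both are assumptions for illustration, not something the slide specifies.

```python
def raw_hazards(program, window=2):
    """Return (producer_index, consumer_index, register) for each RAW dependence.

    program is a list of (dest, sources) tuples; dest is None for
    instructions that write no register, such as SW.
    """
    hazards = []
    for i, (dest, _) in enumerate(program):
        if dest is None:
            continue
        for j in range(i + 1, min(i + 1 + window, len(program))):
            next_dest, next_sources = program[j]
            if dest in next_sources:
                hazards.append((i, j, dest))
            if next_dest == dest:
                break   # register overwritten; later reads depend on instruction j
    return hazards


register_pair = [
    ("R1", ("R2", "R3")),   # ADD R1, R2, R3
    ("R4", ("R1", "R5")),   # SUB R4, R1, R5
]
memory_pair = [
    (None, ("R1", "R2")),   # SW R1, 4(R2)
    ("R3", ("R2",)),        # LW R3, 4(R2)
]
print(raw_hazards(register_pair))
print(raw_hazards(memory_pair))
```

The SW/LW pair is a dependence through memory (same address 4(R2)), so a register-only check like this one does not report it.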

Forwarding with Load-use Data Hazards
    lw  $1, 4($2)
    sub $4, $1, $5
    and $6, $1, $7
    or  $8, $1, $9
    xor $4, $1, $5
[Figure: pipeline diagram (IM, Reg, ALU, DM, Reg) of the instructions above, in program order against time]
- sub needs to stall
- Will still need one stall cycle even with forwarding
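The one-cycle load-use stall can be counted with a simple adjacency check. This sketch assumes full forwarding, so the only remaining stall is a `lw` immediately followed by an instruction that reads the loaded register; the `(opcode, dest, sources)` tuple format and the function name are hypothetical.

```python
def load_use_stalls(program):
    """Count stall cycles in a 5-stage pipeline with full forwarding.

    program is a list of (opcode, dest, sources) tuples. Loaded data is
    available only after the MEM stage, so an instruction directly after
    a lw that reads its destination must wait one cycle.
    """
    stalls = 0
    for previous, current in zip(program, program[1:]):
        opcode, dest, _ = previous
        if opcode == "lw" and dest in current[2]:
            stalls += 1
    return stalls


program = [
    ("lw",  "$1", ("$2",)),        # lw  $1, 4($2)
    ("sub", "$4", ("$1", "$5")),   # sub $4, $1, $5  (load-use: stalls)
    ("and", "$6", ("$1", "$7")),   # and $6, $1, $7  (forwarded)
    ("or",  "$8", ("$1", "$9")),   # or  $8, $1, $9
    ("xor", "$4", ("$1", "$5")),   # xor $4, $1, $5
]
print(load_use_stalls(program))
```

Only `sub` stalls: by the time `and`, `or`, and `xor` need `$1`, the loaded value can be forwarded or read from the register file.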

Datapath with Forwarding Hardware
[Figure: pipelined datapath (IF/ID, ID/EX, EX/MEM, and MEM/WB registers; Control; PCSrc; ALU cntrl; Branch) extended with a Forward Unit]

Branch Instructions Cause Control Hazards
[Figure: pipeline diagram of beq, lw, Inst 3, and Inst 4 in program order, and of jr, each passing through IF, ID, EX, DM, WB]

One Way to Fix a Control Hazard
- Fix branch hazard by waiting: introduce stalls
[Figure: pipeline diagram in program order: beq, stall, stall, stall, lw, Inst 3]

Reducing branch penalty through HW design

Branch Prediction
- Easiest: static prediction
  - Always taken, always not taken
  - Opcode based
  - Displacement based (forward not taken, backward taken)
  - Compiler directed (branch likely, branch not likely)
- Dynamic prediction: prediction per branch in program
- 1-bit predictor: remember last taken/not taken per branch
  - Use a branch-history table (BHT) with 1-bit entries
  - Use part of the PC (low-order bits) to index the table
    - Why?
    - Multiple branches may share the same bit
  - Invert the bit if the prediction is wrong
[Figure: the branch PC indexes a BHT with entries Predictor 0, Predictor 1, ..., Predictor 127]
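A 1-bit predictor of this kind is small enough to simulate. The sketch below assumes 128 entries (matching Predictor 0 through Predictor 127), an initial prediction of not taken, and an index formed from the PC bits just above the two always-zero byte-offset bits; the class and function names are hypothetical.

```python
class OneBitPredictor:
    """Branch-history table with one taken/not-taken bit per entry."""

    def __init__(self, entries=128):
        self.entries = entries
        self.table = [False] * entries   # False = predict not taken (assumed start)

    def _index(self, pc):
        # Instructions are word aligned, so drop the two low bits first.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        return self.table[self._index(pc)]

    def update(self, pc, taken):
        index = self._index(pc)
        if self.table[index] != taken:
            self.table[index] = not self.table[index]   # invert on a misprediction


def count_mispredictions(predictor, pc, outcomes):
    """Run a sequence of branch outcomes through the predictor."""
    wrong = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            wrong += 1
        predictor.update(pc, taken)
    return wrong


# A loop branch: taken 9 times, then not taken at loop exit; the loop runs twice.
loop = ([True] * 9 + [False]) * 2
predictor = OneBitPredictor()
mispredictions = count_mispredictions(predictor, 0x00400000, loop)
print(mispredictions, "mispredictions out of", len(loop))
```

Each run of the loop costs two mispredictions, one on entry and one on exit. Two branches whose addresses differ by a multiple of 128 words (for example 0x00400000 and 0x00400200) map to the same entry and share its bit.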