CS 265 Computer Architecture Wei Lu, Ph.D., P.Eng.
Part 5: Processors Our goals: understand the basics of processors and the CPU; understand the architecture of MARIE, a model computer; take a close look at the instruction set architecture; know how to do assembly programming with the MARIE architecture
Part 5: Processors Overview: Introduction to processors and CPU Introduction to the architecture of MARIE A close look at the instruction set architecture Assembly language and programming paradigm
A close look at the instruction set Instruction addressing mode Instruction-level pipeline
Instruction addressing mode
Instruction addressing mode Addressing modes specify where an operand is located. Purpose of having different addressing modes: to be able to reference as many locations of memory as possible. Modes covered: Immediate, Direct, Indirect, Register, Register Indirect, Indexed, Stack
Instruction addressing mode All computer architectures provide more than one addressing mode. The CPU determines which addressing mode is used in a particular instruction in one of two ways: (1) different opcodes use different addressing modes; (2) one or more bits in the instruction format are used as a mode field. The effective address (EA) computed by an addressing mode is usually a main memory address or a register address
Instruction addressing mode: immediate addressing Operand is part of the instruction: Operand = address field e.g. ADD 5 Add 5 to the contents of the accumulator; 5 is the operand No memory reference to fetch data Fast
Instruction addressing mode: direct addressing Address field contains address of operand Effective address (EA) = address field (A) e.g. ADD A Add contents of memory cell A to accumulator Look in memory at address A for operand Single memory reference to access data No additional calculations to work out effective address Limited address space
Instruction addressing mode: direct addressing diagram [Diagram: the instruction's address field A points directly to the operand in memory]
Instruction addressing mode: indirect addressing Memory cell pointed to by address field A contains the address of (a pointer to) the operand EA = (A) Look in A, find address (A), and look there for the operand e.g. ADD (A) Add contents of the cell pointed to by the contents of A to the accumulator
Instruction addressing mode: indirect addressing Large address space May be nested, multilevel, cascaded, e.g. EA = (((A))) Multiple memory accesses to find the operand, hence slower
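The pointer-chasing behind direct vs. indirect addressing can be sketched in Python; this is a minimal illustration, not MARIE-specific, with memory modeled as a dict and the addresses and values chosen arbitrarily:

```python
# Illustrative memory: maps addresses to stored values.
memory = {10: 20, 20: 30, 30: 99}

def load_direct(a):
    # Direct: EA = A, the operand is at address A itself.
    return memory[a]

def load_indirect(a, levels=1):
    # Indirect: EA = (A); multilevel indirect follows the
    # pointer chain 'levels' times (one extra memory access each).
    ea = a
    for _ in range(levels):
        ea = memory[ea]
    return memory[ea]

print(load_direct(10))              # operand at address 10 -> 20
print(load_indirect(10))            # EA = (10) = 20, operand -> 30
print(load_indirect(10, levels=2))  # EA = ((10)) = 30, operand -> 99
```

Each extra level of indirection costs one more memory access, which is why cascaded indirect addressing is slower.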
Instruction addressing mode: indirect addressing diagram [Diagram: address field A selects a memory cell holding a pointer, which in turn points to the operand in memory]
Instruction addressing mode: register addressing Operand is held in the register named in the address field EA = R Limited number of registers Very small address field needed: shorter instructions, faster instruction fetch
Instruction addressing mode: register addressing No memory access Very fast execution Very limited address space Multiple registers help performance Requires good assembly programming or compiler writing Similar to direct addressing, but the address field names a register instead of a memory cell
Instruction addressing mode: register addressing diagram [Diagram: the instruction's register field R selects a register that holds the operand]
Instruction addressing mode: register indirect addressing Similar to indirect addressing EA = (R) Operand is in the memory cell pointed to by the contents of register R One fewer memory access than (memory) indirect addressing
Instruction addressing mode: register indirect addressing diagram [Diagram: register field R selects a register holding a pointer to the operand in memory]
Instruction addressing mode: indexed addressing Indexed addressing uses a register (implicitly or explicitly) as an offset, which is added to the address in the instruction to determine the effective address of the data. EA = A + (R) The instruction holds two values: A = base value; R = register that holds the offset
Instruction addressing mode: indexed addressing diagram [Diagram: address A from the instruction is added to the offset held in register R to form the effective address of the operand]
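Indexed addressing is how array traversal is typically expressed at the machine level: A names the array's base and R steps through the elements. A minimal sketch (the base address 100 and the values are illustrative assumptions):

```python
# An "array" of three words starting at address 100.
memory = {100: 11, 101: 22, 102: 33}
ARRAY_BASE = 100  # illustrative base address

def load_indexed(a, r):
    # Indexed: EA = A + (R), base from the instruction plus
    # the offset held in register R.
    ea = a + r
    return memory[ea]

index_reg = 0
for _ in range(3):
    print(load_indexed(ARRAY_BASE, index_reg))  # 11, then 22, then 33
    index_reg += 1   # stepping the register walks the array
```

Incrementing only the register, not the instruction, is what makes indexed addressing flexible for loops.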
Instruction addressing mode: stack addressing Operand is (implicitly) on top of the stack e.g. ADD Pop the top two items from the stack and add
Comparison of different addressing modes

Mode              | Algorithm         | Principal Advantage  | Principal Disadvantage
Immediate         | Operand = A       | No memory reference  | Limited operand magnitude
Direct            | EA = A            | Simple               | Limited address space
Indirect          | EA = (A)          | Large address space  | Multiple memory references
Register          | EA = R            | No memory reference  | Limited address space
Register indirect | EA = (R)          | Large address space  | Extra memory reference
Indexed           | EA = A + (R)      | Flexibility          | Complexity
Stack             | EA = top of stack | No memory reference  | Limited applicability

A: contents of an address field. R: register named in an address field. EA: actual (effective) address
Addressing mode: an example Given the following memory values: Word 20 contains 40 Word 30 contains 50 Word 40 contains 60 Word 50 contains 70 What values do the following instructions load into the accumulator? Load immediate 20 Load direct 20 Load indirect 20 Load immediate 30 Load direct 30 Load indirect 30
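The example above can be checked with a short sketch that applies each mode's definition to the given memory contents:

```python
# Memory contents exactly as stated in the example.
memory = {20: 40, 30: 50, 40: 60, 50: 70}

def load(mode, a):
    if mode == "immediate":
        return a                  # operand = A itself
    if mode == "direct":
        return memory[a]          # EA = A
    if mode == "indirect":
        return memory[memory[a]]  # EA = (A)

for a in (20, 30):
    for mode in ("immediate", "direct", "indirect"):
        print(f"Load {mode} {a} -> {load(mode, a)}")
# Load immediate 20 -> 20, direct 20 -> 40, indirect 20 -> 60
# Load immediate 30 -> 30, direct 30 -> 50, indirect 30 -> 70
```

Note how indirect addressing follows the chain: Load indirect 20 first fetches word 20 (which holds 40), then fetches word 40 to get the operand 60.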
Addressing mode: an example For the instruction shown, what value is loaded into the accumulator for each addressing mode?
Instruction-level pipeline
Instruction-level pipeline Some CPUs divide the fetch-decode-execute cycle into smaller steps. These smaller steps can often be executed in parallel to increase performance. Such parallel execution is called instruction-level pipelining. This term is sometimes abbreviated ILP in the literature.
Instruction-level pipeline Suppose the fetch-decode-execute cycle were broken into the following smaller steps: 1. Fetch instruction. 2. Decode opcode. 3. Calculate effective address of operands. 4. Fetch operands. 5. Execute instruction. 6. Store result. Suppose we have a six-stage pipeline: S1 fetches the instruction, S2 decodes it, S3 determines the address of the operands, S4 fetches them, S5 executes the instruction, and S6 stores the result.
Instruction-level pipeline For every clock cycle, one small step is carried out, and the stages are overlapped. S1. Fetch instruction. S2. Decode opcode. S3. Calculate effective address of operands. S4. Fetch operands. S5. Execute. S6. Store result. Effect: an N-stage pipeline can operate on N instructions simultaneously. Each stage takes one clock cycle, so each instruction completes in one clock cycle once the pipeline is full.
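The overlap can be visualized with a small sketch that prints which instruction occupies each stage on each cycle (the stage names follow the six-step breakdown above; the choice of four instructions is arbitrary):

```python
# Six pipeline stages as described above.
STAGES = ["S1 fetch", "S2 decode", "S3 calc EA",
          "S4 fetch ops", "S5 execute", "S6 store"]
k = len(STAGES)   # number of stages
n = 4             # number of instructions to run

# Instruction i (0-based) enters S1 at cycle i and leaves S6
# at cycle i + k - 1, so n instructions need n + k - 1 cycles.
for cycle in range(n + k - 1):
    row = []
    for s in range(k):
        i = cycle - s          # which instruction is in stage s
        row.append(f"I{i+1}" if 0 <= i < n else "--")
    print(f"cycle {cycle+1}: " + "  ".join(row))
```

After the first k - 1 cycles fill the pipeline, one instruction completes every cycle, which is exactly the effect described above.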
Instruction-level pipeline The theoretical speedup offered by a pipeline can be determined as follows. Let tp be the time per stage. Each instruction represents a task, T, in the pipeline. The first task (instruction) requires k·tp time to complete in a k-stage pipeline. The remaining (n - 1) tasks emerge from the pipeline one per cycle, so the total time to complete the remaining tasks is (n - 1)·tp. Thus, to complete n tasks using a k-stage pipeline requires: (k·tp) + (n - 1)·tp = (k + n - 1)·tp.
Instruction-level pipeline If we take the time required to complete n tasks without a pipeline, n·k·tp, and divide it by the time it takes to complete n tasks using a pipeline, we find the speedup: S = (n·k·tp) / ((k + n - 1)·tp) = n·k / (k + n - 1). If we take the limit as n approaches infinity, (k + n - 1) approaches n, which results in a theoretical speedup of: S = k.
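A quick numeric check of the speedup formula for the six-stage pipeline above (k = 6) shows it climbing toward k as the instruction count n grows:

```python
k = 6  # stages, as in the six-stage pipeline above

def speedup(n, k):
    # S = (n*k*tp) / ((k + n - 1)*tp); tp cancels out.
    return n * k / (k + n - 1)

for n in (1, 10, 100, 1000, 100000):
    print(f"n = {n:>6}: speedup = {speedup(n, k):.3f}")
# The values approach the theoretical limit of k = 6.
```

With a single instruction (n = 1) the speedup is exactly 1: a pipeline only pays off when there is a stream of instructions to overlap.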
Why understand the pipeline? The pipeline is transparent to the assembly language programmer. Disadvantage: a programmer who does not understand the pipeline can produce inefficient code. Why? Reason: the hardware automatically stalls the pipeline if operands are not available, i.e., if the next instruction depends on the result of the previous instruction.
Example of instruction stalls Assume we need to perform addition and subtraction operations, with operands and results in registers A through E. If instruction K+1 needs the result of instruction K before it can continue, the second instruction stalls to wait for its operand: instruction K+1 must wait until instruction K completes.
How to achieve maximum speed The program must be written to accommodate the instruction pipeline and to minimize stalls: avoid introducing unnecessary branches and subroutine calls; avoid invoking a co-processor, e.g. calling an instruction that takes a long time such as floating-point arithmetic; avoid external storage.
Example of avoiding stalls Stalls eliminated by rearranging (a) to (b)
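The effect of rearranging can be illustrated with a toy stall counter. This is a hedged sketch, not the slide's exact code: it assumes a simple in-order machine where a result becomes usable LATENCY cycles after its instruction issues, and the two register sequences are invented for illustration:

```python
LATENCY = 2  # assumed cycles until a result is usable

def stalls(program):
    # program: list of (dest_register, (source_registers)) tuples
    ready = {}           # register -> cycle its value is available
    cycle = total = 0
    for dest, srcs in program:
        need = max((ready.get(s, 0) for s in srcs), default=0)
        wait = max(0, need - cycle)   # stall until operands ready
        total += wait
        cycle += wait + 1             # issue takes one cycle
        ready[dest] = cycle - 1 + LATENCY
    return total

# (a) dependent instructions back to back: C is used immediately
seq_a = [("C", ("A", "B")), ("D", ("C", "E")), ("F", ("G", "H"))]
# (b) the independent instruction moved between the dependent pair
seq_b = [("C", ("A", "B")), ("F", ("G", "H")), ("D", ("C", "E"))]
print(stalls(seq_a), stalls(seq_b))  # prints 1 0
```

Moving the independent instruction into the gap gives the first result time to arrive, so the reordered sequence (b) runs without stalling.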
Thank you for your attendance Any questions?