CSC258: Computer Organization. Microarchitecture

2 Wrap-up: Function Conventions

3 Key Elements: Caller
Ensure that critical registers like $ra have been saved.
Save caller-save registers.
Place the first four arguments into $a0-$a3 and any remaining arguments on the stack.
Jump-and-link to the function.
...
On return, save the return value.
Remove arguments from the stack.
Restore caller-save registers.

4 Key Elements: Callee
Allocate all of the space needed on the stack.
If calling a function: store $ra and, if used, $fp. It's good practice to save arguments, too.
Move the $fp.
Store callee-save registers.
...
Place the return value in $v0.
Restore callee-saves.
Tear down the stack.

5 Parallelism and Pipelining

6 Quiz-like Questions for Discussion
1. Create a list of real-world examples of spatial parallelism.
2. Create a list of real-world examples of temporal parallelism.
3. Name the pipeline stages in the multicycle processor and describe the function performed by each.
4. Define data hazard and control hazard.

8 Parallelism

9 Parallelism
Parallelism is a big idea. It might be the most important idea in this course. You'll revisit it again and again: in operating systems, in databases, and when you're in industry.
At its root, parallelism is the idea that you can derive benefit from completing multiple tasks simultaneously.

10 Performance
When we discuss performance, we often consider:
Latency: the length of time required to perform an operation.
Throughput: the number of operations that can be completed within a unit of time.
In the first half of this course, we focused on latency. (Think about timing analysis.)

11 Throughput
Performing multiple tasks at once can often increase performance more than optimizing the performance of a single task.
The graphics card in your computer works this way. Instead of only streamlining the computation of a single pixel, the card processes many pixels simultaneously.
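
The trade-off on this slide can be sketched with a little arithmetic. The numbers below (per-pixel times, number of units) are made up for illustration and do not describe any real graphics card; `total_time` and `throughput` are hypothetical helper names.

```python
def total_time(num_tasks, latency, units):
    """Time to finish num_tasks when `units` identical workers run in parallel.

    Each worker handles one task at a time, and every task takes `latency`
    time units. All values are hypothetical.
    """
    rounds = -(-num_tasks // units)  # ceiling division
    return rounds * latency


def throughput(num_tasks, latency, units):
    """Tasks completed per unit of time."""
    return num_tasks / total_time(num_tasks, latency, units)


pixels = 1000
# One heavily optimized unit: low latency, but only one pixel at a time.
fast_single = throughput(pixels, latency=1, units=1)
# Many slower units: each pixel takes 4x longer, but 100 are in flight at once.
slow_parallel = throughput(pixels, latency=4, units=100)

print(fast_single)    # 1.0 pixels per time unit
print(slow_parallel)  # 25.0 pixels per time unit
```

Under these assumptions the parallel design has worse latency per pixel but 25 times the throughput.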

12 Types of Parallelism
These definitions are very hardware focused. You'll encounter other forms in other courses.
Spatial: Completing the same task multiple times at the same time, using multiple copies of the hardware.
Temporal (pipelined): Breaking a task into pieces, so that multiple different instructions can be in process at the same time, each in a different piece.
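
To make temporal parallelism concrete, here is a minimal sketch that prints which task occupies which stage in each cycle. The three stage names and the function `pipeline_schedule` are hypothetical, and the model assumes every stage takes exactly one cycle and nothing ever stalls.

```python
def pipeline_schedule(num_tasks, stages):
    """Return, for each cycle, which task occupies each stage (None if empty).

    Task i enters the first stage in cycle i and advances one stage per cycle.
    """
    total_cycles = num_tasks + len(stages) - 1
    schedule = []
    for cycle in range(total_cycles):
        row = {}
        for position, name in enumerate(stages):
            task = cycle - position
            row[name] = task if 0 <= task < num_tasks else None
        schedule.append(row)
    return schedule


stages = ["A", "B", "C"]  # hypothetical three-piece task
schedule = pipeline_schedule(4, stages)
for cycle, row in enumerate(schedule):
    print(cycle, row)
```

Four tasks finish in 4 + 3 - 1 = 6 cycles instead of the 12 a one-at-a-time design would need, because from cycle 2 onward all three stages are busy with different tasks.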

13 Types of Parallelism (figure; credit: DDCA)

14 Parallelism is HARD
You'll find that harnessing parallelism is very difficult.
In software, we talk about dependencies. It's difficult to find ways to perform many operations in parallel because we think linearly: instruction 1 must be completed before 2, before 3, before 4, before 5, before...
In hardware, we'll discuss hazards.
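
One way to see how dependencies limit parallelism is to compute the earliest step at which each operation could run. This is a rough sketch on a made-up straight-line program; `parallel_levels` and the variable names are hypothetical, and real compilers and processors use more refined analyses.

```python
def parallel_levels(ops):
    """Assign each operation the earliest step at which it could run.

    ops is a list of (destination, sources) pairs in program order.
    An operation must wait for any earlier operation it conflicts with:
    it reads that operation's result, overwrites one of its inputs,
    or writes the same destination.
    """
    levels = []
    for j, (dst_j, srcs_j) in enumerate(ops):
        level = 0
        for i in range(j):
            dst_i, srcs_i = ops[i]
            conflict = dst_i in srcs_j or dst_j in srcs_i or dst_i == dst_j
            if conflict:
                level = max(level, levels[i] + 1)
        levels.append(level)
    return levels


# Hypothetical program: a = x + y; b = x * 2; c = a + b; d = c - 1
program = [("a", ["x", "y"]), ("b", ["x"]), ("c", ["a", "b"]), ("d", ["c"])]
print(parallel_levels(program))  # [0, 0, 1, 2]
```

Only the first two operations can run together; the last two form a chain, so four operations still need three steps.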

15 Quiz-like Question (Q3.6)
Describe the concept of pipelining (abstractly, not just in hardware) and explain why it is used. (Those of you in 209 may wish to think about how A3 connects to this question.)

16 Pipelined Microarchitectures

17 Review: Executing a Program
First, load the program into memory.
Set the program counter (PC) to the first instruction in memory and set the stack pointer (SP) to the first empty space on the stack.
Let instruction fetch/decode do the work! The processor can control which instruction is executed next.
When the process needs support from the operating system (OS), it will trap ("throw an exception"). Traps include print and halt.
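
The steps above can be sketched as a tiny fetch/decode/execute loop. The instruction set here is a made-up toy (not MIPS), the register names are hypothetical, and the stack pointer is left out to keep the sketch short; "print" and "halt" stand in for traps to the OS.

```python
def run(program):
    """Fetch/decode/execute loop for a hypothetical toy machine (not MIPS)."""
    regs = {"r0": 0, "r1": 0}
    output = []  # stands in for the OS's print service
    pc = 0       # program counter: index of the next instruction
    while True:
        instr = program[pc]  # fetch
        op, *args = instr    # decode
        pc += 1              # default: fall through to the next instruction
        if op == "addi":
            rd, rs, imm = args
            regs[rd] = regs[rs] + imm
        elif op == "bne":    # the running program redirects the PC
            rs, rt, target = args
            if regs[rs] != regs[rt]:
                pc = target
        elif op == "print":  # trap: ask the "OS" to print a register
            output.append(regs[args[0]])
        elif op == "halt":   # trap: ask the "OS" to stop the program
            return output
        else:
            raise ValueError(f"unknown instruction: {op}")


program = [
    ("addi", "r0", "r0", 3),  # 0: r0 = 3 (loop bound)
    ("addi", "r1", "r1", 1),  # 1: r1 += 1
    ("print", "r1"),          # 2: trap to print r1
    ("bne", "r1", "r0", 1),   # 3: loop back to 1 until r1 == r0
    ("halt",),                # 4: trap to stop
]
print(run(program))  # [1, 2, 3]
```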

18 Single-Cycle MIPS Processor (figure; credit: DDCA)

20 Single-Cycle MIPS (figure; credit: DDCA)

21 Multi-Cycle MIPS (figure; credit: DDCA)

22 Execution Stages
Fetch: Updating the PC and locating the instruction to execute.
Decode: Translating the instruction and reading inputs from the register file.
Execute / Address Computation: Using the ALU to compute an operation or calculate an address.
(Memory Read or Write): Memory operations must access memory. Non-memory operations skip this.
Register Writeback: The result is written to the register file.
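
As a rough illustration of which instructions use which stages, the table below follows the stage list above. The mapping is an assumption for a textbook-style multicycle MIPS design and may differ from the exact state machine used in lecture; `STAGES_USED` and `multicycle_cycles` are hypothetical names.

```python
# Stages each instruction class passes through (assumed, textbook-style design).
STAGES_USED = {
    "lw":     ["Fetch", "Decode", "Execute", "Memory", "Writeback"],
    "sw":     ["Fetch", "Decode", "Execute", "Memory"],
    "r-type": ["Fetch", "Decode", "Execute", "Writeback"],
    "addi":   ["Fetch", "Decode", "Execute", "Writeback"],
    "beq":    ["Fetch", "Decode", "Execute"],
}


def multicycle_cycles(instructions):
    """Total cycles on a multicycle machine: one cycle per stage actually used."""
    return sum(len(STAGES_USED[op]) for op in instructions)


stream = ["lw", "addi", "addi", "beq", "sw"]
print(multicycle_cycles(stream))  # 5 + 4 + 4 + 3 + 4 = 20
```

Only loads use all five stages, which is why a multicycle design can finish shorter instructions in fewer cycles.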

23 Multicycle Control (figure; credit: DDCA)
The performance of the controller is key. The correct control lines must be emitted at every cycle to complete the instruction.

24 Multicycle vs. Pipelined Designs
The multicycle machine saved some hardware: hardware was reused in multiple cycles.
It also may have gained some time: each cycle is shorter, and some instructions don't need all of the hardware.
The pipelined design focuses on speed. Hardware is replicated when necessary to increase performance.
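
A back-of-the-envelope comparison, assuming both designs run at the same clock period and the pipeline never stalls. The 250 ps period and the instruction mix are made-up values, and the model ignores overhead such as pipeline registers.

```python
PERIOD_PS = 250  # hypothetical clock period, set by the slowest stage
NUM_STAGES = 5


def multicycle_time(cycles_per_instruction):
    """Instructions run one after another, each taking only the cycles it needs."""
    return sum(cycles_per_instruction) * PERIOD_PS


def pipelined_time(num_instructions):
    """Ideal pipeline: fill it once, then finish one instruction per cycle."""
    return (num_instructions + NUM_STAGES - 1) * PERIOD_PS


# 1000 instructions with a hypothetical mix of 3- to 5-cycle instructions.
mix = [5, 4, 4, 3] * 250
print(multicycle_time(mix))      # 1000000 ps
print(pipelined_time(len(mix)))  # 251000 ps
```

Under these assumptions the pipeline is roughly four times faster on the whole stream, even though any single instruction still takes five cycles to pass through it.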

25 Why Pipeline? (figure; credit: DDCA)

26 Pipelined Processor (figure; credit: DDCA)

28 Hazards
What happens if an instruction needs a value that has not been computed yet? This is a data hazard.
What if an instruction is changing the PC? Shouldn't it complete before we fetch another instruction? This is a control hazard.
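
A simplified sketch of how the two kinds of hazard could be spotted in an instruction list. The instruction encoding, the `window` of two instructions, and the sample program are all assumptions made for illustration; the sketch also ignores the case where a register is overwritten between producer and consumer.

```python
BRANCHES = {"beq", "bne", "j"}


def find_hazards(instrs, window=2):
    """List potential hazards as (kind, producer_index, consumer_index, register).

    instrs is a list of (op, dest, sources) in program order. A data hazard is
    reported when an instruction reads a register written by one of the
    `window` instructions just before it. Every branch or jump is reported
    as a control hazard.
    """
    hazards = []
    for j, (op, dest, srcs) in enumerate(instrs):
        for i in range(max(0, j - window), j):
            producer_dest = instrs[i][1]
            if producer_dest is not None and producer_dest in srcs:
                hazards.append(("data", i, j, producer_dest))
        if op in BRANCHES:
            hazards.append(("control", j, j, None))
    return hazards


# Hypothetical sequence (not the one from the quiz slides).
program = [
    ("lw",  "$t0", ["$s0"]),
    ("add", "$t1", ["$t0", "$s1"]),
    ("sub", "$t2", ["$t1", "$t0"]),
    ("beq", None,  ["$t2", "$zero"]),
]
for hazard in find_hazards(program):
    print(hazard)
```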

29 Quiz-like Question
Name two techniques for resolving data and control hazards. What is the cost of implementing them?

30 Mitigating Hazards
Data forwarding: Values are available before they are written back. For an ALU operation, the result is available after the execute stage, and it can be forwarded to the stage that needs it.
Stalls: Sometimes, you just have to wait. A stall (or no-op) keeps a pipeline stage from doing anything.
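
The effect of forwarding on stalls can be estimated with a small timing model. This is a sketch under stated assumptions, not the lecture's official method: a classic five-stage pipeline, a register file that can be written and then read in the same cycle, load data available only after the memory stage, and no branches. The function name and the sample program are hypothetical.

```python
def total_cycles(instrs, forwarding):
    """Cycles to run instrs (list of (op, dest, sources)) on a 5-stage pipeline.

    Cycle 1 is the first fetch, so the first instruction executes in cycle 3.
    Without forwarding, a consumer executes at least 3 cycles after its
    producer. With forwarding, the gap is 1 cycle after an ALU operation
    and 2 cycles after a load.
    """
    exec_cycle = []  # cycle in which each instruction is in the execute stage
    for j, (op, dest, srcs) in enumerate(instrs):
        earliest = 3 if j == 0 else exec_cycle[-1] + 1
        for i in range(j):
            p_op, p_dest, _ = instrs[i]
            if p_dest is None or p_dest not in srcs:
                continue
            if forwarding:
                gap = 2 if p_op == "lw" else 1
            else:
                gap = 3
            earliest = max(earliest, exec_cycle[i] + gap)
        exec_cycle.append(earliest)
    return exec_cycle[-1] + 2  # memory and writeback follow execute


# Hypothetical sequence (not the one from the quiz slides).
program = [
    ("lw",  "$t0", ["$s0"]),
    ("add", "$t1", ["$t0", "$s1"]),
    ("sub", "$t2", ["$t1", "$s2"]),
]
print(total_cycles(program, forwarding=False))  # 11
print(total_cycles(program, forwarding=True))   # 8
```

With no hazards these three instructions would take 3 + 5 - 1 = 7 cycles, so in this model forwarding cuts the stall cycles from four to one; the remaining stall is the load-use case.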

31 Stalls and Performance
Stalls throttle performance. In essence, while it is stalled, the pipelined processor behaves like a non-pipelined one: the instructions behind the stall make no progress.
Sometimes, we can predict a result. If we're correct, then we get a performance win. If we're wrong, we drop the instructions that used the predicted value, and we're no worse off than if we had stalled.
Prediction is big business. It consumes a huge amount of chip area.
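
The "win or no worse off" argument can be checked with simple counting. The sketch assumes a wrong prediction costs exactly as many cycles as a stall would have; the branch counts and the two-cycle penalty are made-up numbers, and `branch_penalty_cycles` is a hypothetical helper.

```python
def branch_penalty_cycles(num_branches, num_predicted_correctly, penalty):
    """Cycles lost to branches when only wrong predictions pay the penalty."""
    wrong = num_branches - num_predicted_correctly
    return wrong * penalty


# Always stalling is the same as never predicting correctly.
always_stall = branch_penalty_cycles(1000, 0, penalty=2)
# A hypothetical predictor that is right on 900 of 1000 branches.
with_prediction = branch_penalty_cycles(1000, 900, penalty=2)

print(always_stall)     # 2000
print(with_prediction)  # 200
```

Even a predictor that is always wrong loses the same 2000 cycles as stalling in this model, which is the sense in which a wrong guess leaves us no worse off.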

32 Quiz-like Question
Where are the hazards in the following code? Identify and label them.
START: lw   $t0, 0($t1)
       addi $t0, $t0, 1
       addi $t1, $t1, 4
       beq  $s0, $t0, EXIT
       j    START
EXIT:  sw   $t0, -4($t1)

33 Exam-like Question
How many cycles are required to execute the first five instructions of the following stream (a) without forwarding and (b) with forwarding?
START: lw   $t0, 0($t1)
       addi $t0, $t0, 1
       addi $t1, $t1, 4
       beq  $s0, $t0, EXIT
       j    START
EXIT:  sw   $t0, -4($t1)

34 Summary
The microarchitecture is the hardware design of a datapath and control path that execute a defined set of instructions.
That set of instructions is the instruction set architecture, which we've studied for the past two weeks.

35 Summary
The key attribute of a microarchitecture is correctness. And to understand how these structures work, we've had to rely on all of our understanding of digital design.
However, performance is a key consideration. The multicycle design tried to save hardware resources. The pipelined design traded space for time: it added hardware to increase throughput.
