RECAP. B649 Parallel Architectures and Programming

Ethelbert Bridges
5 years ago
1 RECAP B649 Parallel Architectures and Programming

2 RECAP

3 Recap
- ILP
- Exploiting ILP
- Dynamic scheduling
- Thread-level Parallelism
- Memory Hierarchy
- Other topics through student presentations
- Virtual Machines

4 ILP: Pipelining

5 Pipelining: Adding Latches

6 Pipelining: Adding Forwarding

7 Pipelining: Adding Branch Delay Slots

8 Extending the Basic Pipeline

11 Exploiting ILP Through Compiler Techniques
- Loop unrolling
- Making use of branch delay slots
- Static branch prediction
- Loop fusion
- Unroll and jam
- ...
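
As an illustration of the first technique, here is a minimal sketch of loop unrolling by a factor of 4. The function names (saxpy_rolled, saxpy_unrolled4) and the choice of kernel are assumptions made for this example, not taken from the slides. A compiler performs this transformation on machine-level loops; Python is used only to show the shape of the transformed code, including the cleanup loop for leftover iterations.

```python
def saxpy_rolled(a, x, y):
    """Reference loop: one element, one loop test and one branch per iteration."""
    for i in range(len(x)):
        y[i] = a * x[i] + y[i]
    return y


def saxpy_unrolled4(a, x, y):
    """Same computation unrolled by 4: one loop test and branch per four elements."""
    n = len(x)
    limit = n - n % 4  # largest multiple of 4 that is <= n
    for i in range(0, limit, 4):
        # Four independent statements that a scheduler is free to interleave.
        y[i] = a * x[i] + y[i]
        y[i + 1] = a * x[i + 1] + y[i + 1]
        y[i + 2] = a * x[i + 2] + y[i + 2]
        y[i + 3] = a * x[i + 3] + y[i + 3]
    # Cleanup loop for the n % 4 leftover elements.
    for i in range(limit, n):
        y[i] = a * x[i] + y[i]
    return y
```

The unrolled body reduces loop overhead and exposes four independent operations per iteration, at the cost of larger code and more live registers.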

12 Dynamic Branch Prediction
[Figure: selected bits of the branch address index the Branch Prediction Buffer (BPB)]

13 Two-bit Branch Predictor
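
The two-bit scheme can be sketched as a table of saturating counters. This is a minimal model under stated assumptions: the class name TwoBitPredictor, the helper count_mispredictions, the 4-bit index, the word-aligned PC (low two bits dropped), and the initial state (strongly not taken) are all choices made for this example, not details from the slide.

```python
class TwoBitPredictor:
    """Branch prediction buffer of 2-bit saturating counters (0..3)."""

    def __init__(self, index_bits=4):
        self.mask = (1 << index_bits) - 1
        self.table = [0] * (1 << index_bits)  # assumption: start strongly not-taken

    def _index(self, pc):
        # Assumption: instructions are 4-byte aligned, so drop the low two bits.
        return (pc >> 2) & self.mask

    def predict(self, pc):
        # Counter values 2 and 3 predict taken; 0 and 1 predict not taken.
        return self.table[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


def count_mispredictions(predictor, pc, outcomes):
    """Run one branch through the predictor and count wrong predictions."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses
```

For a loop branch that is taken nine times and then falls through, the counter mispredicts the loop exit but stays on the "taken" side, so re-entering the loop does not cost a second misprediction as it would with a one-bit scheme.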

14 General n-bit Correlating Branch Predictors
[Figure: branch-address bits together with a global shift register (m bits) index a Branch Prediction Buffer (BPB) of n-bit entries]
- Use Branch Target Buffers (BTBs) for caching branch targets
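
A correlating predictor with m bits of global history and n-bit counters can be sketched as follows. The class name CorrelatingPredictor, the helper run_branch, the concatenation of address bits with history bits to form the index, and the all-zero initial state are assumptions for illustration; real designs differ in how the two bit fields are combined.

```python
class CorrelatingPredictor:
    """(m, n) predictor: m global history bits select among n-bit counters."""

    def __init__(self, m=2, n=2, index_bits=4):
        self.m = m
        self.addr_mask = (1 << index_bits) - 1
        self.hist_mask = (1 << m) - 1
        self.history = 0                      # global shift register, m bits
        self.max_count = (1 << n) - 1         # saturation limit of an n-bit counter
        self.table = [0] * (1 << (index_bits + m))

    def _index(self, pc):
        # Assumption: concatenate address bits (word-aligned PC) with the history.
        return (((pc >> 2) & self.addr_mask) << self.m) | self.history

    def predict(self, pc):
        # Predict taken when the counter is in its upper half.
        return self.table[self._index(pc)] > self.max_count // 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.table[i] = min(self.max_count, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)
        # Shift the outcome into the global history register.
        self.history = ((self.history << 1) | int(taken)) & self.hist_mask


def run_branch(predictor, pc, outcomes):
    """Count mispredictions for one branch over a sequence of outcomes."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses
```

With m=0 the sketch degenerates to a plain n-bit predictor. On a strictly alternating branch, a plain two-bit counter started at zero never leaves the not-taken half and mispredicts every taken outcome, while the (2, 2) configuration learns the pattern after a short warm-up.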

15 Dynamic Scheduling: Tomasulo's Approach

16 Tomasulo's Approach: Observations
- RAW hazards handled by waiting for operands
- WAR and WAW hazards handled by register renaming
  - only WAR and WAW hazards between instructions currently in the pipeline are handled; is this a problem?
  - larger number of hidden names reduces name dependences
- CDB (common data bus) implements forwarding
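
The effect of renaming on WAR and WAW hazards can be shown with a small sketch. This is a simplified model, not Tomasulo's hardware: the function name rename and the tags T0, T1, ... are hypothetical stand-ins for reservation-station names, and each instruction is reduced to a destination plus a tuple of source registers.

```python
def rename(instructions):
    """Give every destination a fresh tag and redirect later reads to it.

    instructions: list of (dest, (src, ...)) using architectural register names.
    Returns the same list expressed with tags for all renamed values.
    """
    mapping = {}   # architectural register -> tag of its most recent producer
    renamed = []
    for next_tag, (dest, srcs) in enumerate(instructions):
        # Sources read the latest producer's tag, or the register file value
        # if no in-flight instruction writes that register.
        new_srcs = tuple(mapping.get(s, s) for s in srcs)
        tag = f"T{next_tag}"
        mapping[dest] = tag
        renamed.append((tag, new_srcs))
    return renamed


program = [
    ("F0", ("F2", "F4")),    # DIV  F0 <- F2, F4
    ("F6", ("F0", "F8")),    # ADD  F6 <- F0, F8
    ("F8", ("F10", "F14")),  # SUB  F8 <- F10, F14   (WAR on F8 with ADD)
    ("F6", ("F10", "F8")),   # MUL  F6 <- F10, F8    (WAW on F6 with ADD)
]
renamed_program = rename(program)
```

After renaming, every destination is unique, so only the true (RAW) dependences remain: ADD waits for T0 and MUL waits for T2.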

17 Tomasulo's Approach + Speculation
Fields in ROB
1. Instruction type
2. Destination
3. Value
4. Ready

18 Observations on Speculation
- Speculation enables precise exception handling
  - defer exception handling until instruction ready to commit
- Branches are critical to performance
  - prediction accuracy
  - latency of misprediction detection
  - misprediction recovery time
- Must avoid hazards through memory
  - WAR and WAW already taken care of (how?)
  - for RAW
    - don't allow load to proceed if an active ROB entry has Destination field matching with A field of load
    - maintain program order for effective address computation (why?)
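
The observations above can be sketched with a toy reorder buffer (ROB) that commits in program order. This is a minimal model under stated assumptions: the class and method names are hypothetical, the exception field is added for the example and is not one of the four ROB fields, and the load check simply scans every active entry, which is only correct if it is called when the load is the youngest instruction.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class ROBEntry:
    instr_type: str                   # e.g. "alu", "load", "store", "branch"
    dest: object                      # register name, or memory address for a store
    value: object = None
    ready: bool = False
    exception: Optional[str] = None   # assumption: extra field for this sketch


class ReorderBuffer:
    def __init__(self):
        self.entries = deque()        # program order, oldest at the left
        self.registers = {}
        self.memory = {}
        self.pending_exception = None

    def issue(self, instr_type, dest):
        entry = ROBEntry(instr_type, dest)
        self.entries.append(entry)
        return entry

    def complete(self, entry, value=None, exception=None):
        # Execution may finish out of order; only the ROB entry is updated.
        entry.value = value
        entry.exception = exception
        entry.ready = True

    def load_may_proceed(self, load_address):
        # RAW through memory: stall while an active store targets this address.
        return not any(
            e.instr_type == "store" and e.dest == load_address
            for e in self.entries
        )

    def commit(self):
        """Retire ready instructions from the head, in program order."""
        committed = []
        while self.entries and self.entries[0].ready:
            head = self.entries.popleft()
            if head.exception is not None:
                # Precise exception: nothing younger has changed state yet.
                self.pending_exception = head.exception
                self.entries.clear()
                break
            if head.instr_type == "store":
                self.memory[head.dest] = head.value
            else:
                self.registers[head.dest] = head.value
            committed.append(head.dest)
        return committed
```

Because architectural state is written only at commit, a faulting instruction that reaches the head of the ROB finds all older instructions committed and no younger instruction visible, which is what makes the exception precise.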

19 Multiple Issue Processor Types

20 Dyn. Scheduling + Multiple Issue + Speculation
- Design parameters
  - two-way issue (two instruction issues per cycle)
  - pipelined and separate integer and FP functional units
  - dynamic scheduling, but not out-of-order issue
  - speculative execution
- Task per issue: assign reservation station and update pipeline control tables (i.e., control signals)
- Two possible techniques
  - do the task in half a clock cycle
  - build wider logic to issue any pair of instructions together
- Modern processors use both (4 or more way superscalar)

21 Shared-Memory Multiprocessors

22 Distributed-Memory Multiprocessors

23 Other Ways to Categorize Parallel Programming
Parallel programs:
- threads vs producer-consumer
- client-server vs p-to-p
- master-slave vs symm.
- coarse vs fine grained
- SPMD vs MPMD
- tightly vs loosely coupled
- recursive vs iterative
- shared mem vs msg passing
- data vs task parallel
- synch. vs asynch.

24 Write Invalidate Cache Coherence Protocol for Write-Back Caches
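
The slide's state diagram is not reproduced in this transcript. As a stand-in, here is a minimal sketch of one common three-state (Modified, Shared, Invalid) write-invalidate formulation for a single memory block. The class name CoherentBlock, the write_backs counter, and the choice of MSI itself are assumptions for this example and may differ from the protocol drawn on the slide.

```python
INVALID, SHARED, MODIFIED = "I", "S", "M"


class CoherentBlock:
    """State of one memory block in each of several snooping write-back caches."""

    def __init__(self, num_caches):
        self.state = [INVALID] * num_caches
        self.write_backs = 0   # how many times a dirty copy was written to memory

    def read(self, cpu):
        if self.state[cpu] != INVALID:
            return  # read hit in Shared or Modified: no bus traffic
        # Read miss is placed on the bus; a Modified owner supplies the data,
        # writes it back, and downgrades to Shared.
        for other, s in enumerate(self.state):
            if s == MODIFIED:
                self.write_backs += 1
                self.state[other] = SHARED
        self.state[cpu] = SHARED

    def write(self, cpu):
        if self.state[cpu] == MODIFIED:
            return  # write hit on an exclusive dirty copy: no bus traffic
        # Write miss or upgrade is placed on the bus; every other copy is invalidated.
        for other, s in enumerate(self.state):
            if other == cpu:
                continue
            if s == MODIFIED:
                self.write_backs += 1
            self.state[other] = INVALID
        self.state[cpu] = MODIFIED
```

The invariant the sketch maintains is that at most one cache holds the block in Modified, and a Modified copy never coexists with Shared copies.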

25 Distributed Memory + Directories

26 Directory-Based Cache Coherence
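
The following sketch illustrates the basic idea of a directory: the home node records which nodes hold a block and sends point-to-point messages only to them, instead of broadcasting on a bus. The class name DirectoryEntry, the three state names, and the message names are assumptions chosen for this example; the slide itself gives no protocol details in this transcript.

```python
class DirectoryEntry:
    """Directory state for one memory block at its home node."""

    def __init__(self):
        self.state = "uncached"   # "uncached", "shared", or "exclusive"
        self.sharers = set()      # nodes currently holding a copy

    def read_miss(self, node):
        """Return the messages the home node sends to service a read miss."""
        messages = []
        if self.state == "exclusive":
            # Fetch the dirty block from its owner, which keeps a shared copy.
            owner = next(iter(self.sharers))
            messages.append(("fetch", owner))
        self.sharers.add(node)
        self.state = "shared"
        messages.append(("data_reply", node))
        return messages

    def write_miss(self, node):
        """Return the messages the home node sends to service a write miss."""
        messages = []
        if self.state == "exclusive":
            owner = next(iter(self.sharers))
            messages.append(("fetch_invalidate", owner))
        elif self.state == "shared":
            # Invalidate only the recorded sharers, not every node.
            for sharer in sorted(self.sharers - {node}):
                messages.append(("invalidate", sharer))
        self.sharers = {node}
        self.state = "exclusive"
        messages.append(("data_reply", node))
        return messages
```

The number of invalidations scales with the number of actual sharers rather than with the number of nodes in the machine.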

27 Other Topics
- x86 assembly programming
- VLIW / EPIC
- Vector processors
- Embedded systems
- Scientific applications
- GPUs and GPGPUs
- CUDA and OpenCL
- Interconnection networks
- Virtualization

28 WHAT'S NEXT?

29 Future
- Continued importance of parallel programming
  - challenge: how to program multiprocessors
  - role of programming languages and compilers
- Convergence or specialization?
  - standardization of general purpose architecture
  - migration of special-purpose CPUs for general use

30 Landscape of Parallel Computing Research: A View from Berkeley
