Final Review, Fall 2007. Prof. Thomas Wenisch, EECS 470

Announcements
- Exam is Monday, 12/ in this room
- I recommend bringing a scientific calculator
- Closed book/notes

Stuff from the First Half
- Parallelism, locality, amortization, memoization
- Amdahl's Law
- Iron Law
- Calculating speedup
- Instruction-level parallelism
- Performance impact of in-order vs. OoO
- General HW structure of predictors (not branch predictors specifically)
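The Amdahl's Law and Iron Law items reduce to two short formulas. The Python sketch below encodes them; the function names are hypothetical and the numbers in the usage are illustrative, not taken from the course. Speedup is always computed as a ratio of execution times (old / new), never of clock rates or CPIs alone.

```python
def amdahl_speedup(fraction_enhanced: float, factor: float) -> float:
    """Amdahl's Law: overall speedup when `fraction_enhanced` of the original
    execution time is made `factor` times faster."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / factor)


def iron_law_time(instructions: float, cpi: float, clock_hz: float) -> float:
    """Iron Law: time = instructions x cycles/instruction x seconds/cycle."""
    return instructions * cpi / clock_hz


def speedup(old_time: float, new_time: float) -> float:
    """Speedup is a ratio of execution times: old / new."""
    return old_time / new_time


if __name__ == "__main__":
    # Making 80% of the runtime 4x faster: 1 / (0.2 + 0.8 / 4)
    print(round(amdahl_speedup(0.8, 4.0), 3))
    # Same instruction count: CPI drops 2.0 -> 1.5, but the clock drops 2.0 -> 1.8 GHz
    t_old = iron_law_time(1e9, 2.0, 2.0e9)
    t_new = iron_law_time(1e9, 1.5, 1.8e9)
    print(round(speedup(t_old, t_new), 3))
```

The second example shows why the Iron Law matters: a better CPI can be partly cancelled by a slower clock, so only the product tells you the real speedup.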

Memory Speculation
- What semantics does the LSQ have to guarantee?
- How does non-speculative store-to-load forwarding work? What hardware does it require?
- What are the implications of speculative loads?
- What is the purpose of dependence prediction? How does it work?
- What is the purpose of a store buffer?
- How can you break the dataflow ILP limit?
- Can you draw a simple HW diagram for value prediction?
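A minimal sketch of non-speculative store-to-load forwarding, assuming each load searches a store queue, all accesses are whole words (no partial overlaps), and program order is given by sequence numbers. `StoreEntry` and `resolve_load` are hypothetical names; hardware typically performs this search associatively (a CAM) in parallel rather than with a loop.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class StoreEntry:
    seq: int             # program-order sequence number (smaller = older)
    addr: Optional[int]  # None until the store's address has been computed
    data: Optional[int]  # None until the store's data operand is ready


def resolve_load(load_seq: int, load_addr: int,
                 store_queue: List[StoreEntry],
                 memory: Dict[int, int]) -> Tuple[str, Optional[int]]:
    """Return ("wait", None), ("forward", value) or ("cache", value)."""
    older = [s for s in store_queue if s.seq < load_seq]
    # Non-speculative policy: an older store with an unknown address might
    # alias this load, so the load is not allowed to issue yet.
    if any(s.addr is None for s in older):
        return ("wait", None)
    # The youngest older store to the same address holds the correct value.
    for s in sorted(older, key=lambda e: e.seq, reverse=True):
        if s.addr == load_addr:
            if s.data is None:
                return ("wait", None)  # address matches, data not ready yet
            return ("forward", s.data)
    # No older store matches: read from the cache / memory.
    return ("cache", memory.get(load_addr, 0))


if __name__ == "__main__":
    sq = [StoreEntry(1, 0x100, 5), StoreEntry(3, 0x100, 7), StoreEntry(6, 0x200, 9)]
    # Store 6 is younger than load 4 and is ignored; store 3 is the youngest older match.
    print(resolve_load(4, 0x100, sq, {}))  # ('forward', 7)
```

A speculative design would instead let the load issue past unknown-address stores and check for a violation once those addresses resolve; avoiding the resulting squashes is what dependence prediction is for.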

Basic Caches
- Formula for effective access time for a 1-level cache? What about a 2-level cache?
- Associativity, block size, cache size
- Local vs. global hit/miss ratios
- Causes of cache misses (classification)
- Writeback vs. write-through, allocate vs. no-allocate
- Temporal vs. spatial locality
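The effective access time formulas can be written out as code. This sketch assumes the usual form, effective time = hit time + miss rate × miss penalty, where for a two-level hierarchy the L1 miss penalty is the L2's own effective access time computed with the L2's *local* miss rate. Function names and latencies are illustrative assumptions.

```python
def amat_1level(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Effective access time of one level: hit time + miss rate x miss penalty."""
    return hit_time + miss_rate * miss_penalty


def amat_2level(l1_hit: float, l1_miss_rate: float,
                l2_hit: float, l2_local_miss_rate: float,
                mem_latency: float) -> float:
    """Two levels: the L1 miss penalty is the L2's effective access time.
    l2_local_miss_rate = L2 misses / L2 accesses (not / all CPU accesses)."""
    l1_miss_penalty = amat_1level(l2_hit, l2_local_miss_rate, mem_latency)
    return amat_1level(l1_hit, l1_miss_rate, l1_miss_penalty)


def global_miss_rate(*local_miss_rates: float) -> float:
    """Global miss rate of the last level = product of the local miss rates."""
    product = 1.0
    for m in local_miss_rates:
        product *= m
    return product


if __name__ == "__main__":
    # Illustrative numbers (cycles): L1 hit 1, 5% miss; L2 hit 10, 20% local miss;
    # memory 100.  1 + 0.05 * (10 + 0.2 * 100) = 2.5
    print(round(amat_2level(1, 0.05, 10, 0.20, 100), 3))
    # Only 0.05 * 0.20 = 1% of all CPU accesses go to memory
    print(round(global_miss_rate(0.05, 0.20), 4))
```

Note how a 20% local L2 miss rate looks bad in isolation but corresponds to a 1% global miss rate, which is the local vs. global distinction on this slide.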

Improving Cache Performance: Summary
- Miss rate
  - large block size
  - higher associativity
  - victim caches
  - hardware/software prefetching
  - compiler optimizations
- Miss penalty
  - give priority to read misses over writes/writebacks
  - subblock placement
  - early restart and critical word first
  - non-blocking caches
  - multi-level caches
- Hit time (difficult?)
  - small and simple caches
  - avoiding translation during L1 indexing
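To illustrate how a victim cache removes conflict misses, here is a small simulation of a direct-mapped cache backed by a tiny fully associative victim buffer. The class name, the swap-on-victim-hit policy and LRU replacement inside the buffer are assumptions for this sketch. Two blocks that map to the same set ping-pong between the cache and the buffer instead of going to the next level every time.

```python
from collections import OrderedDict
from typing import List, Optional


class DirectMappedWithVictim:
    """Hypothetical direct-mapped cache plus a small fully associative victim cache."""

    def __init__(self, num_sets: int, block_bytes: int, victim_entries: int):
        self.num_sets = num_sets
        self.block_bytes = block_bytes
        self.lines: List[Optional[int]] = [None] * num_sets  # block number per set
        self.victim: "OrderedDict[int, bool]" = OrderedDict()  # LRU order, oldest first
        self.victim_entries = victim_entries

    def access(self, addr: int) -> str:
        block = addr // self.block_bytes
        s = block % self.num_sets
        if self.lines[s] == block:
            return "hit"
        evicted = self.lines[s]
        self.lines[s] = block                 # requested block moves into the main cache
        if block in self.victim:
            del self.victim[block]            # swap: found in the victim cache
            result = "victim_hit"
        else:
            result = "miss"                   # must fetch from the next level
        if evicted is not None and self.victim_entries > 0:
            self.victim[evicted] = True       # displaced block goes to the victim cache
            if len(self.victim) > self.victim_entries:
                self.victim.popitem(last=False)  # drop the LRU victim
        return result


if __name__ == "__main__":
    # Addresses 0 and 256 map to the same set (4 sets x 64-byte blocks)
    pattern = [0, 256, 0, 256]
    print([DirectMappedWithVictim(4, 64, 1).access(a) for a in pattern][:0] or None)
    c = DirectMappedWithVictim(4, 64, 1)
    print([c.access(a) for a in pattern])  # ['miss', 'miss', 'victim_hit', 'victim_hit']
```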

More Cache Issues
- What is inclusion? How do you implement it?
- How do you implement a non-blocking cache?
- Bandwidth enhancements:
  - Multi-porting
  - Multiple cache copies
  - Virtual multiporting
  - Multi-banking
  - Line buffer
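One common way to build a non-blocking cache is with Miss Status Holding Registers (MSHRs). The sketch below is a hypothetical, simplified model: hits never touch the MSHRs, so the cache keeps serving them while misses are outstanding; a miss to a block already in flight merges into the existing entry; and a miss with no free entry stalls.

```python
from typing import Dict, List


class MSHRFile:
    """Hypothetical MSHR file: one entry per outstanding block miss,
    each holding the IDs of the requests waiting on that block."""

    def __init__(self, num_entries: int, block_bytes: int):
        self.num_entries = num_entries
        self.block_bytes = block_bytes
        self.pending: Dict[int, List[str]] = {}  # block number -> waiting request IDs

    def on_miss(self, addr: int, request_id: str) -> str:
        block = addr // self.block_bytes
        if block in self.pending:
            # Secondary miss: block already requested, merge without a new memory request
            self.pending[block].append(request_id)
            return "merged"
        if len(self.pending) >= self.num_entries:
            # All MSHRs busy: structural hazard, this miss must stall
            return "stall"
        # Primary miss: allocate an MSHR and send one request to the next level
        self.pending[block] = [request_id]
        return "allocated"

    def on_fill(self, block: int) -> List[str]:
        """Data for `block` has returned: free its MSHR and wake every waiting request."""
        return self.pending.pop(block, [])


if __name__ == "__main__":
    m = MSHRFile(num_entries=2, block_bytes=64)
    print(m.on_miss(0x1000, "A"), m.on_miss(0x1008, "B"),
          m.on_miss(0x2000, "C"), m.on_miss(0x3000, "D"))
    # allocated merged allocated stall
    print(m.on_fill(0x1000 // 64))  # ['A', 'B']
```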

Prefetching
- Software vs. hardware prefetching
- Instruction prefetching
- Stride-based prefetching
- Stream buffers
- Runahead prefetching
- Correlation-based prefetching
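A stride-based prefetcher is commonly described as a table indexed by the load's PC that remembers the last address and stride. The sketch below is one simple variant (an assumption, not a specific design from the course): it prefetches only after the same nonzero stride has been seen twice in a row, and then issues `degree` prefetches ahead of the current access.

```python
from typing import Dict, List, Tuple


class StridePrefetcher:
    """Hypothetical PC-indexed stride prefetcher."""

    def __init__(self, degree: int = 1):
        self.degree = degree
        self.table: Dict[int, Tuple[int, int]] = {}  # pc -> (last_addr, last_stride)

    def access(self, pc: int, addr: int) -> List[int]:
        """Record a load and return the addresses to prefetch (possibly none)."""
        entry = self.table.get(pc)
        if entry is None:
            self.table[pc] = (addr, 0)       # first sighting: no stride known yet
            return []
        last_addr, last_stride = entry
        stride = addr - last_addr
        self.table[pc] = (addr, stride)
        if stride != 0 and stride == last_stride:
            # Stride confirmed twice in a row: prefetch the next `degree` blocks
            return [addr + stride * k for k in range(1, self.degree + 1)]
        return []


if __name__ == "__main__":
    p = StridePrefetcher(degree=2)
    for a in (100, 108, 116, 124):
        print(a, p.access(0x40, a))
    # 100 [] / 108 [] / 116 [124, 132] / 124 [132, 140]
```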

Virtual Memory
- Why do we have it? (protection, paging)
- Base/bound, segmented VM, paged VM
- How is VM management different from cache management?
- Page table entries
- Page table designs (top-down, bottom-up, inverted)
- TLB designs
- VIVT, VIPT, PIPT caches
- Dealing with synonyms
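A VIPT cache sidesteps the synonym problem when all of its index and block-offset bits fall inside the page offset, i.e. when cache size / associativity ≤ page size. This hypothetical helper, assuming power-of-two sizes, counts how many index bits spill into the virtual page number; a nonzero result means synonyms must be handled some other way (e.g. higher associativity or page coloring).

```python
def vipt_synonym_bits(cache_bytes: int, ways: int, block_bytes: int, page_bytes: int) -> int:
    """Number of cache-index bits taken from the virtual page number.

    0 means the index comes entirely from page-offset bits, which are the
    same in the virtual and physical address, so no synonym problem arises.
    Assumes every size is a power of two.
    """
    sets = cache_bytes // (ways * block_bytes)
    index_bits = sets.bit_length() - 1          # log2(sets)
    offset_bits = block_bytes.bit_length() - 1  # log2(block size)
    page_offset_bits = page_bytes.bit_length() - 1
    return max(0, index_bits + offset_bits - page_offset_bits)


if __name__ == "__main__":
    # 32 KB, 8-way, 64 B blocks, 4 KB pages: 32 KB / 8 = 4 KB per way -> fits
    print(vipt_synonym_bits(32 * 1024, 8, 64, 4096))  # 0
    # 64 KB, 4-way: 16 KB per way -> 2 index bits come from the page number
    print(vipt_synonym_bits(64 * 1024, 4, 64, 4096))  # 2
```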

Main Memory
- SRAM vs. DRAM
- Multiple memory banks
- Bank interleaving
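With low-order interleaving, consecutive blocks go to consecutive banks, so sequential accesses can overlap, while a stride equal to (number of banks × block size) lands in the same bank every time. The sketch assumes bank = (address / block size) mod number of banks; the function names are hypothetical, and real memory controllers may use other mappings (e.g. XOR-based hashing).

```python
from collections import Counter
from typing import Iterable


def bank_of(addr: int, block_bytes: int = 64, num_banks: int = 4) -> int:
    """Low-order interleaving: consecutive blocks map to consecutive banks."""
    return (addr // block_bytes) % num_banks


def bank_histogram(addrs: Iterable[int], block_bytes: int = 64, num_banks: int = 4) -> Counter:
    """How many of the given accesses land in each bank.
    The busiest bank bounds how much the accesses can overlap."""
    return Counter(bank_of(a, block_bytes, num_banks) for a in addrs)


if __name__ == "__main__":
    sequential = range(0, 8 * 64, 64)   # 8 consecutive 64-byte blocks
    strided = range(0, 8 * 256, 256)    # stride = 4 banks x 64 bytes
    print(bank_histogram(sequential))   # 2 accesses in each of banks 0-3
    print(bank_histogram(strided))      # all 8 accesses in bank 0
```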

Software ILP
- List scheduling
- Speculative code motion (especially implications)
- Classic optimizations (know what these are, be able to give an example):
  - Copy propagation, constant folding, strength reduction
  - Common subexpressions, dead code elimination
  - Induction variable elimination, inlining
  - Loop unrolling, loop-invariant code motion
- Profile-driven optimization (esp. implications of using profiles)
- Trace scheduling & compensation code
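A minimal list scheduler, under assumptions chosen for this sketch: a single generic, fully pipelined functional-unit type, at most `width` issues per cycle, and priority given by each operation's latency-weighted height (its longest path to the end of the DAG). The operation names and latencies in the example are hypothetical.

```python
from typing import Dict, List


def list_schedule(latency: Dict[str, int], deps: Dict[str, List[str]],
                  width: int) -> Dict[str, int]:
    """Return the issue cycle of every operation in an acyclic dependence graph.

    latency[op] : cycles until op's result is available (>= 1)
    deps[op]    : operations whose results op consumes
    width       : maximum operations issued per cycle
    """
    succs: Dict[str, List[str]] = {op: [] for op in latency}
    for op, preds in deps.items():
        for p in preds:
            succs[p].append(op)

    # Priority: latency-weighted height = longest path from op to the end of the DAG
    height: Dict[str, int] = {}

    def h(op: str) -> int:
        if op not in height:
            height[op] = latency[op] + max((h(s) for s in succs[op]), default=0)
        return height[op]

    for op in latency:
        h(op)

    schedule: Dict[str, int] = {}
    cycle = 0
    while len(schedule) < len(latency):
        # Ready list: unscheduled ops whose operands are all available this cycle
        ready = [op for op in latency if op not in schedule and all(
            p in schedule and schedule[p] + latency[p] <= cycle
            for p in deps.get(op, []))]
        ready.sort(key=lambda op: -height[op])  # highest priority first; ties keep order
        for op in ready[:width]:
            schedule[op] = cycle
        cycle += 1
    return schedule


if __name__ == "__main__":
    lat = {"ld1": 3, "ld2": 3, "add": 1, "mul": 2, "st": 1}
    deps = {"add": ["ld1", "ld2"], "mul": ["add"], "st": ["mul"]}
    print(list_schedule(lat, deps, width=1))
    # {'ld1': 0, 'ld2': 1, 'add': 4, 'mul': 5, 'st': 7}
```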

Binary Translation / Virtualization
- Why is it useful?
- Where is it used today? (VMware, Java, Transmeta Crusoe)
- How does it work?
- What are some difficult corner cases to handle?
- Static vs. dynamic

Power
- Why do power and energy matter for various markets?
- Dynamic, leakage, short-circuit power
- Power: $P \approx \frac{1}{2} C V^2 A f$
- Performance: $\text{perf} \propto f \propto V$
- Know how to compare voltage/frequency scaling to other techniques
- Power vs. energy
- PDP, EDP, $ED^2P$
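Under the ideal assumption that frequency scales linearly with voltage (ignoring leakage and threshold-voltage effects), the metrics on this slide can be compared numerically. The sketch scales V and f together by the same factor for a task with a fixed cycle count; the function names and all parameter values are illustrative.

```python
from typing import Dict


def dynamic_power(c: float, vdd: float, activity: float, freq_hz: float) -> float:
    """P_dyn ~ 1/2 * C * V^2 * A * f (switching power only; no leakage or short-circuit)."""
    return 0.5 * c * vdd ** 2 * activity * freq_hz


def run_metrics(power_w: float, delay_s: float) -> Dict[str, float]:
    energy = power_w * delay_s  # PDP: power x delay = energy per task
    return {
        "power": power_w,
        "energy": energy,
        "EDP": energy * delay_s,
        "ED2P": energy * delay_s ** 2,
    }


def dvfs_run(scale: float, cycles: float = 2e9, c: float = 1e-9,
             vdd: float = 1.0, activity: float = 0.5, freq_hz: float = 2e9) -> Dict[str, float]:
    """Scale V and f by the same factor (assumes f proportional to V)
    for a task that needs a fixed number of cycles."""
    v, f = vdd * scale, freq_hz * scale
    return run_metrics(dynamic_power(c, v, activity, f), cycles / f)


if __name__ == "__main__":
    base, slow = dvfs_run(1.0), dvfs_run(0.5)
    for key in ("power", "energy", "EDP", "ED2P"):
        print(key, round(slow[key] / base[key], 4))
```

With `scale = 0.5` this ideal model cuts power by 8×, energy by 4× and EDP by 2×, while ED²P is unchanged; that invariance is one reason ED²P is used when comparing other techniques against plain voltage/frequency scaling.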

Data-Level Parallelism
- What are vectors?
- What hardware is needed for vector CPUs?
- How do vectors interact with caches?
- Strided/indexed (scatter/gather) memory accesses
- Masks and chaining
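The vector access patterns and masking can be emulated element by element. In this sketch memory is a Python list indexed by element (not byte) address, and the function names loosely mirror vector instructions but are hypothetical; real vector hardware processes elements in parallel lanes and can chain dependent operations rather than looping.

```python
from typing import List


def vload_strided(mem: List[int], base: int, stride: int, vl: int) -> List[int]:
    """Strided load: element i comes from mem[base + i * stride]."""
    return [mem[base + i * stride] for i in range(vl)]


def vgather(mem: List[int], base: int, index: List[int]) -> List[int]:
    """Indexed load (gather): element i comes from mem[base + index[i]]."""
    return [mem[base + idx] for idx in index]


def vscatter(mem: List[int], base: int, index: List[int], values: List[int]) -> None:
    """Indexed store (scatter): mem[base + index[i]] = values[i]."""
    for idx, v in zip(index, values):
        mem[base + idx] = v


def vcmp_gt(va: List[int], scalar: int) -> List[bool]:
    """Build a mask: lane i is enabled when va[i] > scalar."""
    return [a > scalar for a in va]


def vadd_masked(va: List[int], vb: List[int], mask: List[bool], vdst: List[int]) -> List[int]:
    """Masked add: enabled lanes get va + vb, disabled lanes keep the old vdst value."""
    return [a + b if m else old for a, b, m, old in zip(va, vb, mask, vdst)]


if __name__ == "__main__":
    # Vectorized form of: for i in range(4): if A[i] > 0: C[i] = A[i] + B[i]
    A, B, C = [1, -2, 3, -4], [10, 10, 10, 10], [0, 0, 0, 0]
    mask = vcmp_gt(A, 0)
    print(vadd_masked(A, B, mask, C))  # [11, 0, 13, 0]
```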

Thread-Level Parallelism
- Shared memory architectures
- NUMA vs. UMA
- Bus vs. point-to-point interconnects
- Advantages and limitations of buses
- MESI cache coherence protocol
- Purpose of a directory
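The MESI protocol for a single line in one snooping cache can be written as a transition function. This is a sketch of one common variant, assuming an upgrade transaction for S → M and that the owner flushes dirty data when it snoops a read in M; bus transaction names such as `BusUpgr` are conventional textbook labels, not a specific bus standard, and `mesi_next` is a hypothetical name.

```python
from typing import Optional, Tuple


def mesi_next(state: str, event: str,
              others_have_copy: bool = False) -> Tuple[str, Optional[str]]:
    """Return (next_state, bus_action) for one cache line.

    state : "M", "E", "S" or "I"
    event : "PrRd"/"PrWr" from the local processor, or a snooped
            "BusRd"/"BusRdX"/"BusUpgr" from another cache
    others_have_copy : on a read miss, whether another cache asserted "shared"
    """
    if event == "PrRd":
        if state == "I":
            return ("S" if others_have_copy else "E", "BusRd")
        return (state, None)                 # read hit in M, E or S
    if event == "PrWr":
        if state == "I":
            return ("M", "BusRdX")           # write miss: fetch and invalidate others
        if state == "S":
            return ("M", "BusUpgr")          # invalidate other sharers
        return ("M", None)                   # E -> M silently; M stays M
    if event == "BusRd":
        if state == "M":
            return ("S", "Flush")            # supply / write back the dirty data
        if state == "E":
            return ("S", None)
        return (state, None)
    if event in ("BusRdX", "BusUpgr"):
        return ("I", "Flush" if state == "M" else None)
    raise ValueError(f"unknown event: {event}")


if __name__ == "__main__":
    s = "I"
    for ev in ("PrRd", "PrWr", "BusRd", "BusRdX"):
        s, action = mesi_next(s, ev)
        print(ev, "->", s, action)
    # PrRd -> E BusRd / PrWr -> M None / BusRd -> S Flush / BusRdX -> I None
```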

Multithreading
- Advantages/disadvantages:
  - Superscalar
  - Chip multiprocessor
  - Coarse-grain multithreading (switch on miss)
  - Fine-grain multithreading (round-robin every cycle)
  - Simultaneous multithreading
- How do these interact with cache hierarchies?
- What kinds of programs work best on each of these?
- What changes to the microarchitecture are required?

1. (11) True or False:
   (1) DRAM and disk access times are rapidly converging.
   (1) Measuring performance on multiprocessors using linear speedup instead of execution time is a good idea.
   (1) Amdahl's law