Computer Systems Architecture


1 Computer Systems Architecture
Lecture 12
Mahadevan Gomathisankaran
March 4, 2010
03/04/2010 Lecture 12 CSCE 4610/5610

2 Discussion: Assignment 2

3 Increasing Fetch Bandwidth
- In multiple-issue processors, predicting branch outcomes alone is not good enough
- To increase the fetch rate, we need to predict the next PC in the IF stage itself
- Two techniques:
  - Branch target buffer: predicts conditional branches
  - Return address predictor: predicts indirect jumps

4 Branch Target Buffer
- The PC of the branch is sent to the BTB
- When a match is found, the predicted PC is returned
- If the branch is predicted taken, instruction fetch continues at the predicted PC

5 BTB: Operation
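
The BTB lookup described above can be sketched in a few lines of Python. This is only an illustrative model, not the lecture's design: the class name `BranchTargetBuffer`, the direct-mapped organization, the 4-byte instruction size, and the choice to store only taken branches are all assumptions made for the example.

```python
class BranchTargetBuffer:
    """Toy direct-mapped BTB mapping a branch PC to a predicted target PC.

    Assumptions (not from the slides): fixed 4-byte instructions,
    only taken branches are kept in the buffer.
    """

    def __init__(self, num_entries=16, instr_bytes=4):
        self.num_entries = num_entries
        self.instr_bytes = instr_bytes
        # Each entry is either None or a (branch_pc, target_pc) pair.
        self.entries = [None] * num_entries

    def _index(self, pc):
        # Use the low-order bits of the instruction address as the index.
        return (pc // self.instr_bytes) % self.num_entries

    def predict_next_pc(self, pc):
        """Return the predicted next PC during instruction fetch."""
        entry = self.entries[self._index(pc)]
        if entry is not None and entry[0] == pc:
            return entry[1]               # match: fetch from predicted PC
        return pc + self.instr_bytes      # no match: fall through

    def update(self, pc, taken, target_pc):
        """Record the real outcome once the branch resolves."""
        idx = self._index(pc)
        if taken:
            self.entries[idx] = (pc, target_pc)
        elif self.entries[idx] is not None and self.entries[idx][0] == pc:
            self.entries[idx] = None      # not taken: drop the stale entry
```

Two branch PCs that share an index simply evict each other in this sketch; a real BTB would typically be larger and might be set associative.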

6 Return Address Predictor
- A small buffer of return addresses acts as a stack
- Caches the most recent return addresses
- Call: push a return address onto the stack
- Return: pop an address off the stack and predict it as the new PC
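
A minimal sketch of the push/pop behaviour described above. The class name `ReturnAddressStack`, the default depth, the 4-byte instruction size, and the policy of discarding the oldest entry on overflow are assumptions for illustration only.

```python
class ReturnAddressStack:
    """Toy return address predictor: a small bounded stack of return PCs."""

    def __init__(self, depth=8):
        self.depth = depth
        self.stack = []

    def on_call(self, call_pc, instr_bytes=4):
        """Call: push the address of the instruction after the call."""
        if len(self.stack) == self.depth:
            self.stack.pop(0)  # buffer full: discard the oldest address
        self.stack.append(call_pc + instr_bytes)

    def on_return(self):
        """Return: pop the most recent address and use it as the predicted PC.

        Returns None when the buffer is empty (no prediction available).
        """
        return self.stack.pop() if self.stack else None
```

Because the buffer is small, deeply nested calls lose the oldest return addresses, and the corresponding returns get no prediction in this sketch.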

7 Integrated Instruction Fetch
- Integrated branch prediction: the branch predictor is part of the instruction fetch unit and is constantly predicting branches
- Instruction prefetch: the instruction fetch unit prefetches to deliver multiple instructions per clock, integrating prefetch with branch prediction
- Instruction memory access and buffering:
  - Fetching multiple instructions per cycle may require accessing multiple cache blocks (prefetch to hide the cost of crossing cache blocks)
  - Provides buffering, acting as an on-demand unit that supplies instructions to the issue stage as needed and in the quantity needed

8 Speculation: Register Renaming
- An alternative to the ROB is a larger physical set of registers combined with register renaming
- Extended registers replace the function of both the ROB and the reservation stations
- Instruction issue maps the names of architectural registers to physical register numbers in the extended register set
- On issue, a new unused register is allocated for the destination (which avoids WAW and WAR hazards)
- Speculation recovery is easy because a physical register holding an instruction's destination does not become the architectural register until the instruction commits
- Most out-of-order processors today use extended registers with renaming
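
The issue-time mapping described above can be illustrated with a small rename table. This is a sketch under stated assumptions: the names `RenameTable`, `rename`, and the `r`/`p` register labels are hypothetical, and the model leaves out commit, freeing of old physical registers, and misspeculation recovery.

```python
class RenameTable:
    """Toy rename stage: architectural registers map to physical registers."""

    def __init__(self, num_arch=4, num_phys=8):
        # Initially architectural register r<i> lives in physical register p<i>.
        self.map = {f"r{i}": f"p{i}" for i in range(num_arch)}
        # Remaining physical registers form the free list.
        self.free = [f"p{i}" for i in range(num_arch, num_phys)]

    def rename(self, dest, srcs):
        """Rename one instruction; returns (physical dest, physical sources)."""
        # Sources are read through the current mapping first (true RAW
        # dependences are preserved).
        phys_srcs = [self.map[s] for s in srcs]
        if not self.free:
            raise RuntimeError("no free physical register; issue must stall")
        # The destination gets a fresh register, so later writers or earlier
        # readers of the same architectural name no longer conflict.
        new_dest = self.free.pop(0)
        self.map[dest] = new_dest
        return new_dest, phys_srcs
```

For example, renaming `r1 = r2 + r3`, then `r2 = r1 + ...`, then `r1 = r2 + ...` gives each destination its own physical register, which removes the WAR hazard on `r2` and the WAW hazard on `r1` while keeping the RAW chain intact.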

9 Speculation: Value Prediction
- Attempts to predict the value produced by an instruction
  - E.g., a load of a value that changes infrequently
- Value prediction is useful only if it significantly increases ILP
  - Focus of research has been on loads; so-so results, no processor uses value prediction
- A related topic is address aliasing prediction
  - RAW for a load and a store, or WAW for two stores
  - Address alias prediction is both more stable and simpler, since it need not actually predict the address values, only whether such values conflict
  - Has been used by a few processors

10 Quiz Time
- Finish the quiz handed out to you
- Write your name and ID
- Work on the quiz individually
- You may refer to books and notes, and use calculators
- Write your answers briefly (maximum one sentence) and clearly
- You have: 10 minutes

22 Memory Hierarchy
- Processor-Memory Performance Gap

23 Memory Hierarchy
- How to address this performance gap?
- Put smaller, faster memories between the CPU and DRAM

27 Memory Hierarchy
- Memory hierarchy uses the principle of locality
- Two different types of locality:
  - Temporal locality: if an item is referenced, it will tend to be referenced again soon (e.g., loops, reuse)
  - Spatial locality: if an item is referenced, items whose addresses are close by tend to be referenced soon (e.g., straight-line code, array access)

34 Memory Hierarchy: Terminology
- Block: the unit of operation
- Hit: the data (block) is present in the memory level
- Hit Rate: the fraction of memory accesses found in the memory level
- Hit Time: the time to access the memory level, which consists of RAM access time + time to determine hit/miss
- Miss: the data (block) is not present in the memory level
- Miss Rate: 1 - Hit Rate
- Miss Penalty: the time to get the block from the lower level, place it in the current level, and deliver it to the processor

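38 Memory Hierarchy: Performance Metrics
- Miss rate: NO (not a sufficient measure on its own)
- Average Memory Access Time:
$$\text{AMAT} = \text{Hit Time} + \text{Miss Rate} \times \text{Miss Penalty}$$
- CPU Execution Time:
$$\text{CPU Execution Time} = (\text{CPU Clock Cycles} + \text{Memory Stall Cycles}) \times \text{Clock Cycle Time}$$
$$\text{Memory Stall Cycles} = \text{Number of Misses} \times \text{Miss Penalty} = IC \times \frac{\text{Misses}}{\text{Instruction}} \times \text{Miss Penalty} = IC \times \frac{\text{Mem Accesses}}{\text{Instruction}} \times \text{Miss Rate} \times \text{Miss Penalty}$$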
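
The metrics above translate directly into small helper functions. The function and parameter names are chosen for this sketch, and writing CPU clock cycles as IC times an execution CPI (`cpi_execution`) is an assumption added to make the example self-contained.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty


def memory_stall_cycles(ic, mem_accesses_per_instr, miss_rate, miss_penalty):
    """Stall cycles = IC * (accesses / instruction) * miss rate * miss penalty."""
    return ic * mem_accesses_per_instr * miss_rate * miss_penalty


def cpu_execution_time(ic, cpi_execution, mem_accesses_per_instr,
                       miss_rate, miss_penalty, cycle_time):
    """(CPU clock cycles + memory stall cycles) * clock cycle time.

    Assumption: CPU clock cycles = ic * cpi_execution, where cpi_execution
    is the CPI with a perfect (always-hit) memory system.
    """
    cpu_cycles = ic * cpi_execution
    stalls = memory_stall_cycles(ic, mem_accesses_per_instr,
                                 miss_rate, miss_penalty)
    return (cpu_cycles + stalls) * cycle_time
```

With made-up inputs such as a 1-cycle hit time, a 5% miss rate, and a 100-cycle miss penalty, `amat(1, 0.05, 100)` comes out to about 6 cycles, which shows how a small miss rate can still dominate the average access time.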

42 Memory Hierarchy: Design Questions
1. Where can a block be placed in a memory level? Block Placement
2. How is a block found? Block Identification
3. Which block should be replaced on a miss? Block Replacement
4. What happens on a write? Write Strategy

43 Associativity: Block Placement
- If each block has only one place it can appear, then it is direct mapped
- If a block can be placed anywhere, then it is fully associative
- If a block can be placed in a restricted set of places, then it is set associative
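
The three placement schemes can be viewed as one rule with different associativity. The sketch below assumes the common modulo mapping (set index = block address mod number of sets); the function name `candidate_frames` and its parameters are hypothetical.

```python
def candidate_frames(block_addr, num_frames, ways):
    """Return the cache frames where a memory block may be placed.

    ways == 1           -> direct mapped (exactly one candidate frame)
    ways == num_frames  -> fully associative (any frame)
    otherwise           -> set associative (any frame within one set)

    Assumes num_frames is a multiple of ways.
    """
    num_sets = num_frames // ways
    set_index = block_addr % num_sets      # assumed modulo mapping
    first = set_index * ways
    return list(range(first, first + ways))
```

For memory block 12 and a cache with 8 frames, direct mapping allows only frame 4, a 2-way set-associative cache allows frames 0 and 1 (set 0), and a fully associative cache allows all 8 frames.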

44 Addressing: Block Identification
- Tag each block
- Check on access to see whether the tag matches the address
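
A minimal sketch of the tag check, assuming the usual split of an address into tag, set index, and block offset. The helper names `split_address` and `is_hit`, and the representation of the cache as one Python set of tags per cache set, are assumptions for illustration.

```python
def split_address(addr, block_bytes, num_sets):
    """Split a byte address into (tag, set index, block offset)."""
    offset = addr % block_bytes
    block_addr = addr // block_bytes
    index = block_addr % num_sets
    tag = block_addr // num_sets
    return tag, index, offset


def is_hit(cache_tags, addr, block_bytes, num_sets):
    """Hit if the address's tag matches a tag stored in the indexed set.

    cache_tags is a list with one set of stored tags per cache set.
    """
    tag, index, _ = split_address(addr, block_bytes, num_sets)
    return tag in cache_tags[index]
```

Only the tag needs to be stored and compared: the index selects the set, and the offset selects the byte within the block.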

45 Replacement
- Direct mapped: only one place for a block, hence simple
- Set and fully associative:
  - Random
  - LRU
  - Not MRU
  - FIFO
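
As an illustration of one of the policies listed above, here is a sketch of LRU replacement within a single cache set. The class name `LRUSet` and the use of `collections.OrderedDict` to track recency are choices made for this example; hardware typically tracks recency with a few status bits per set.

```python
from collections import OrderedDict


class LRUSet:
    """One cache set with least-recently-used replacement."""

    def __init__(self, ways):
        self.ways = ways
        # Keys are block tags, ordered from least to most recently used.
        self.blocks = OrderedDict()

    def access(self, tag):
        """Access a block; returns (hit, evicted_tag or None)."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)   # mark as most recently used
            return True, None
        victim = None
        if len(self.blocks) == self.ways:
            # Set is full: evict the least recently used block.
            victim, _ = self.blocks.popitem(last=False)
        self.blocks[tag] = True
        return False, victim
```

In a 2-way set, the access sequence A, B, A, C evicts B rather than A, because A was touched more recently.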

46 Write Strategy

Write Policies

| | write-through | write-back |
|---|---|---|
| Policy | written to both the block at the current level and the lower level | written only to the current level |
| Read misses | No effect | May cause writes to lower level |
| Repeated writes | Repeated writes to lower level | Only one write to lower level |

Write-miss Policies

| | write-allocate | no write-allocate |
|---|---|---|
| Policy | allocate a block | do not allocate a block, pass it to the next level |
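
The difference between the two write policies can be seen by counting lower-level writes for a single cached block. This is a simplified sketch: the class name `CachedBlock`, the policy strings, and the single-block model are assumptions, and write-miss handling is not modelled.

```python
class CachedBlock:
    """One cached block that counts the writes it sends to the lower level."""

    def __init__(self, policy):
        if policy not in ("write-through", "write-back"):
            raise ValueError("policy must be 'write-through' or 'write-back'")
        self.policy = policy
        self.dirty = False
        self.lower_level_writes = 0

    def write(self):
        """Processor writes to the block (assumed to hit in the cache)."""
        if self.policy == "write-through":
            self.lower_level_writes += 1   # every write also goes down
        else:
            self.dirty = True              # write-back: only mark dirty

    def evict(self):
        """Block is replaced, e.g. to make room on a later miss."""
        if self.dirty:
            self.lower_level_writes += 1   # one write-back of the dirty block
            self.dirty = False
```

Five writes followed by an eviction produce five lower-level writes under write-through and one under write-back; the write-back happens at eviction time, which is why a read miss may cause a write to the lower level.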
