Techniques for Mitigating Memory Latency Effects in the PA-8500 Processor
David Johnson, Systems Technology Division, Hewlett-Packard Company

Presentation Overview
- PA-8500 overview
- Instruction fetch capabilities
- Reorder buffers ("the Queue")
- Data cache
- System bus

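PA-8500
Figure: PA-8500 die floorplan. Labeled blocks: two 0.5 MB D-cache arrays with D-tags; a 0.5 MB I-cache with I-tags; instruction fetch (IF); cache datapath (Cache DP); integer datapath (Int DP); FP; instruction reorder buffer (IRB); address reorder buffer (ARB); TLB; bus control; Runway bus; I/O.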

PA-8500 Processor Core
Figure: processor core block diagram. Labeled blocks:
- Instruction cache and instruction fetch unit (branch history table, BHT; branch target address cache, BTAC)
- Sort, ALU buffer (28 entries), and memory buffer (28 entries)
- Dual 64-bit integer ALUs and dual shift/merge units
- Dual FP multiply/accumulate units and dual FP divide/SQRT units
- Dual load/store adders, reorder buffer (28 entries), TLB, and data cache
- System bus interface to the Runway bus
- Rename registers, architected registers, and retire

Memory Latency
Figure: speed (MHz) of CPUs and DRAM plotted against year.
Latency problems: instruction fetches and loads
Techniques for hiding latency:
- High hit-rate caches
- Prefetching
- Overlapping cache misses
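
The widening gap on this slide can be made concrete with a little arithmetic. The sketch below converts a DRAM latency into CPU cycles at several clock rates and multiplies by a fetch width of four instructions per cycle (the I-cache rate quoted on the next slide) to show how many instruction slots one unhidden miss would waste. The 100 ns latency is a hypothetical round number, not a measured PA-8500 figure.

```python
# Minimal sketch: a fixed DRAM latency costs more CPU cycles as clocks rise.
# DRAM_LATENCY_NS is a hypothetical round number, not a PA-8500 measurement.

DRAM_LATENCY_NS = 100.0
FETCH_WIDTH = 4  # instructions per cycle, per the I-cache slide


def miss_penalty_cycles(latency_ns: float, cpu_mhz: float) -> float:
    """CPU cycles that elapse while one miss to DRAM is outstanding."""
    # One cycle lasts 1000 / cpu_mhz ns, so cycles = latency * MHz / 1000.
    return latency_ns * cpu_mhz / 1000.0


def lost_issue_slots(penalty_cycles: float, width: int = FETCH_WIDTH) -> float:
    """Instruction slots wasted if the core simply stalls for the miss."""
    return penalty_cycles * width


if __name__ == "__main__":
    for mhz in (100, 200, 500, 1000):
        cycles = miss_penalty_cycles(DRAM_LATENCY_NS, mhz)
        print(f"{mhz:5d} MHz: {cycles:6.1f} cycles, "
              f"{lost_issue_slots(cycles):6.1f} lost slots per miss")
```

At 500 MHz the hypothetical 100 ns miss costs 50 cycles, or 200 fetch slots; each technique listed on the slide attacks that cost from a different side.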

Instruction Fetch Features
- Instruction cache
  - 0.5 MB on-chip cache
  - 4-way set associative
  - Pipelined 2-cycle access
  - Provides 4 instructions per cycle to the CPU core
  - Supports 32-byte and 64-byte line sizes
- Instruction prefetching
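
The cache parameters above fix how many sets the I-cache has for each line size. The sketch below derives that geometry and splits an address into tag, set index, and offset, assuming a conventional set-associative layout; the exact bit assignment the PA-8500 uses is not given on the slide and is an assumption here.

```python
# Minimal sketch, assuming a conventional set-associative layout in which an
# address splits into tag | set index | line offset. Cache size, ways, and
# line sizes come from the slide; the bit layout is an illustrative
# assumption, not a documented PA-8500 detail.
from __future__ import annotations

CACHE_BYTES = 512 * 1024  # 0.5 MB
WAYS = 4


def cache_geometry(line_bytes: int) -> tuple[int, int, int]:
    """Return (sets, offset_bits, index_bits) for a power-of-two line size."""
    sets = CACHE_BYTES // (WAYS * line_bytes)
    offset_bits = line_bytes.bit_length() - 1  # log2(line_bytes)
    index_bits = sets.bit_length() - 1         # log2(sets)
    return sets, offset_bits, index_bits


def split_address(addr: int, line_bytes: int) -> tuple[int, int, int]:
    """Split an address into (tag, set_index, offset)."""
    _, offset_bits, index_bits = cache_geometry(line_bytes)
    offset = addr & (line_bytes - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset
```

With 64-byte lines the 0.5 MB, 4-way cache has 2048 sets (6 offset bits, 11 index bits); 32-byte lines double the set count to 4096 (5 offset bits, 12 index bits).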

PA-8500 I-Cache Composition
4 instructions per cycle to the Queue from a 0.5 MB cache
Figure: four I-cache RAM arrays and the tags; tag comparison (=) controls the I-fetch mux, which feeds the instruction reorder buffer ("the Queue").
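
One plausible reading of this figure is a standard 4-way read: all four RAM arrays are read in parallel, tags are compared, and the hit selects the mux input. The sketch below models that reading, assuming each RAM array holds one way; the list-based data layout is hypothetical.

```python
# Minimal sketch of a 4-way I-cache read, assuming each of the four RAM
# arrays on the slide holds one way. All ways are read in parallel, the
# stored tags are compared (=) with the fetch tag, and the hit vector drives
# the I-fetch mux. The list-based data layout is hypothetical.
from __future__ import annotations

WAYS = 4
FETCH_WIDTH = 4  # instructions delivered to the Queue per cycle


def fetch(tags: list[list[int | None]], rams: list[list[list[str]]],
          set_index: int, tag: int) -> list[str] | None:
    """Return up to FETCH_WIDTH instructions on a hit, or None on a miss."""
    # One comparator per way; hardware evaluates these simultaneously.
    hits = [tags[way][set_index] == tag for way in range(WAYS)]
    if not any(hits):
        return None  # I-miss: the request would go out on the Runway bus
    way = hits.index(True)  # the one-hot hit vector selects the mux input
    # For simplicity, fetch from the start of the line.
    return rams[way][set_index][:FETCH_WIDTH]
```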

PA-8500 Instruction Prefetching
1. I-miss from the cache.
2. I-miss issued to the Runway bus.
3. I-prefetch issued to the Runway bus.
4. I-miss return inserted into the cache.
5. I-prefetch return held in the prefetch buffer.
6. A later I-miss from the cache hits in the prefetch buffer.
7. The missed line is moved from the prefetch buffer to the cache.
8. An I-prefetch for the next line is issued to the Runway bus.
Figure: instruction fetch unit, instruction cache tags, prefetch buffer, instruction reorder buffer ("the Queue"), and system bus interface to the Runway bus.
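
The eight steps above can be sketched as a small model that counts where each fetched line comes from. It assumes a single-entry prefetch buffer, next-line prefetching, an unbounded I-cache, and that every prefetch returns before the line is needed; none of these simplifications are PA-8500 parameters.

```python
# Minimal sketch of next-line instruction prefetching into a prefetch buffer.
# Assumptions (not PA-8500 parameters): one prefetch-buffer entry, an
# unbounded I-cache, and prefetches that always return before they are used.
from __future__ import annotations


def run_fetch_stream(lines: list[int]) -> dict[str, int]:
    cache: set[int] = set()
    prefetch_buffer: int | None = None
    stats = {"cache_hits": 0, "buffer_hits": 0,
             "bus_misses": 0, "bus_prefetches": 0}

    for line in lines:
        if line in cache:
            stats["cache_hits"] += 1
            continue
        # Steps 1 and 6: the fetch misses in the I-cache.
        if prefetch_buffer == line:
            # Step 7: the line moves from the prefetch buffer to the cache.
            stats["buffer_hits"] += 1
        else:
            # Steps 2 and 4: a demand miss goes to the bus and fills the cache.
            stats["bus_misses"] += 1
        cache.add(line)
        # Steps 3, 5, and 8: prefetch the next line into the buffer.
        prefetch_buffer = line + 1
        stats["bus_prefetches"] += 1
    return stats
```

For straight-line code such as `run_fetch_stream([0, 1, 2, 3])`, only line 0 is a demand miss; lines 1 to 3 are all served from the prefetch buffer.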

Reorder Buffers
Figure: load path. Labeled blocks: instruction reorder buffer ("the Queue", 56 entries) fed from I-fetch; two load/store adders; reorder buffer (28 entries); data cache (1 MB); system bus interface; Runway bus.
Cycle-by-cycle progression of a load instruction: Insert, Launch, Cache, Cache, RR, Retire.

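Overlapping LOAD-MISS
Figure: two timelines.
- The problem: LOAD-MISS, Use, LOAD-MISS, Use, LOAD-MISS, Use. Each miss is serviced in turn, so the miss latencies add up.
- PA-8500 solution: LOAD-MISS, LOAD-MISS, LOAD-MISS, Use, Use, Use. The three misses are outstanding at the same time, so their latencies overlap.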
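
The difference between the two timelines can be estimated with a simple model. The sketch assumes every miss has the same latency, that independent loads can launch one per cycle (`issue_gap`), and that use time is negligible. `max_outstanding` is a hypothetical cap on misses in flight, since the slide does not give the PA-8500's limit.

```python
# Minimal sketch of the two timelines, assuming every miss has the same
# latency and that the out-of-order queue can launch one independent load
# per cycle (issue_gap). max_outstanding is a hypothetical cap on misses in
# flight; the slide does not give the PA-8500's limit.
from __future__ import annotations


def serialized_cycles(n_misses: int, miss_latency: int) -> int:
    """Blocking design: each miss starts after the previous one returns."""
    return n_misses * miss_latency


def overlapped_cycles(n_misses: int, miss_latency: int, issue_gap: int = 1,
                      max_outstanding: int | None = None) -> int:
    """Cycle at which the last of n independent misses returns."""
    finish: list[int] = []
    for i in range(n_misses):
        start = i * issue_gap
        if max_outstanding is not None and i >= max_outstanding:
            # All slots are busy; wait for the oldest outstanding miss.
            start = max(start, finish[i - max_outstanding])
        finish.append(start + miss_latency)
    return finish[-1] if finish else 0
```

With three 100-cycle misses, as in the slide's picture, `serialized_cycles(3, 100)` is 300 while `overlapped_cycles(3, 100)` is 102. Capping the misses in flight shows how a smaller miss window erodes the benefit.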

Reorder Buffer: High-Speed Custom Circuitry
Figure: custom circuitry for the 28-entry reorder buffer. Labeled elements: cache port arbitration circuits with request, grant, and miss-grant lines; insert, launch, and miss paths with match lines; paths from the ALU to the cache and to the Runway bus, each through a "0-catcher".
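
The slide does not explain how the arbitration circuits or "0-catchers" choose among requesting entries. The sketch below shows only a generic oldest-first arbiter for comparison. It assumes lower entry numbers are older and grants two requests per cycle, matching the data cache's two accesses per cycle. The priority rule is a hypothetical stand-in, not the PA-8500 policy.

```python
# Minimal sketch of cache-port arbitration, assuming each of the 28 entries
# raises a request bit, lower-numbered entries count as older, and up to two
# requests (the data cache's two accesses per cycle) are granted oldest-first
# by repeatedly finding the lowest set bit. The priority rule is a
# hypothetical stand-in; the slide does not spell out the real policy.
from __future__ import annotations

ENTRIES = 28
PORTS = 2


def arbitrate(requests: int) -> list[int]:
    """Return the entry numbers granted this cycle (at most PORTS)."""
    grants: list[int] = []
    remaining = requests & ((1 << ENTRIES) - 1)  # ignore bits past entry 27
    while remaining and len(grants) < PORTS:
        lowest = remaining & -remaining         # isolate the lowest set bit
        grants.append(lowest.bit_length() - 1)  # bit position = entry number
        remaining &= remaining - 1              # clear it and keep scanning
    return grants
```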

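Data Prefetching
The problem: avoid the LOAD-MISS latency.
Solution:
- The compiler inserts a prefetch instruction (a LOAD to GR0).
- Independent instructions execute while the prefetch is outstanding.
- The data is resident in the cache when the real LOAD executes (LOAD-HIT).
Figure: timeline showing LOAD-to-GR0, then independent instructions, then LOAD-HIT.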
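
A compiler placing these prefetches has to decide how far ahead to issue them. The sketch below computes a prefetch distance and simulates a loop with and without it. It models the prefetch as a LOAD to GR0: in PA-RISC, GR0 always reads as zero, so the loaded value is discarded and only the cache fill remains. The loop shape (one new, missing line per iteration) and the latency and work values are hypothetical.

```python
# Minimal sketch of software prefetching with a prefetch distance, assuming
# a loop in which every iteration touches one new cache line that would
# otherwise miss. The "prefetch" models a LOAD to GR0: it starts the miss,
# and its result is discarded. Latency and work values are hypothetical.
from __future__ import annotations

import math


def prefetch_distance(miss_latency: int, cycles_per_iteration: int) -> int:
    """How many iterations ahead a prefetch must run to hide the miss."""
    return math.ceil(miss_latency / cycles_per_iteration)


def loop_cycles(n_iters: int, work_cycles: int, miss_latency: int,
                distance: int) -> int:
    """Total cycles for the loop; distance=0 means no prefetching."""
    ready_at: dict[int, int] = {}  # iteration -> cycle its line arrives
    t = 0
    for i in range(n_iters):
        if distance > 0:
            # Prefetch the line needed `distance` iterations from now.
            ready_at[i + distance] = t + miss_latency
        # Lines never prefetched (the first `distance` iterations, or all of
        # them when distance=0) are fetched by the real load right now.
        data_ready = ready_at.pop(i, t + miss_latency)
        t = max(t, data_ready)  # stall until the LOAD can hit
        t += work_cycles        # independent work in this iteration
    return t
```

With a hypothetical 30-cycle miss and 10 cycles of work per iteration, `prefetch_distance(30, 10)` is 3. A 10-iteration loop then drops from 400 cycles to 190, because only the first three iterations still wait on memory.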

Data Cache Features
- 1.0 MB on-chip cache
- 4-way set associative
- 2-cycle pipelined access
- Two accesses per cycle
- Supports 32-byte and 64-byte line sizes
- Sophisticated store queue

Data Cache
Figure: data cache organization. Labeled blocks: EVEN and ODD data RAMs; two tag RAMs with tag comparators (=); two store queues; integer data path; system interface; TLB.
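
The EVEN/ODD split suggests how two accesses per cycle can be sustained: accesses that map to different data RAMs proceed together. The sketch assumes the banks are interleaved on a single address bit and pairs accesses greedily in program order. Which bit selects the bank is not stated on the slide, so `bank_bit` is a hypothetical parameter.

```python
# Minimal sketch of dual-access scheduling, assuming the EVEN and ODD data
# RAMs are interleaved on one address bit so that two accesses share a cycle
# only when they map to different banks. The slide does not say which bit
# selects the bank, so bank_bit is a hypothetical parameter.
from __future__ import annotations


def bank_of(addr: int, bank_bit: int) -> str:
    return "ODD" if (addr >> bank_bit) & 1 else "EVEN"


def schedule(addrs: list[int], bank_bit: int) -> list[tuple[int, ...]]:
    """Pair consecutive accesses into cycles, in program order; a bank
    conflict pushes the second access into the next cycle."""
    cycles: list[tuple[int, ...]] = []
    i = 0
    while i < len(addrs):
        pair_ok = (i + 1 < len(addrs)
                   and bank_of(addrs[i], bank_bit) != bank_of(addrs[i + 1], bank_bit))
        if pair_ok:
            cycles.append((addrs[i], addrs[i + 1]))
            i += 2
        else:
            cycles.append((addrs[i],))
            i += 1
    return cycles
```

Under this assumption, with `bank_bit=3`, the sequence 0, 8, 16, 24 alternates banks and fits in two cycles. The sequence 0, 16, 8 needs two cycles for three accesses, because 0 and 16 collide in the EVEN bank.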

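Single-Level vs. Multi-Level Cache Designs
Figure: two designs compared.
- Single-level: the core connects to a 1 MB data cache and a 0.5 MB instruction cache, backed by the system bus.
- Multi-level: the core has L1 data and instruction caches (32 KB, 1 cycle) and an off-chip L2 (~15 cycles), backed by the system bus.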
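
The trade-off in this comparison is usually expressed as average memory access time (AMAT): hit time plus miss rate times miss penalty. The sketch below evaluates AMAT for both designs. The 2-cycle (data cache slide), 1-cycle, and ~15-cycle latencies echo the slides. The miss rates and memory latency are hypothetical inputs, not PA-8500 data.

```python
# Minimal sketch of average memory access time (AMAT) for the two designs.
# The 2-, 1-, and ~15-cycle latencies echo the slides; miss rates and the
# memory latency passed in below are hypothetical inputs, not PA-8500 data.


def amat_single_level(hit_cycles: float, miss_rate: float,
                      memory_cycles: float) -> float:
    """One large cache in front of the system bus."""
    return hit_cycles + miss_rate * memory_cycles


def amat_two_level(l1_cycles: float, l1_miss_rate: float, l2_cycles: float,
                   l2_local_miss_rate: float, memory_cycles: float) -> float:
    """Small fast L1 backed by an off-chip L2, then the system bus."""
    l1_miss_cost = l2_cycles + l2_local_miss_rate * memory_cycles
    return l1_cycles + l1_miss_rate * l1_miss_cost


if __name__ == "__main__":
    # Hypothetical rates chosen only to exercise the formulas.
    print(amat_single_level(hit_cycles=2, miss_rate=0.02, memory_cycles=100))
    print(amat_two_level(l1_cycles=1, l1_miss_rate=0.10, l2_cycles=15,
                         l2_local_miss_rate=0.20, memory_cycles=100))
```

With these made-up inputs the two designs come out at 4.0 and 4.5 cycles. Different assumed miss rates can reverse that ordering, so the functions only make the trade-off explicit; they do not settle it.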

System Bus Interface
- Split-transaction bus with out-of-order returns
- Multiple transactions in flight simultaneously
- Priority given to latency-sensitive transactions
- Asynchronous interface
- Turbo Mode
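
The first three properties above can be illustrated with a small model. In this sketch, each request carries a transaction ID so returns can be matched in any order. A hypothetical cap limits the number of transactions in flight. Latency-sensitive requests issue ahead of latency-tolerant ones. The class and method names are illustrative and do not describe the Runway protocol.

```python
# Minimal sketch of a split-transaction bus, assuming each request gets a
# transaction ID so returns can be matched in any order, a hypothetical cap
# on transactions in flight, and two priority classes: latency-sensitive
# requests (e.g. demand misses) issue before latency-tolerant ones (e.g.
# prefetches). Names and sizes are illustrative, not the Runway protocol.
from __future__ import annotations

import heapq
import itertools


class SplitTransactionBus:
    def __init__(self, max_in_flight: int = 4) -> None:
        self.max_in_flight = max_in_flight
        self._pending: list[tuple[int, int, str]] = []  # (priority, id, kind)
        self._in_flight: dict[int, str] = {}            # id -> kind
        self._ids = itertools.count()

    def request(self, kind: str, latency_sensitive: bool) -> int:
        txn_id = next(self._ids)
        priority = 0 if latency_sensitive else 1  # lower value issues first
        heapq.heappush(self._pending, (priority, txn_id, kind))
        return txn_id

    def issue(self) -> list[int]:
        """Put pending requests on the bus until the in-flight cap is hit."""
        issued: list[int] = []
        while self._pending and len(self._in_flight) < self.max_in_flight:
            _, txn_id, kind = heapq.heappop(self._pending)
            self._in_flight[txn_id] = kind
            issued.append(txn_id)
        return issued

    def complete(self, txn_id: int) -> str:
        """A return arrives, possibly out of order, and is matched by ID."""
        return self._in_flight.pop(txn_id)
```

In this model, a prefetch queued first still yields to two later demand misses. The second miss can also return before the first, and the ID lookup keeps the bookkeeping straight.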

Turbo Mode
Figure: clock and data waveforms.
High-speed data transfer between memory and CPU

Mitigating Memory Latency Effects
- Large caches
- Out-of-order queue
- Flexible system interface
- Custom circuit design
The PA-8500 achieves superb performance!
