Decoupled Compressed Cache: Exploiting Spatial Locality for Energy-Optimized Compressed Caching

Size: px

Start display at page:

Download "Decoupled Compressed Cache: Exploiting Spatial Locality for Energy-Optimized Compressed Caching"

Laura Bruce
6 years ago
Views:

1 Decoupled Compressed Cache: Exploiting Spatial Locality for Energy-Optimized Compressed Caching Somayeh Sardashti and David A. Wood University of Wisconsin-Madison 1

2 Please find the power point presentation in: 2

3 3

4 4

5 Communication vs. Computation ~200X Keckler Micro 2011 Improving cache utilization is critical for energy-efficiency!

energy Previous work limit compression effectiveness: - Limited number

6 Compressed Cache: Compress and Compact Blocks + Higher effective cache size + Small area overhead + Higher system performance + Lower system energy Previous work limit compression effectiveness: - Limited number of tags - High internal fragmentation - Energy expensive re-compaction

7 Decoupled Compressed Cache (DCC) Saving system energy by improving LLC utilization through cache compression. Non-Contiguous Decoupled Super-Blocks Sub-Blocks Previous work limit compression effectiveness: - Limited number of tags - High Internal Fragmentation - Energy expensive re-compaction 7

8 Decoupled Compressed Cache (DCC) Saving system energy by improving LLC utilization through cache compression. Outperform 2X LLC 1.08X LLC area 14% higher performance 12% lower energy 8

9 Outline Motivation Compressed caching Our Proposals: Decoupled compressed cache Experimental Results Conclusions 9

10 Uncompressed Caching A fixed one-to-one tag/data mapping Tags Data 10

11 Compressed Caching Compact Add more compressed Compress tags to increase cache blocks, effective blocks. to make capacity. room. Tags Data 11

12 Compression (1) Compression: how to compress blocks? There are different compression algorithms. Not the focus of this work. But, which algorithm matters! 64 bytes 20 bytes Compressor 12

13 Compression Potentials We use C-PACK+Z for the rest of the talk! 1.5 Cycles to Decompress Compression Algorithm Compression Ratio = Original Size / Compressed Size High compression ratio potentially large normalized effective cache capacity. 13

14 Compaction (2) Compaction: how to store and find blocks? Critical to achieve the compression potentials. Fixed Sized Compressed Cache (FixedC) [Kim 02, WMPI, Yang Micro 02] This work focuses on compaction. Internal Fragmentation! Tags Data 14

15 Compaction (2) Compaction: how to store and find blocks? Variable Sized Compressed Cache (VSC) [Alameldeen, ISCA 2002] Tags Data Sub-block 15

16 Previous Compressed Caches Potential: B 10B (Limit 1) Limited Tag/Metadata Normalized Effective Capacity = LLC Number High Area of Overhead Valid Blocks by Adding / MAX 4X/more Number Tags of (Uncompressed) Blocks (Limit 2) Internal Fragmentation Low Cache Capacity Utilization 16

17 (Limit 3) Energy-Expensive Re-Compaction VSC requires energy-expensive re-compaction. B Update needs B 2 sub-blocks Tags Data 3X higher LLC dynamic energy! 17

18 Outline Motivation Compressed caching Our Proposals: Decoupled compressed cache Experimental Results Conclusions 18

19 Decoupled Compressed Cache (1) Exploiting Spatial Locality Low Area Overhead (2) Decoupling tag/data mapping Eliminate energy expensive re-compaction Reduce internal fragmentation (3) Co-DCC: Dynamically co-compacting super-blocks Further reduce internal fragmentation 19

20 (1) Exploiting Spatial Locality Neighboring blocks co-reside in LLC. 89% 20

21 (1) Exploiting Spatial Locality DCC tracks LLC blocks at Super-Block granularity. Super Tags Tags Data 2X Tags 4X Tags Super-Block Tag Q state A state B state C state D Quad (Q): A, B, C, D Singleton (S): E Up to 4X blocks with low area overheads! 21

22 (2) Decoupling tag/data mapping DCC decouples mapping to eliminate re-compaction. Super Tags Flexible Update Allocation B Quad (Q): A, B, C, D Singleton (S): E 22

23 (2) Decoupling tag/data mapping Back pointers identify the owner block of each sub-block. Super Tags Back Pointers Data Tag ID Blk ID Quad (Q): A, B, C, D Singleton (S): E 23

24 (3) Co-compacting super-blocks Co-DCC dynamically co-compacts super-blocks. Reducing internal fragmentation A sub-block Quad (Q): A, B, C, D 24

25 Outline Motivation Compressed caching Our Proposals: Decoupled compressed cache Experimental Results Conclusions 25

26 Experimental Methodology Integrated DCC with AMD Bulldozer Cache. We model the timing and allocation constraints of sequential regions at LLC in detail. No need for an alignment network. Verilog implementation and synthesis of the tag match and sub-block selection logic. One additional cycle of latency due to sub-block selection. 26

27 Experimental Methodology Full-system simulation with a simulator based on GEMS. Wide range of applications with different level of cache sensitivities: Commercial workloads: apache, jbb, oltp, zeus Spec-OMP: ammp, applu, equake, mgrid, wupwise Parsec: blackscholes, canneal, freqmine Spec 2006 mixes (m1-m8): bzip2, libquantum-bzip2, libquantum, gcc, astarbwaves, cactus-mcf-milc-bwaves, gcc-omnetpp-mcf-bwaves-lbm-milc-cactusbzip, omnetpp-lbm Cores Eight OOO cores, 3.2 GHz L1I$/L1D$ Private, 32-KB, 8-way L2$ Private, 256-KB, 8-way L3$ Shared, 8-MB, 16-way, 8 banks Main Memory 4GB, 16 Banks, 800 MHz bus frequency DDR3 27

28 Effective LLC Capacity Normalized LLC Area 2 1 FixedC Baseline 2X Baseline VSC DCC Co-DCC Normalized Effective LLC Capacity Components FixedC/VSC-2X DCC Co-DCC Tag Array 6.3% 2.1% 11.3% Back Pointer Array 0 4.4% 5.4% (De-)Compressors 1.8% 1.8% 1.8% Total Area Overhead 8.1% 8.3% 18.5% 28

29 (Co-)DCC Performance (Co-)DCC boost system performance significantly. 29

30 (Co-)DCC Energy Consumption (Co-)DCC reduce system energy by reducing number of accesses to the main memory. 30

31 Summary Analyze the limits of compressed caching Limited number of tags Internal fragmentation Energy-expensive re-compaction Decoupled Compressed Cache Improving performance and energy of compressed caching Decoupled super-blocks Non-contiguous sub-blocks Co-DCC further reduces internal fragmentation Practical designs [details in the paper] 31

32 Backup (De-)Compression overhead DCC data array organization with AMD Bulldozer DCC Timing DCC Lookup Applications Co-DCC design LLC effective capacity LLC miss rate Memory dynamic energy LLC dynamic energy 32

33 (De-)Compression Overhead Parameters Compressor Decompressor Pipeline Depth 6 2 Latency (cycles) 16 9 Area (mm 2 ) Power Consumption (mw)

34 DCC Data Array Organization AMD Bulldozer SR3 A0.3 C3.0 B Phase Flop SR 2 A0.2 B1.1 A Phase Flop SR 1 A0.1 B1.0 B Phase Flop SR0 A0.0 C3.1 A Phase Flop A0: uncompressed; B1 and C2 are compressed to 2 sub-blocks N Set Addr Read Data 4 SR0 Addr 4 SR1 Addr 4 SR2 Addr 4 SR3 Addr 34

35 DCC Timing 35

36 DCC Lookup Super Tags 1. Access Super Tags and Back Pointers in parallel 2. Find the matched Back Pointers 3. Read corresponding sub-blocks and decompress 1 Q 0 S Back Pointers Read C Data Quad (Q): A, B, C, D Singleton (S): E 36

37 Applications Sensitive to Cache Latency Sensitive to Cache Capacity and Latency Cache Insensitive Sensitive to Cache Capacity Spec2006 (m1-m8) bzip2, libquantum-bzip2, libquantum, gcc, astar-bwaves, cactus-mcf-milc-bwaves, gcc-omnetpp-mcf-bwaves-lbm-milc-cactus-bzip, omnetpp-lbm 37

38 Co-DCC Design Sub-block 7 A2.1 A: <A2,A1,A0> Sub-block 6 Sub-block 5 Sub-blocks 4-2 A2.0 A1 A0.1 A1-Begin Sub-block 1 Sub-block 0 A0.0 A2.2 A0-Begin A-END Super- Block Tag Cstate3 Comp3 Begin3 Cstate2 Comp2 3b 1b 7b 3b1b 7b Begin2 Cstate1 Comp1 Begin1 Cstate0 Comp0 Begin0 3b 1b 7b 3b1b 7b END 7b Tag ID 1b Sharers 4b 38

39 LLC Effective Cache Capacity 4.0 Norm LLC Effective Capacity

40 LLC Miss Rate 1.00 Norm LLC Miss Rate

41 Memory Dynamic Energy 1.00 Norm Memory Dynamic Energy

42 LLC Dynamic Energy 6 Norm LLC Dynamic Energy

Decoupled Compressed Cache: Exploiting Spatial Locality for Energy-Optimized Compressed Caching

Decoupled Compressed Cache: Exploiting Spatial Locality for Energy-Optimized Compressed Caching Somayeh Sardashti Computer Sciences Department University of Wisconsin-Madison somayeh@cs.wisc.edu David