Memory-Link Compression Schemes: A Value Locality Perspective. Georgy Ushakov Institutt for datateknikk og informasjonsteknologi


30 September 2009

Why?
Processor speed keeps increasing. Transistor counts keep doubling, while pin count grows by less than 10% per generation. Enlarging the on-chip caches has negative effects, and so does deepening the cache hierarchy. Memory-bound applications are limited by off-chip bandwidth, and multi-core processors demand even more off-chip bandwidth.

How?
Data is compressed before it is transferred over the link and decompressed before the block is written into the cache or memory. Compressed values could also be stored in memory, but this option is not evaluated.

What?
Pros: freed-up bandwidth, shorter transfer time and a lower miss penalty. Cons: compression and decompression latencies.

Value Locality
Three kinds of value locality are considered: small value locality, cluster value locality and isolated value locality.

Applications
Integer: gzip, vpr, gcc, perlbmk. Media: epic, ghostscript, gsme, mpeg2d. Commercial: OLTP, TPC-W, SPECjbb, SPECweb.

[Figure: Integer applications]

[Figure: Media applications]

[Figure: Commercial applications]

[Figure: Value distribution for gzip]

Theoretical benefit
3,000 samples were taken, each covering 1,000,000 data transfers. With Huffman coding, a word needs an average of 8 bits instead of 32.
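The 8-bit figure is an idealized bound: it presumes a Huffman code built from the exact value frequencies of each sample. The Python below is a minimal sketch of how such an average code length can be computed for a trace of 32-bit words. The function names and the toy trace are hypothetical, and this is not the authors' measurement setup.

```python
import heapq
from collections import Counter


def huffman_code_lengths(freqs):
    """Return {value: code length in bits} for a Huffman code over freqs."""
    if len(freqs) == 1:
        # A single distinct value still needs one bit per transfer.
        return {next(iter(freqs)): 1}
    # Heap entries: (weight, unique tie-breaker, {value: depth so far}).
    heap = [(w, i, {v: 0}) for i, (v, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every value in them one level deeper.
        merged = {v: d + 1 for v, d in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def average_bits_per_word(words):
    """Average Huffman code length over a trace of transferred words."""
    freqs = Counter(words)
    lengths = huffman_code_lengths(freqs)
    return sum(freqs[v] * lengths[v] for v in freqs) / len(words)
```

On the toy trace `[0]*4 + [1]*2 + [7, 0xFFFFFFFF]` the average is 1.75 bits per word, because the most frequent value gets a 1-bit code. Real traces have far more distinct values, which is why the measured average lands near 8 bits.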

Significance-width compression (SWC)
SWC encodes the sign-extension bits compactly, which compresses small integers. It is simple, fast and stateless, so the whole block can be encoded in parallel. Applied to large numbers, SWC can add significant overhead; partitioning the word solves this.
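A minimal Python sketch of basic SWC follows. It assumes that each 32-bit word is sent as a 5-bit width field (storing width minus 1) followed by the word's significant bits, sign bit included. Bits are modelled as strings for readability. The field layout and all names are illustrative assumptions and are not taken from the slides.

```python
HEADER_BITS = 5  # assumption: 5-bit width field storing (width - 1), so widths 1..32


def to_signed32(word):
    """Interpret a 32-bit word as a two's-complement integer."""
    word &= 0xFFFFFFFF
    return word - (1 << 32) if word & 0x80000000 else word


def sig_width(v):
    """Smallest n such that signed v fits in n-bit two's complement."""
    n = 1
    while not -(1 << (n - 1)) <= v < (1 << (n - 1)):
        n += 1
    return n


def swc_encode_word(word):
    v = to_signed32(word)
    n = sig_width(v)
    payload = v & ((1 << n) - 1)  # low n bits; the top one is the sign bit
    return format(n - 1, f"0{HEADER_BITS}b") + format(payload, f"0{n}b")


def swc_encode_block(words):
    # Stateless: every word is encoded independently of the others.
    return "".join(swc_encode_word(w) for w in words)


def swc_decode_block(bits):
    words, pos = [], 0
    while pos < len(bits):
        n = int(bits[pos:pos + HEADER_BITS], 2) + 1
        pos += HEADER_BITS
        payload = int(bits[pos:pos + n], 2)
        pos += n
        if payload >> (n - 1):  # sign bit set: sign-extend back to 32 bits
            payload -= 1 << n
        words.append(payload & 0xFFFFFFFF)
    return words
```

Under this layout the value 5 takes 9 bits, but 0x7FFFFFFF takes 37 bits, more than the raw 32. That is the overhead on large numbers that the slide says partitioning addresses.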

[Figure: 5-bit basic SWC]

[Figure: SWC and cache size]

Delta encoding
A cluster value is found for each cluster. For each value in the cluster, only the delta (difference) between the value and its cluster value is sent. Each side of the link keeps a cluster value cache. When the difference exceeds a given threshold, the least recently used cluster value is replaced. The two caches must remain consistent. Larger caches increase the hit rate but also require more index bits.
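The mechanism can be sketched in Python as a mirrored pair of caches. This sketch rests on several assumptions. The encoder picks the closest cached cluster value. A miss sends the raw word and installs it in the LRU slot. Both caches start out filled with zeros. `ClusterValueCache`, `delta_encode` and `delta_decode` are hypothetical names.

```python
class ClusterValueCache:
    """Fixed-size cache of cluster values with LRU replacement (hypothetical)."""

    def __init__(self, size):
        self.values = [0] * size           # assumption: both sides start with zeros
        self.last_use = list(range(size))  # slot 0 is initially least recently used
        self.clock = size

    def touch(self, slot):
        self.last_use[slot] = self.clock
        self.clock += 1

    def replace_lru(self, value):
        slot = min(range(len(self.values)), key=self.last_use.__getitem__)
        self.values[slot] = value
        self.touch(slot)


def delta_encode(words, cache, threshold):
    out = []
    for w in words:
        # Choose the cached cluster value closest to w.
        slot = min(range(len(cache.values)), key=lambda s: abs(w - cache.values[s]))
        delta = w - cache.values[slot]
        if abs(delta) <= threshold:
            out.append(("hit", slot, delta))  # send index plus delta
            cache.touch(slot)
        else:
            out.append(("miss", w))           # send the raw word
            cache.replace_lru(w)              # w becomes a new cluster value
    return out


def delta_decode(msgs, cache):
    words = []
    for msg in msgs:
        if msg[0] == "hit":
            _, slot, delta = msg
            words.append(cache.values[slot] + delta)
            cache.touch(slot)
        else:
            words.append(msg[1])
            cache.replace_lru(msg[1])
    return words
```

The decoder applies exactly the same touches and replacements as the encoder, so the two caches stay consistent without extra messages. With a threshold of 16, the words 1000, 1003, 998 become one miss followed by two hits carrying the deltas 3 and -2.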

[Figure: Delta encoding]

The Citron Scheme
The Citron scheme uses the same infrastructure as delta encoding. The 16 most significant bits of a word are matched against a value in the value cache. If a match is found, the index of the cached value is transferred together with the least significant bits. If not, the least recently used cache value is replaced and the caches on both sides of the link are updated.
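A minimal Python sketch of the Citron scheme follows. Each side keeps its value cache as a most-recently-used-first list, so an entry's index is its position in that list and eviction takes the last element. It also assumes a 1-bit hit/miss flag per word for the bit count and identical zero-filled initial tables. All function names are hypothetical.

```python
def citron_encode(words, table):
    """Encode 32-bit words; table is the sender's MRU-first list of upper halves."""
    out = []
    for w in words:
        hi, lo = w >> 16, w & 0xFFFF
        if hi in table:
            i = table.index(hi)
            out.append(("hit", i, lo))     # index + 16 least significant bits
            table.insert(0, table.pop(i))  # move the entry to the MRU position
        else:
            out.append(("miss", w))        # full 32-bit word
            table.pop()                    # evict the least recently used entry
            table.insert(0, hi)
    return out


def citron_decode(msgs, table):
    words = []
    for msg in msgs:
        if msg[0] == "hit":
            _, i, lo = msg
            hi = table.pop(i)
            table.insert(0, hi)
            words.append((hi << 16) | lo)
        else:
            w = msg[1]
            table.pop()
            table.insert(0, w >> 16)
            words.append(w)
    return words


def citron_bits(msgs, table_size):
    """Link bits, assuming a 1-bit hit/miss flag in front of every message."""
    index_bits = (table_size - 1).bit_length()
    return sum(1 + (index_bits + 16 if m[0] == "hit" else 32) for m in msgs)
```

With an 8-entry table, a hit costs 1 + 3 + 16 = 20 bits and a miss costs 33 bits. The scheme only pays off when most upper halves are already cached.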

[Figure: The Citron scheme]

The Frequent Value Encoding (FVE) Scheme
FVE is similar to the Citron scheme, but it caches the full 32-bit value. On a hit, only the index is transferred.
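FVE changes only the cached unit compared with the Citron sketch: the whole word is cached, so a hit sends nothing but an index. The Python below is a minimal sketch that keeps an MRU-first list on each side. The table layout, the zero-filled initial contents and the function names are assumptions for illustration.

```python
def fve_encode(words, table):
    """Encode 32-bit words; table is the sender's MRU-first list of full words."""
    out = []
    for w in words:
        if w in table:
            i = table.index(w)
            out.append(("hit", i))         # only the index crosses the link
            table.insert(0, table.pop(i))  # move the entry to the MRU position
        else:
            out.append(("miss", w))        # full 32-bit word
            table.pop()                    # evict the least recently used entry
            table.insert(0, w)
    return out


def fve_decode(msgs, table):
    words = []
    for msg in msgs:
        if msg[0] == "hit":
            w = table.pop(msg[1])
        else:
            w = msg[1]
            table.pop()
        table.insert(0, w)  # both cases end with w in the MRU position
        words.append(w)
    return words
```

Because a hit carries no low-order bits, FVE compresses better than Citron on exact repeats. It gains nothing for values that merely share their upper half.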

[Figure: FVE]

Harvesting Small and Clustered Value Locality
Delta encoding on its own is inefficient: the offsets (deltas) are typically small values, so sending them at full width wastes bits. Running the offsets through the SWC encoder improves this.
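The gain can be estimated by counting bits. In the Python sketch below, the baseline sends each delta in a fixed 16-bit field. The SWC variant sends a 5-bit width field plus the delta's significant bits. Both field sizes and the function names are hypothetical and chosen only to illustrate the comparison.

```python
def sig_width(v):
    """Smallest n such that signed v fits in n-bit two's complement."""
    n = 1
    while not -(1 << (n - 1)) <= v < (1 << (n - 1)):
        n += 1
    return n


def delta_bits_fixed(deltas, field_bits=16):
    # Hypothetical baseline: every delta occupies a fixed-width field.
    return len(deltas) * field_bits


def delta_bits_swc(deltas, header_bits=5):
    # Each delta is sent as a width field plus its significant bits.
    return sum(header_bits + sig_width(d) for d in deltas)
```

For the deltas 3, -2, 10 and 1, the SWC variant needs 32 bits against 64 for the fixed 16-bit fields, since each delta has only 2 to 5 significant bits.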

[Figure: Delta encoding + SWC]

Harvesting Small and Isolated Value Locality
The Citron and FVE schemes are inefficient when the value cache misses. If the missed value is relatively small, SWC can compress it. Values that can be represented in fewer than 16 bits are not stored in the value cache.
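One way to combine SWC with FVE is sketched in Python below. The 16-bit cut-off comes from the slide. The ordering is an assumption: the width check happens first, small values always go through SWC and never touch the table. The message format, the zero-filled initial table and the function names are also illustrative assumptions. The same filter could sit in front of the Citron scheme.

```python
def sig_width(word):
    """Significant width of a 32-bit word read as two's complement."""
    v = word - (1 << 32) if word & 0x80000000 else word
    n = 1
    while not -(1 << (n - 1)) <= v < (1 << (n - 1)):
        n += 1
    return n


def swc_fve_encode(words, table):
    out = []
    for w in words:
        n = sig_width(w)
        if n < 16:
            # Small value: SWC-encode it and leave the value cache untouched.
            out.append(("swc", n, w & ((1 << n) - 1)))
        elif w in table:
            i = table.index(w)
            out.append(("hit", i))
            table.insert(0, table.pop(i))
        else:
            out.append(("raw", w))  # large miss: full word, then cache it
            table.pop()
            table.insert(0, w)
    return out


def swc_fve_decode(msgs, table):
    words = []
    for msg in msgs:
        if msg[0] == "swc":
            _, n, payload = msg
            if payload >> (n - 1):  # sign bit set: sign-extend
                payload -= 1 << n
            words.append(payload & 0xFFFFFFFF)
        elif msg[0] == "hit":
            w = table.pop(msg[1])
            table.insert(0, w)
            words.append(w)
        else:
            w = msg[1]
            table.pop()
            table.insert(0, w)
            words.append(w)
    return words
```

Keeping small values out of the table leaves its limited entries for large, isolated values, which SWC alone cannot compress.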

[Figure: SWC + FVE]

[Figure: SWC + Citron]

Conclusion
The bandwidth requirement rises by 15-30% per core. Small value locality is common in all types of applications: about 40% of the values need fewer than 8 bits, and the rest are large numbers. SWC frees about 35% of the bandwidth, while delta encoding, FVE and Citron free about 60%. By combining the different schemes, it is possible to free up 70-75% of the bandwidth.

Source paper: Martin Thuresson, Lawrence Spracklen and Per Stenström, "Memory-Link Compression Schemes: A Value Locality Perspective", IEEE.