Decompressing Snappy Compressed Files at the Speed of OpenCAPI. Speaker: Jian Fang TU Delft

Size: px

Start display at page:

Download "Decompressing Snappy Compressed Files at the Speed of OpenCAPI. Speaker: Jian Fang TU Delft"

Dorothy Parsons
5 years ago
Views:

1 Decompressing Snappy Compressed Files at the Speed of OpenCAPI Speaker: Jian Fang TU Delft 1

2 Current Project SHADE Scalable Heterogeneous Accelerated DatabasE Spark DB CPU POWER9 ARROW DNA Seq Sort Join Skyline HBM HBM NVMe OpenCAPI SNAP FPGA Decompression Fletcher OpenCAPI Memory NV Link Tables OpenCAPI SNAP FPGA Fletcher Actions GPU 2

3 Big Data Processing Database Analytics Data Movement 3

4 Moving Data from/to Storage Slow, Bottleneck 1st step in data processing and database analytics LZ4 RLE LZW Raw GZIP Data storage??? CPU Decompress-filter Solutions Netezza Compressed storage Data Filtered Decompressed FPGA Data Data Decompress-filter CPU 4

etc. Low compression ratio 500MB/s Per Core Core i7

5 Snappy compression Algorithm LZ77-based, byte-level, lossless In Hadoop ecosystem Support Parquet, ORC, etc. Low compression ratio 500MB/s Per Core Core i7 Goal OpenCAPI Speed Fast compression and decompression 5

6 Mechanism of Snappy compression Two kinds of token: Literal token and copy token Find match from previous data Example: ABCDEFGHABCD offset 8, length4 Use (8,4) to replace ABCD 7

7 Snappy Compression No Match! Input Generate a literal token Literal ABCDEFGHABCD Match! Find next match Generate a copy token copy : offset 8 length 4 ABCDEFGHABCD Output (Literal, ABCDEFGH ) (Copy, 8, 4) 8

8 Snappy Decompression (Literal, ABCDEFGH ) ABCDEFGH (Copy, 8, 4) Copy ABCDEFGHABCD 9

9 Snappy compression Algorithm An independent 64KB history for every 64KB data Token: copy or literal, copy lenth<=64b Token size: 2 Byte minimum, average < 4 Byte(assumption for design) Highly data dependent Benchmark Compression Ratio Average Token Size DNA DNA DNA TPC-H DNA data source: ftp://ftp.ncbi.nlm.nih.gov/ncbi-asn1

10 Design in FPGA An independent 64KB history for every 64KB data Token: copy or literal, copy lenth<=64b Token size: 2 Byte minimum, average < 4 Byte(assumption for design) Highly data dependent 16X 64KB history 4KB BRAM 11 DNA data source: ftp://ftp.ncbi.nlm.nih.gov/ncbi-asn1

11 Design Challenge Deal with different types of tokens Handle various size of tokens Parallelize token translation Keep up with the interface bandwidth 13

12 Design Challenge Deal with different types of tokens Handle various size of tokens Parallelize token translation Keep up with the interface bandwidth 14

13 Design Challenge Deal with different types of tokens Handle various size of tokens Parallelize token translation Keep up with the interface bandwidth 15

14 Token Structure LZ77-based, byte-level In Hadoop ecosystem Support Parquet, ORC, etc. Two types of tokens: literal tokens and copy tokens Literal Token Literal Token: (literal, length, data) Copy Token : (copy, offset, length) 16 Source: Y. Qiao. An fpga-based snappy decompressor-filter. Master s thesis, Delft University of Technology, 2018.

15 Token Structure LZ77-based, byte-level In Hadoop ecosystem Support Parquet, ORC, etc. Two types of tokens: literal tokens and copy tokens Copy Token Tokens are in various length 17 Source: Y. Qiao. An fpga-based snappy decompressor-filter. Master s thesis, Delft University of Technology, 2018.

16 Prior Solutions: Decode all possiblities 18 Source: Y. Qiao. An fpga-based snappy decompressor-filter. Master s thesis, Delft University of Technology, 2018.

17 Design Challenge Deal with different types of tokens Handle various size of tokens Parallelize token translation Keep up with the interface bandwidth 19

18 Token Dependency Read Read No dependency Write Write Never write same bytes No dependency Write Read Forwarding Token1 Token 2 Token3 Token 4 Read Write Read Write Read Write Read Write 20

19 Token Dependency Address Conflict Write Write Same block, same line 16X BRAMs 21

20 Token Dependency Address Conflict Write Write Same block, different lines 16X BRAMs 22

21 Token Dependency Address Conflict Read Read 16X BRAMs 23

22 Design Challenge Deal with different types of tokens Handle various size of tokens Parallelize token translation Keep up with the interface bandwidth token size * token/cycle * frequency * instances = OpenCAPI BW Token size: 4B 2 tokens/cycle 250MHz --> 13 instances 24

23 IDEA from token-level to BRAM-level 16X 16X 25

24 Architecture Overview 26

25 Architecture Overview Two-level Parser 27

26 Architecture Overview Two-level Parser 28

27 Architecture Overview Two-level Parser Parallel BRAM Operations 29

28 Architecture Overview Two-level Parser Parallel BRAM Operations Relax Execution Model 30

29 Parallel BRAM Operations & Relax Execution Model 31

30 Results Implementation Result (on XCKU15P) Flip-Flop LUTs BRAMs Frequency Power 32K(3.32%) 46K(8.95%) 29(2.95%) 250MHz 2.9W Benchmark Alice s Adventures in Wonderland : 85KB (149KB) for compressed (uncompressed) file. Throughput FPGA(single decompressor) CPU(Core i7 with one thread) Input 3GB/s 300MB/s Output 5GB/s 500MB/s 32

31 Open Source Publish Paper: J. Fang, J. Chen, P. Hofstee, Z. Al-Ars, J. Hidders, A High-Bandwidth Snappy Decompressor in Reconfigurable Logic (CODES+ISSS 2018) FPGA-Snappy-Decompressor.git Contact Me: j.fang-1@tudelft.nl (Jian Fang) 33

Computer Engineering Mekelweg 4, 2628 CD Delft The Netherlands MSc THESIS. An FPGA-based Snappy Decompressor-Filter

Computer Engineering Mekelweg 4, 2628 CD Delft The Netherlands http://ce.et.tudelft.nl/ 2018 MSc THESIS An FPGA-based Snappy Decompressor-Filter Yang Qiao Abstract New interfaces to interconnect CPUs and