Error Resilient LZ 77 Data Compression

Stefano Lonardi, Wojciech Szpankowski, Mark Daniel Ward
Presentation by Peter Macko

Motivation
Lempel-Ziv 77 (LZ 77) has no form of error correction: introducing a single error into the compressed stream corrupts $O(n^{2/3}\log n)$ symbols, where $n$ is the length of the stream. LZRS 77 adds error-correction bits without losing compression power or backward compatibility. The idea behind LZRS 77 can be extended to other algorithms, such as LZW.

Example of Error Propagation
In the sliding-window implementation of LZ 77, each reference (d,l) copies l characters starting d positions back (spaces are shown as _):
Original text: THE THEFT OF THE IDE: THE IDENTITY
Compressed text: THE_(4,3)FT_OF_(13,4)IDE:(9,8)NTITY

Example of Error Propagation
Compressed: THE_(4,3)FT_OF_(13,4)IDE:(9,8)NTITY
Compressed with error: THE_(4,4)FT_OF_(13,4)IDE:(9,8)NTITY
Decompressed: THE THE FT OF HE TIDE: HE TIDENTITY
A single changed length (3 became 4) shifts the output by one character, so every later reference copies from the wrong place.
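The propagation can be reproduced with a small decoder. This is a minimal Python sketch, not the presenters' code: it assumes a token is either a literal string or a `(distance, length)` pair whose distance counts back from the next output position, the convention under which `(4,3)` after `THE_` produces `THE`. The name `lz77_decode` is hypothetical. The last reference is written as `(9,8)`, which is the offset that reproduces the original text when every space is a single character.

```python
from typing import List, Tuple, Union

Token = Union[str, Tuple[int, int]]   # literal text or (distance, length)

def lz77_decode(tokens: List[Token]) -> str:
    """Expand literals and (distance, length) back-references."""
    out: List[str] = []
    for tok in tokens:
        if isinstance(tok, str):
            out.extend(tok)                 # literal characters
            continue
        distance, length = tok
        start = len(out) - distance         # counted back from the next output position
        if distance <= 0 or start < 0:
            raise ValueError(f"bad reference {tok}")
        for k in range(length):
            out.append(out[start + k])      # one at a time, so overlapping copies work
    return "".join(out)

good = ["THE_", (4, 3), "FT_OF_", (13, 4), "IDE:", (9, 8), "NTITY"]
bad = ["THE_", (4, 4), "FT_OF_", (13, 4), "IDE:", (9, 8), "NTITY"]   # one length corrupted

print(lz77_decode(good).replace("_", " "))
print(lz77_decode(bad).replace("_", " "))
```

The two token lists differ in a single length field, yet the second decodes to the corrupted sentence from the slide: the references after the error are still valid, they just point at shifted text.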

Motivation for LZS 77
LZS 77 can store error-correction bits in LZ 77 files without losing compression power, and its output remains backward-compatible with generic LZ 77 decoders.

The Basic Idea of LZS 77
[Figure: a sliding window ending at the current position, in which the upcoming phrase occurs several times]
The choice of which reference inside the sliding window to use can carry extra information. Let M, the multiplicity, be the number of occurrences of the matched substring in the sliding window.

Using Redundant Information
[Figure: the same sliding window, with the occurrences of the phrase labeled with the 2-bit values 00 to 11]
To store a value X, choose the (X + 1)th reference in the sliding window, counting from the right. In this way $\lfloor \log_2 M \rfloor$ bits can be stored at this position.
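A minimal Python sketch of this selection rule follows, under assumptions of our own: the phrase passed in is already the longest match, only occurrences lying entirely inside the window are counted, and distances are measured from the current position. The window string and all function names are hypothetical. Because the decoder copies the phrase before it needs the hidden value, it can recount the occurrences in its own window and recover X from the distance.

```python
from typing import List

def occurrences(window: str, phrase: str) -> List[int]:
    """Start indices of every occurrence of `phrase` lying entirely inside
    `window`, ordered from right to left (references are counted from the right)."""
    hits = [i for i in range(len(window) - len(phrase) + 1)
            if window.startswith(phrase, i)]
    return hits[::-1]

def capacity_bits(m: int) -> int:
    """Whole bits carried by a choice among m >= 1 equivalent references: floor(log2 m)."""
    return m.bit_length() - 1

def embed(window: str, phrase: str, value: int) -> int:
    """Encoder side: return the distance of the (value + 1)th occurrence from the right."""
    hits = occurrences(window, phrase)
    if not hits:
        raise ValueError("phrase does not occur in the window")
    if not 0 <= value < 2 ** capacity_bits(len(hits)):
        raise ValueError("value does not fit in the available bits")
    return len(window) - hits[value]

def extract(window: str, phrase: str, distance: int) -> int:
    """Decoder side: recover the hidden value from the distance that was sent."""
    return occurrences(window, phrase).index(len(window) - distance)

window = "ZBCDBCDZBCBXBC"   # hypothetical window; the phrase "BC" occurs M = 4 times
for value in range(2 ** capacity_bits(len(occurrences(window, "BC")))):
    d = embed(window, "BC", value)
    start = len(window) - d
    # A plain LZ 77 decoder just copies from `start`: it always gets "BC" back.
    print(value, d, window[start:start + 2], extract(window, "BC", d))
```

Every distance the encoder may pick copies the same two characters, which is why a generic LZ 77 decoder is unaffected by the hidden value.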

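How Many Redundant Bits? Theoretically
For a memoryless binary source, the multiplicity satisfies
$$\Pr(M_n = j) \approx \frac{p^j q + q^j p}{j\,h},$$
where $M_n$ is the multiplicity after $n$ bits have been compressed, $p$ is the probability of encountering a 0, $q = 1 - p$ is the probability of a 1, and $h = -p\ln p - q\ln q$ is the Shannon entropy of the source (in nats).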

How Many Redundant Bits? Theoretically
For $p = q = 0.5$ the formula gives $\Pr(M_n = j) \approx 2^{-j}/(j\ln 2)$. Observations:
The probability is maximal for $M_n = 1$.
The probability for $M_n = 2$ is 4 times smaller.
$M_n$ is well concentrated around its mean, which is about $1/\ln 2 \approx 1.44$.
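The approximation for $\Pr(M_n = j)$ is easy to evaluate numerically. The Python sketch below takes the formula as stated on the slide, with $h$ computed in nats (natural logarithm), which is the choice that makes the probabilities sum to 1. It checks the factor of 4 between $j = 1$ and $j = 2$ and that the mean equals $1/h$. The function name is hypothetical, and the estimate of whole hidden bits per reference, $E[\lfloor\log_2 M_n\rfloor]$, is our own addition rather than a figure from the slides.

```python
import math

def multiplicity_pmf(j: int, p: float) -> float:
    """Asymptotic Pr(M_n = j) for a memoryless binary source with Pr(0) = p."""
    q = 1.0 - p
    h = -p * math.log(p) - q * math.log(q)  # Shannon entropy in nats
    return (p ** j * q + q ** j * p) / (j * h)

p = 0.5
probs = [multiplicity_pmf(j, p) for j in range(1, 80)]   # j = 1..79; the tail is negligible
total = sum(probs)
mean = sum(j * pj for j, pj in enumerate(probs, start=1))
bits = sum((j.bit_length() - 1) * pj for j, pj in enumerate(probs, start=1))  # E[floor(log2 M)]

print(f"Pr(M=1) = {probs[0]:.4f}, Pr(M=2) = {probs[1]:.4f}, ratio = {probs[0] / probs[1]:.2f}")
print(f"total = {total:.6f}, mean = {mean:.4f} (1/h = {1 / math.log(2):.4f})")
print(f"expected hidden bits per reference = {bits:.4f}")
```

The mean follows in closed form: multiplying the formula by $j$ cancels the $1/j$, and the two geometric series sum to $(p + q)/h = 1/h$.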

How Many Redundant Bits? In Practice
[Figure: the multiplicity M increases as increasing portions of paper2 (left) and news (right) from the Calgary corpus are compressed]

Experimental Results
File    Original size  gzip     gzips    Redundant bytes  Size increase
bib     111,261        39,473   39,511   1,721 (4.36%)    0.10%
book1   768,771        333,776  336,256  14,524 (4.35%)   0.74%
book2   610,856        228,321  228,242  10,361 (4.54%)   -0.03%
geo     102,400        69,478   71,168   4,101 (5.90%)    2.43%
news    377,109        155,290  156,150  5,956 (3.84%)    0.55%
obj1    21,504         10,584   10,783   353 (3.34%)      1.88%
obj2    246,814        89,467   89,757   3,628 (4.06%)    0.32%
paper1  53,161         20,110   20,204   937 (4.66%)      0.47%
paper2  82,199         32,529   32,507   1,551 (4.77%)    -0.07%
Sizes are in bytes; both percentages are relative to the gzip output size.
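The percentages in the Experimental Results table can be recomputed from its byte counts. This Python sketch (row data copied from the table; the names `ROWS` and `percentages` are hypothetical) shows that both percentage columns are relative to the gzip output: redundant bytes divided by the gzip size, and the gzips size minus the gzip size, divided by the gzip size.

```python
# (file, original size, gzip size, gzips size, redundant bytes), all in bytes,
# copied from the Experimental Results table.
ROWS = [
    ("bib", 111_261, 39_473, 39_511, 1_721),
    ("book1", 768_771, 333_776, 336_256, 14_524),
    ("book2", 610_856, 228_321, 228_242, 10_361),
    ("geo", 102_400, 69_478, 71_168, 4_101),
    ("news", 377_109, 155_290, 156_150, 5_956),
    ("obj1", 21_504, 10_584, 10_783, 353),
    ("obj2", 246_814, 89_467, 89_757, 3_628),
    ("paper1", 53_161, 20_110, 20_204, 937),
    ("paper2", 82_199, 32_529, 32_507, 1_551),
]

def percentages(gz: int, gzs: int, redundant: int) -> tuple:
    """Redundant bytes and size increase, both as a percentage of the gzip output."""
    return round(100 * redundant / gz, 2), round(100 * (gzs - gz) / gz, 2)

for name, _orig, gz, gzs, red in ROWS:
    red_pct, inc_pct = percentages(gz, gzs, red)
    print(f"{name:7s} redundant {red_pct:5.2f}%   increase {inc_pct:6.2f}%")
```

Roughly 3 to 6% of the gzip output can be repurposed as redundant bytes, while the output grows by less than 1% for every file except geo and obj1.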

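Not Enough?
If more redundant bits are needed, the encoder can also use matches that are long enough rather than only the longest ones. This increases the file size by $O(\log\log n / \log n)$, where $n$ is the length of the original file, and will be addressed in future research.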

Compatibility with LZ 77
The only difference between LZS 77 and plain LZ 77 is that in LZS 77 the references inside the sliding window are not chosen arbitrarily. A generic LZ 77 decoder does not care which occurrence of a substring in the sliding window a reference points to, so it decodes LZS 77 output correctly.

Motivation for LZRS 77
LZRS 77 is a compression algorithm based on LZS 77 that uses Reed-Solomon error-correction codes. It can correct up to a fixed number of errors in each block of data.

Reed-Solomon Codes
The data is divided into blocks. Each block holds at most $2^s - 1$ symbols: $2^s - 1 - 2e$ data symbols followed by $2e$ parity symbols, where $s$ is the size of a symbol in bits and $e$ is the maximum number of tolerated errors.

Reed-Solomon in LZRS 77
The data is divided into blocks of 255 bytes: $255 - 2e$ data bytes followed by $2e$ parity bytes. Here $s = 8$, so each symbol is one byte (the size of an ASCII character), and $e$ is the maximum number of tolerated errors per block.

Compression Algorithm
1. Use plain LZ 77 to compress the data.
2. Split the compressed data into blocks of $255 - 2e$ bytes: Block 1, Block 2, Block 3, ..., Block N.
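The block arithmetic of the two Reed-Solomon slides and step 2 can be written down directly. This is a small Python sketch; the helper names are hypothetical, and leaving the final block short (rather than padding it) is our assumption.

```python
from typing import List, Tuple

def rs_layout(s: int, e: int) -> Tuple[int, int, int]:
    """(block, data, parity) sizes in symbols for a Reed-Solomon code over
    s-bit symbols that corrects up to e symbol errors per block."""
    n = 2 ** s - 1        # maximum block (codeword) length
    parity = 2 * e        # 2e parity symbols correct e errors
    data = n - parity
    if e < 0 or data <= 0:
        raise ValueError("e is out of range for this symbol size")
    return n, data, parity

def split_blocks(compressed: bytes, e: int) -> List[bytes]:
    """Step 2: cut the LZ 77 output into (255 - 2e)-byte blocks (s = 8);
    the last block may be shorter."""
    _, data, _ = rs_layout(8, e)
    return [compressed[i:i + data] for i in range(0, len(compressed), data)]

print(rs_layout(8, 1), rs_layout(8, 2))
print([len(b) for b in split_blocks(bytes(600), e=2)])
```

With $e = 2$, for example, each block carries 251 data bytes and 4 parity bytes, so 600 bytes of LZ 77 output become three blocks.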

Compression Algorithm
3. Process the blocks in reverse order: generate the error-correction (RS) codes for the $i$th block and embed them in the $(i-1)$th block. The RS codes are embedded by modifying the sliding-window references inside the blocks; the RS codes for the first block are sent separately.

Decompression
1. After receiving a block of data, use its error-correction codes to check and, if possible, recover the block.
2. Decompress the block using LZ 77.
3. Extract the error-correction codes for the next block using LZS 77, then repeat for the next block.
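The chaining in compression step 3 and decompression steps 1 to 3 can be sketched in Python as follows. Two loud assumptions: CRC-32 from Python's `zlib` stands in for Reed-Solomon, so this sketch can only detect a bad block, not repair it; and the carried parity is attached as a side field instead of being hidden in the reference choices of block $i-1$. Function names are hypothetical. The reverse loop mirrors the slide; in the real scheme it matters because hiding parity in block $i-1$ changes that block, so its own parity can only be computed afterwards.

```python
import zlib
from typing import List, Optional, Tuple

def parity(block: bytes) -> bytes:
    """Stand-in for the 2e Reed-Solomon parity bytes. CRC-32 only *detects*
    corruption; real RS parity would also let the decoder repair up to e errors."""
    return zlib.crc32(block).to_bytes(4, "big")

def protect(blocks: List[bytes]) -> Tuple[bytes, List[Tuple[bytes, bytes]]]:
    """Walk the blocks in reverse order so that block i-1 carries the parity
    of block i. Returns (parity for block 1, sent separately,
    [(block, parity carried for the following block), ...])."""
    carried = [b""] * len(blocks)      # the last block carries nothing
    for i in range(len(blocks) - 1, 0, -1):
        carried[i - 1] = parity(blocks[i])
    return parity(blocks[0]), list(zip(blocks, carried))

def check(first: bytes, stream: List[Tuple[bytes, bytes]]) -> List[str]:
    """Decoder: verify each block with the parity obtained from its predecessor."""
    status: List[str] = []
    expected: Optional[bytes] = first
    for block, next_parity in stream:
        if expected is None:
            status.append("unverified")     # predecessor failed, chain is broken
        elif parity(block) == expected:
            status.append("ok")
        else:
            status.append("corrupt")
        # Parity for the next block is trusted only if this block checked out.
        expected = next_parity if status[-1] == "ok" else None
    return status

first, stream = protect([b"block one", b"block two", b"block three"])
print(check(first, stream))
stream[1] = (b"block tw0", stream[1][1])       # change one byte of block 2
print(check(first, stream))
```

The second check reports block 2 as corrupt and block 3 as unverified: once a block cannot be recovered, the parity it carries for its successor can no longer be trusted.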

Problems
The entire set of buffers needs to be in memory during compression. Solution: divide the compressed file into parts.
The encoder cannot process the data as it arrives.
The RS codes of the first block need to be sent separately. Solution: do not send them.

Experimental Results
[Figure: probability of decompressing incorrectly vs. number of errors; panels: e = 1, 100 blocks and e = 2, 100 blocks]
Example: with 2 bytes of parity per 252-byte block and a file of 100 blocks, the probability of decompressing correctly is 90% for 20 uniformly distributed errors.

Applicability of the Idea
The underlying idea is applicable to many compression schemes. Where multiplicity is not naturally present, it can be created at a small cost in compression power.

Applicability to LZW / LZ 78
Further research showed that the error-resilient scheme can be adapted to LZW, which is based on a dynamic dictionary. LZW is used by Unix compress, WinZip, GIF, TIFF, PDF, V.42bis, and others.

Applicability to LZW / LZ 78
Use a shorter match instead of the longest one to carry additional information. There are two ways to do this:
1. The same way as in LZS 77.
2. Use the shorter match to end a block; the size of the block then carries the extra information.
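One way to realize the first option for LZW is sketched below in Python. It is our own construction, not the method from the research mentioned on the slide: at every step where the longest match has at least two characters and does not reach the end of the input, one hidden bit chooses between the longest match (0) and the match one character shorter (1). Because the LZW dictionary is prefix-closed, the shorter phrase is always present, so a standard LZW decoder still recovers the text. The decoder below also recovers the bits by checking whether the previous phrase, extended by the next character, was already in the dictionary. Function names are hypothetical, and the dictionary grows without bound (no reset or code-width handling).

```python
from typing import List, Optional, Tuple

def lzw_encode(text: str, bits: List[int]) -> Tuple[List[int], int]:
    """LZW over 8-bit characters that hides `bits` in the parse.
    Returns (codes, number of bits actually embedded)."""
    table = {chr(i): i for i in range(256)}
    next_code = 256
    codes: List[int] = []
    pos = used = 0
    while pos < len(text):
        longest = 1                                # greedy: longest phrase in the table
        while pos + longest < len(text) and text[pos:pos + longest + 1] in table:
            longest += 1
        length = longest
        if longest >= 2 and pos + longest < len(text) and used < len(bits):
            length = longest - bits[used]          # 1 -> one character shorter
            used += 1
        phrase = text[pos:pos + length]
        codes.append(table[phrase])
        pos += length
        if pos < len(text):
            # Add an entry every step (even a duplicate string) so that code
            # numbers stay in step with a standard LZW decoder.
            table[phrase + text[pos]] = next_code
            next_code += 1
    return codes, used

def lzw_decode(codes: List[int]) -> Tuple[str, List[int]]:
    """Standard LZW decoding, plus recovery of the hidden bits."""
    strings = [chr(i) for i in range(256)]
    known = set(strings)
    out: List[str] = []
    bits: List[int] = []
    prev: Optional[str] = None
    for code in codes:
        if code < len(strings):
            cur = strings[code]
        elif code == len(strings) and prev is not None:
            cur = prev + prev[0]                   # code defined in this very step
        else:
            raise ValueError(f"invalid code {code}")
        if prev is not None:
            extended = prev + cur[0]
            shortened = extended in known          # the encoder could have taken it
            if shortened or len(prev) >= 2:        # this step was eligible
                bits.append(int(shortened))
            strings.append(extended)
            known.add(extended)
        out.append(cur)
        prev = cur
    return "".join(out), bits

text = "TOBEORNOTTOBEORTOBEORNOT"
hidden = [1, 0, 1, 1, 0, 1]
codes, used = lzw_encode(text, hidden)
decoded, recovered = lzw_decode(codes)
print(decoded == text, recovered[:used] == hidden[:used])
```

When the encoder runs out of bits it falls back to greedy parsing, and the decoder then reports zeros for the remaining eligible steps, which is why only the first `used` recovered bits are compared.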

Summary
Multiplicity in LZ 77 can be exploited to add extra error-correction bits with virtually no loss of compression power, while preserving backward compatibility. This idea can be extended to other compression schemes, such as LZW.